0.1 Backports for 0.1.2 #3613

Open
wants to merge 20 commits into base: 0.1
Conversation

TheBlueMatt (Collaborator)

I think this is all the things that were pending a backport to 0.1.

jkczyz and others added 18 commits February 21, 2025 22:33
Instead of using elaborate calculations to determine the exact number of
bytes needed for a BOLT12 message, allocate a fixed-size amount. This
reduces the code complexity and potentially reduces heap fragmentation in
the normal case.
Now that the previous commit removed assertions on Vec capacities for
BOLT12 messages, the use of reserve_exact in tests is no longer needed.
The formula for applying half-lives was incorrect. Test coverage added.

Relatively straightforward merge conflicts (code added in
311a083, which was not included here,
neighbored newly added code) fixed in:
 * lightning/src/routing/scoring.rs
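
For reference, a hedged sketch of the intended half-life decay (not LDK's exact code): after `elapsed` time, a tracked value should be scaled by 0.5^(elapsed / half_life).

```rust
// Exponential decay by half-lives: value * 0.5^(elapsed / half_life).
fn apply_half_lives(value: f64, elapsed_secs: f64, half_life_secs: f64) -> f64 {
    value * 0.5f64.powf(elapsed_secs / half_life_secs)
}

// After exactly two half-lives the value is quartered:
// apply_half_lives(8.0, 1200.0, 600.0) == 2.0 (up to float error).
```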
`counterparty_spendable_height` is not used outside of `package.rs`
so there's not much reason to have an accessor for it. Also, in the
next commit an issue with setting the correct value for revoked
counterparty HTLC outputs is fixed, and the upgrade path causes the
value to be 0 in some cases, making using the value in too many
places somewhat fraught.
If the counterparty broadcasts a revoked transaction with offered
HTLCs, the output is not immediately pinnable as the counterparty
cannot claim the HTLC until the CLTV expires and they use an
HTLC-Timeout path.

Here we fix the `counterparty_spendable_height` value we set on
counterparty revoked HTLC claims to match reality. Note that
because we still consider these outputs `Pinnable` the value is
not used. In the next commit we'll start making them `Unpinnable`
which will actually change behavior.

Note that when upgrading we have to wipe the
`counterparty_spendable_height` value for non-offered HTLCs as
otherwise we'd consider them `Unpinnable` when they are, in fact,
`Pinnable`.
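
A hypothetical sketch of the corrected value (struct and field names assumed, not LDK's actual types):

```rust
struct RevokedHtlcOutput {
    offered: bool,    // HTLC offered by the counterparty in the revoked tx
    cltv_expiry: u32, // absolute block height of the HTLC's CLTV
}

// The earliest height at which the counterparty could spend the output.
fn counterparty_spendable_height(outp: &RevokedHtlcOutput) -> u32 {
    if outp.offered {
        // They must wait out the CLTV and use an HTLC-Timeout path.
        outp.cltv_expiry
    } else {
        // Claimable immediately with the preimage, so effectively "now".
        0
    }
}
```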
If the counterparty broadcasts a revoked transaction with offered
HTLCs, the output is not immediately pinnable as the counterparty
cannot claim the HTLC until the CLTV expires and they use an
HTLC-Timeout path.

Here we properly set these packages as `Unpinnable`, changing some
transaction generation during tests.
If one message handler refuses a connection by returning an `Err`
from `peer_connected`, other handlers which already got the
`peer_connected` will not see the corresponding
`peer_disconnected`, leaving them in a potentially-inconsistent
state.

Here we ensure we call the `peer_disconnected` handler for all
handlers which received a `peer_connected` event (except the one
which refused the connection).
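
A sketch of the rollback logic with a simplified handler trait (not LDK's actual `peer_connected` signature):

```rust
trait MsgHandler {
    fn peer_connected(&self, peer_id: u64) -> Result<(), ()>;
    fn peer_disconnected(&self, peer_id: u64);
}

fn connect_peer(handlers: &[&dyn MsgHandler], peer_id: u64) -> Result<(), ()> {
    for (idx, handler) in handlers.iter().enumerate() {
        if handler.peer_connected(peer_id).is_err() {
            // Every handler that already saw `peer_connected` must see the
            // matching `peer_disconnected`, but not the refusing handler.
            for prev in &handlers[..idx] {
                prev.peer_disconnected(peer_id);
            }
            return Err(());
        }
    }
    Ok(())
}
```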
`PublicKey` parsing is relatively expensive as we have to check if
the point is actually on the curve. To avoid it, our `NetworkGraph`
uses `NodeId`s which don't have the validity requirement.

Sadly, we were always parsing the broadcasting node's `PublicKey`
from the `node_id` in the network graph whenever we see an update
for that channel, whether we have a corresponding signature or not.

Here we fix this, only parsing the public key (and hashing the
message) if we're going to check a signature.
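
A sketch of the lazy-parse pattern (simplified; exact rust-secp256k1 APIs vary by version):

```rust
use bitcoin::secp256k1::{self, ecdsa::Signature, Message, PublicKey, Secp256k1};

fn maybe_check_sig(
    secp: &Secp256k1<secp256k1::VerifyOnly>,
    node_id_bytes: &[u8; 33],            // NodeId: raw bytes, no curve check
    sig: Option<&Signature>,
    msg_hash: impl FnOnce() -> [u8; 32], // hash the message lazily, too
) -> Result<(), secp256k1::Error> {
    if let Some(sig) = sig {
        // Only this path pays for on-curve validation and hashing.
        let pk = PublicKey::from_slice(node_id_bytes)?;
        let msg = Message::from_digest(msg_hash());
        secp.verify_ecdsa(&msg, sig, &pk)?;
    }
    Ok(())
}
```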
When we build a new `NetworkGraph` from empty, we're generally
doing an initial startup and will be syncing the graph very soon.
Using an initially-empty `IndexedMap` for the `channels` and
`nodes` results in quite some memory churn, with the initial RGS
application benchmark showing 15% of its time in pagefault handling
alone (i.e. allocating new memory from the OS, let alone the 23%
of time in `memmove`).

Further, when deserializing a `NetworkGraph`, we'd swapped the
expected node and channel count constants, leaving the node map
too small and causing map doubling as we read entries from disk.

Finally, when deserializing, allocating only exactly the amount of
map entries we need is likely to lead to at least one doubling, so
we're better off just over-estimating the number of nodes and
channels and allocating what we want.

Here we just always allocate `channels` and `nodes` based on
constants, leading to a 20%-ish speedup in the initial RGS
application benchmark.
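
A sketch of the allocation strategy (the constants are invented here, and plain `HashMap`s stand in for LDK's `IndexedMap`):

```rust
use std::collections::HashMap;

// Deliberate over-estimates of current network-graph size, so neither map
// should need to re-hash/double while syncing or deserializing.
const NODE_COUNT_ESTIMATE: usize = 30_000;
const CHANNEL_COUNT_ESTIMATE: usize = 60_000;

struct GraphMaps {
    nodes: HashMap<u64, ()>,    // stand-in for NodeId -> NodeInfo
    channels: HashMap<u64, ()>, // stand-in for SCID -> ChannelInfo
}

impl GraphMaps {
    fn new() -> Self {
        Self {
            nodes: HashMap::with_capacity(NODE_COUNT_ESTIMATE),
            channels: HashMap::with_capacity(CHANNEL_COUNT_ESTIMATE),
        }
    }
}
```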
If we have a first-hop channel from a first-hop hint, we'll ignore
the fees on it as we won't charge ourselves fees. However, if we
have a first-hop channel from the network graph, we should do the
same.

We do so here, also teeing up a coming commit which will remove
much of the custom codepath for first-hop hints and start using
this common codepath as well.
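
The rule itself is tiny; a hypothetical helper to illustrate it:

```rust
// We never charge ourselves forwarding fees, so a channel whose source is
// our own node contributes zero fee regardless of its advertised policy.
fn effective_fee_msat(source_is_us: bool, advertised_fee_msat: u64) -> u64 {
    if source_is_us { 0 } else { advertised_fee_msat }
}
```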
These tests are a bit annoying to deal with and ultimately work on
almost the same graph subset, so it makes sense to combine their
graph layout logic and then call it twice.

We do that here, combining them and also cleaning up the possible
paths as there actually are paths that the router could select
which don't meet the tests' requirements.
In a coming commit we'll start calling
`add_entries_to_cheapest_to_target_node` without always having a
public-graph node entry in order to process last- and first-hops
via a common codepath. In order to do so, we always need the
`node_counter` for the node, however, and thus we track them in
`RouteGraphNode` and pass them through to
`add_entries_to_cheapest_to_target_node` here.

We also take this opportunity to swap the node preference logic to
look at the counters, which is slightly less computational work,
though it does require some unrelated test changes.
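
A sketch of the preference swap (fields simplified from LDK's `RouteGraphNode`): comparing dense integer counters is cheaper than comparing 33-byte node IDs.

```rust
use std::cmp::Ordering;

#[derive(PartialEq, Eq)]
struct RouteGraphNode {
    node_counter: u32, // dense per-graph index for this node
    score: u64,        // lower is better
}

impl Ord for RouteGraphNode {
    fn cmp(&self, other: &Self) -> Ordering {
        // Reversed so that BinaryHeap (a max-heap) pops the lowest score;
        // ties break on the cheap integer counter.
        other.score.cmp(&self.score)
            .then_with(|| other.node_counter.cmp(&self.node_counter))
    }
}

impl PartialOrd for RouteGraphNode {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
```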
This likely only impacts very rare edge cases, but if we have two
equal-cost paths, we should likely prefer ones which contribute
more value (avoiding cases where we use paths which are
amount-limited but equal fee to higher-amount paths) and then paths
with fewer hops (which may complete faster).

It does make test behavior more robust against router changes,
which comes in handy over the coming commits.
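
The described tie-break, as an illustrative comparator (types invented for the sketch):

```rust
use std::cmp::Ordering;

struct PathCandidate { cost_msat: u64, value_msat: u64, hops: usize }

fn prefer(a: &PathCandidate, b: &PathCandidate) -> Ordering {
    a.cost_msat.cmp(&b.cost_msat)                      // cheaper first
        .then_with(|| b.value_msat.cmp(&a.value_msat)) // then more value
        .then_with(|| a.hops.cmp(&b.hops))             // then fewer hops
}
```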
When we handle the unblinded last-hop route hints from an invoice,
we had a good bit of code dedicated to handling fee propagation
through the (potentially) multiple last-hops and connecting them to
potentially directly-connected first-hops.

This was a good bit of code that was almost never used, and it
turns out was also buggy - we could process a route hint with
multiple hops, committing to one path through nodes A, B, to C,
then process another route hint (or public channel) which changes
our best path from B to C, making the A entry invalid.

Here we remove the whole maze, utilizing the normal hop-processing
logic in `add_entries_to_cheapest_to_target_node` for last-hops as
well. It requires tracking which nodes connect to last-hop hints
similar to the way we do with `is_first_hop_target` in
`PathBuildingHop`, storing the `CandidateRouteHop`s in a new map,
and always calling `add_entries_to_cheapest_to_target_node` on the
payee node, whether it's public or not.
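
A sketch of the new bookkeeping (types simplified; LDK's actual map differs):

```rust
use std::collections::HashMap;

type NodeId = [u8; 33];
struct CandidateRouteHop; // stand-in for LDK's candidate-hop enum

// Unblinded node at the start of a last-hop hint -> hops toward the payee,
// consulted by the shared `add_entries_to_cheapest_to_target_node` loop.
type LastHopCandidates = HashMap<NodeId, Vec<CandidateRouteHop>>;

fn last_hops_from<'a>(
    map: &'a LastHopCandidates, node: &NodeId,
) -> &'a [CandidateRouteHop] {
    map.get(node).map(Vec::as_slice).unwrap_or(&[])
}
```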
When we do pathfinding with blinded paths, we start each
pathfinding iteration by inserting all the blinded paths into our
nodes map as last-hops to the destination. As we do that, we check
if any of the introduction points happen to be nodes we have direct
channels with, as we want to use the local info for such channels
and support finding a path even if that channel is not publicly
announced.

However, as we iterate the blinded paths, we may find a second
blinded path from the same introduction point which we prefer over
the first. If this happens, we would already have added info from
us over the local channel to that intro point and end up with
calculations for the first hop to a blinded path that we no longer
prefer.

This is ultimately fixed here in two ways:
(a) we process the first-hop channels to blinded path introduction
    points in a separate loop after we've processed all blinded
    paths, ensuring we only ever consider a channel to the blinded
    path we will ultimately prefer.
(b) In the next commit, we add a new tracking bool in
    `PathBuildingHop` called `best_path_from_hop_selected` which we
    set when we process a channel backwards from a node, indicating
    that we've committed to the best path to the node and check when
    we add a new path to a node. This would have resulted in a much
    earlier debug-assertion in fuzzing or several tests.
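
A sketch of fix (a)'s two-pass shape (types and helper invented for illustration):

```rust
use std::collections::HashMap;

struct BlindedPathInfo { intro_node_id: u64, cost: u64 }

fn process_blinded_paths(paths: &[BlindedPathInfo]) {
    // Pass 1: settle the preferred blinded path per introduction point.
    let mut best_per_intro: HashMap<u64, u64> = HashMap::new();
    for path in paths {
        let best = best_per_intro.entry(path.intro_node_id).or_insert(u64::MAX);
        *best = (*best).min(path.cost);
    }

    // Pass 2: only now add our local channels to each intro point, so they
    // always reference the blinded path we actually ended up preferring.
    for intro_node_id in best_per_intro.keys() {
        add_first_hop_channels_to(*intro_node_id);
    }
}

fn add_first_hop_channels_to(_intro_node_id: u64) { /* elided */ }
```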
When we process a path backwards from a node during pathfinding, we
implicitly commit to the path up to that node. Any changes to the
preferred path up to that node will make the newly processed path's
state invalid.

In the previous few commits we fixed cases for this in last-hop
paths (both blinded and unblinded).

Here we add assertions to enforce this, tracked in a new bool in
`PathBuildingHop`.
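
A sketch of the assertion (the bool's name comes from the commit above; the surrounding struct is simplified):

```rust
struct PathBuildingHop {
    best_path_from_hop_selected: bool,
    // ... other per-node pathfinding state elided ...
}

fn update_best_path_to(hop: &mut PathBuildingHop) {
    // Once a path has been processed backwards through this node, we've
    // committed to its current best path; changing it now would invalidate
    // the already-processed path's state.
    debug_assert!(
        !hop.best_path_from_hop_selected,
        "Tried to change a node's best path after a path through it was processed"
    );
    // ... update the preferred path here ...
}
```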
The router is a somewhat complicated beast, and though the last few
commits removed some code from it, a complicated beast it remains.
Thus, having `expect`s in it is somewhat risky, so we take this
opportunity to replace some of them with `debug_assert!(false)`s
and an `Err`-return.
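
The pattern, sketched with a simplified error type (LDK returns a `LightningError`):

```rust
use std::collections::HashMap;

fn lookup_hop_state(dist: &HashMap<u32, u64>, node_counter: u32) -> Result<u64, String> {
    match dist.get(&node_counter) {
        Some(&val) => Ok(val),
        None => {
            // Loud in debug/fuzzing builds, a clean error in release builds
            // instead of an `expect` panic.
            debug_assert!(false, "Pathfinding state for node {} should exist", node_counter);
            Err("Internal routing state was inconsistent".to_owned())
        },
    }
}
```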
When we see a channel come into the router as a route-hint, but it's
for a direct channel of ours, we'd like to ignore the route-hint as
we have more information in the first-hop channel info. We do this
by matching SCIDs, but previously only considered outbound SCID aliases.

Here we change to consider both outbound SCID aliases and the full
channel SCID, which some nodes may use in their invoices.
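
A sketch of the widened match (fields simplified from LDK's channel details):

```rust
struct FirstHopChannel {
    outbound_scid_alias: Option<u64>,
    short_channel_id: Option<u64>, // the real, confirmed SCID, if any
}

// Treat a hint as "ours" if its SCID matches either our outbound alias or
// the channel's real SCID, since invoices may carry either form.
fn hint_is_our_channel(hint_scid: u64, chan: &FirstHopChannel) -> bool {
    chan.outbound_scid_alias == Some(hint_scid)
        || chan.short_channel_id == Some(hint_scid)
}
```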
We recently introduced release branches that need to remain backwards
compatible. However, even small changes to item visibility during
backporting fixes might introduce SemVer violations (see
https://doc.rust-lang.org/cargo/reference/semver.html#change-categories
for a list of changes that would be considered major/minor).

To make sure we don't accidentally introduce such changes in the
backports, we here add a new `semver-checks` CI job that utilizes
`cargo-semver-checks`
(https://github.com/obi1kenobi/cargo-semver-checks), and have it run on
any push or pull request targeting anything other than `main`/`master`
(i.e., all feature branches to come).
In adb0afc we started raising
bucket weights to the power four in the historical model. This
improved our model's accuracy greatly, but resulted in a much
larger `total_valid_points_tracked`. In the same commit we
converted `total_valid_points_tracked` to a float, but retained the
64-bit integer math to build it out of integer bucket values.

Sadly, 64 bits are not enough to sum 1024 bucket pairs of 16-bit
integers multiplied together and then squared (we need 16*4 + 10 =
74 bits to avoid overflow). Thus, here we replace the summation
with 128-bit integers.

Straightforward merge conflict (code added in
311a083, which was not included here,
neighbored newly added code) fixed in:
 * lightning/src/routing/scoring.rs
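
A sketch of the overflow math (bucket layout simplified; LDK tracks 32 min and 32 max buckets): each term is (min · max)² with 16-bit buckets, i.e. up to 64 bits, and summing 1024 such terms needs 16·4 + 10 = 74 bits, hence the u128 accumulator.

```rust
fn total_valid_points(min_buckets: &[u16; 32], max_buckets: &[u16; 32]) -> f64 {
    let mut total: u128 = 0;
    for min_bucket in min_buckets.iter() {
        for max_bucket in max_buckets.iter() {
            let product = (*min_bucket as u128) * (*max_bucket as u128);
            // (min * max)^2 can approach 2^64, which already overflows u64
            // once more than one such term is summed.
            total += product * product;
        }
    }
    total as f64
}
```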
@TheBlueMatt added the "weekly goal" (Someone wants to land this this week) label on Feb 27, 2025
@tnull (Contributor) commented Mar 3, 2025

Seems CI is pretty unhappy:

```text
error[E0433]: failed to resolve: use of undeclared type `HistoricalLiquidityTracker`
    --> lightning/src/routing/scoring.rs:2088:22
     |
2088 |             let mut tracker = HistoricalLiquidityTracker::new();
     |                               ^^^^^^^^^^^^^^^^^^^^^^^^^^ use of undeclared type `HistoricalLiquidityTracker`
     |
help: consider importing this struct through its public re-export
     |
2060 +         use crate::routing::scoring::HistoricalLiquidityTracker;
     |

error[E0433]: failed to resolve: use of undeclared type `ProbabilisticScoringFeeParameters`
    --> lightning/src/routing/scoring.rs:2097:25
     |
2097 |             let default_params = ProbabilisticScoringFeeParameters::default();
     |                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ use of undeclared type `ProbabilisticScoringFeeParameters`
     |
help: consider importing this struct
     |
2060 +         use crate::routing::scoring::ProbabilisticScoringFeeParameters;
     |

warning: elided lifetime has a name
   --> lightning/src/sync/debug_sync.rs:348:65
    |
341 | impl<'a, T: 'a> LockTestExt<'a> for Mutex<T> {
    |      -- lifetime `'a` declared here
...
348 |     fn unsafe_well_ordered_double_lock_self(&'a self) -> MutexGuard<T> {
    |                                                                    ^ this elided lifetime gets resolved as `'a`
    |
    = note: `#[warn(elided_named_lifetimes)]` on by default

For more information about this error, try `rustc --explain E0433`.
warning: `lightning` (lib test) generated 1 warning
error: could not compile `lightning` (lib test) due to 2 previous errors; 1 warning emitted
```
