0.1 Backports for 0.1.2 #3613

Open
wants to merge 20 commits into base: 0.1
Conversation

TheBlueMatt (Collaborator)

I think this is all the things that were pending a backport to 0.1.

jkczyz and others added 18 commits February 21, 2025 22:33
Instead of using elaborate calculations to determine the exact number of
bytes needed for a BOLT12 message, allocate a fixed-size amount. This
reduces the code complexity and potentially reduces heap fragmentation in
the normal case.
Now that the previous commit removed assertions on Vec capacities for
BOLT12 messages, the use of reserve_exact in tests is no longer needed.
The formula for applying half-lives was incorrect. Test coverage added.

Relatively straightforward merge conflicts (code added in
311a083, which was not included here,
neighbored newly added code) fixed in:
 * lightning/src/routing/scoring.rs
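
For reference, a hedged sketch of the intended half-life decay (not LDK's exact code): after `elapsed` time, a tracked value should be scaled by 0.5^(elapsed / half_life).

```rust
// Exponential decay by half-lives: value * 0.5^(elapsed / half_life).
fn apply_half_lives(value: f64, elapsed_secs: f64, half_life_secs: f64) -> f64 {
    value * 0.5f64.powf(elapsed_secs / half_life_secs)
}

// After exactly two half-lives the value is quartered:
// apply_half_lives(8.0, 1200.0, 600.0) == 2.0 (up to float error).
```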
`counterparty_spendable_height` is not used outside of `package.rs`
so there's not much reason to have an accessor for it. Also, in the
next commit an issue with setting the correct value for revoked
counterparty HTLC outputs is fixed, and the upgrade path causes the
value to be 0 in some cases, making using the value in too many
places somewhat fraught.
If the counterparty broadcasts a revoked transaction with offered
HTLCs, the output is not immediately pinnable as the counterparty
cannot claim the HTLC until the CLTV expires and they use an
HTLC-Timeout path.

Here we fix the `counterparty_spendable_height` value we set on
counterparty revoked HTLC claims to match reality. Note that
because we still consider these outputs `Pinnable` the value is
not used. In the next commit we'll start making them `Unpinnable`
which will actually change behavior.

Note that when upgrading we have to wipe the
`counterparty_spendable_height` value for non-offered HTLCs as
otherwise we'd consider them `Unpinnable` when they are, in fact,
`Pinnable`.
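
A hypothetical sketch of the corrected value (struct and field names assumed, not LDK's actual types):

```rust
struct RevokedHtlcOutput {
    offered: bool,    // HTLC offered by the counterparty in the revoked tx
    cltv_expiry: u32, // absolute block height of the HTLC's CLTV
}

// The earliest height at which the counterparty could spend the output.
fn counterparty_spendable_height(outp: &RevokedHtlcOutput) -> u32 {
    if outp.offered {
        // They must wait out the CLTV and use an HTLC-Timeout path.
        outp.cltv_expiry
    } else {
        // Claimable immediately with the preimage, so effectively "now".
        0
    }
}
```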
If the counterparty broadcasts a revoked transaction with offered
HTLCs, the output is not immediately pinnable as the counterparty
cannot claim the HTLC until the CLTV expires and they use an
HTLC-Timeout path.

Here we properly set these packages as `Unpinnable`, changing some
transaction generation during tests.
If one message handler refuses a connection by returning an `Err`
from `peer_connected`, other handlers which already got the
`peer_connected` will not see the corresponding
`peer_disconnected`, leaving them in a potentially-inconsistent
state.

Here we ensure we call the `peer_disconnected` handler for all
handlers which received a `peer_connected` event (except the one
which refused the connection).
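
A sketch of the rollback logic with a simplified handler trait (not LDK's actual `peer_connected` signature):

```rust
trait MsgHandler {
    fn peer_connected(&self, peer_id: u64) -> Result<(), ()>;
    fn peer_disconnected(&self, peer_id: u64);
}

fn connect_peer(handlers: &[&dyn MsgHandler], peer_id: u64) -> Result<(), ()> {
    for (idx, handler) in handlers.iter().enumerate() {
        if handler.peer_connected(peer_id).is_err() {
            // Every handler that already saw `peer_connected` must see the
            // matching `peer_disconnected`, but not the refusing handler.
            for prev in &handlers[..idx] {
                prev.peer_disconnected(peer_id);
            }
            return Err(());
        }
    }
    Ok(())
}
```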
`PublicKey` parsing is relatively expensive as we have to check if
the point is actually on the curve. To avoid it, our `NetworkGraph`
uses `NodeId`s which don't have the validity requirement.

Sadly, we were always parsing the broadcasting node's `PublicKey`
from the `node_id` in the network graph whenever we see an update
for that channel, whether we have a corresponding signature or not.

Here we fix this, only parsing the public key (and hashing the
message) if we're going to check a signature.
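
A sketch of the lazy-parse pattern (simplified; exact rust-secp256k1 APIs vary by version):

```rust
use bitcoin::secp256k1::{self, ecdsa::Signature, Message, PublicKey, Secp256k1};

fn maybe_check_sig(
    secp: &Secp256k1<secp256k1::VerifyOnly>,
    node_id_bytes: &[u8; 33],            // NodeId: raw bytes, no curve check
    sig: Option<&Signature>,
    msg_hash: impl FnOnce() -> [u8; 32], // hash the message lazily, too
) -> Result<(), secp256k1::Error> {
    if let Some(sig) = sig {
        // Only this path pays for on-curve validation and hashing.
        let pk = PublicKey::from_slice(node_id_bytes)?;
        let msg = Message::from_digest(msg_hash());
        secp.verify_ecdsa(&msg, sig, &pk)?;
    }
    Ok(())
}
```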
When we build a new `NetworkGraph` from empty, we're generally
doing an initial startup and will be syncing the graph very soon.
Using an initially-empty `IndexedMap` for the `channels` and
`nodes` results in quite some memory churn, with the initial RGS
application benchmark showing 15% of its time in pagefault handling
alone (i.e. allocating new memory from the OS, let alone the 23%
of time in `memmove`).

Further, when deserializing a `NetworkGraph`, we'd swapped the
expected node and channel count constants, leaving the node map
too small and causing map doubling as we read entries from disk.

Finally, when deserializing, allocating only exactly the amount of
map entries we need is likely to lead to at least one doubling, so
we're better off just over-estimating the number of nodes and
channels and allocating what we want.

Here we just always allocate `channels` and `nodes` based on
constants, leading to a 20%-ish speedup in the initial RGS
application benchmark.
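
A sketch of the allocation strategy (the constants are invented here, and plain `HashMap`s stand in for LDK's `IndexedMap`):

```rust
use std::collections::HashMap;

// Deliberate over-estimates of current network-graph size, so neither map
// should need to re-hash/double while syncing or deserializing.
const NODE_COUNT_ESTIMATE: usize = 30_000;
const CHANNEL_COUNT_ESTIMATE: usize = 60_000;

struct GraphMaps {
    nodes: HashMap<u64, ()>,    // stand-in for NodeId -> NodeInfo
    channels: HashMap<u64, ()>, // stand-in for SCID -> ChannelInfo
}

impl GraphMaps {
    fn new() -> Self {
        Self {
            nodes: HashMap::with_capacity(NODE_COUNT_ESTIMATE),
            channels: HashMap::with_capacity(CHANNEL_COUNT_ESTIMATE),
        }
    }
}
```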
If we have a first-hop channel from a first-hop hint, we'll ignore
the fees on it as we won't charge ourselves fees. However, if we
have a first-hop channel from the network graph, we should do the
same.

We do so here, also teeing up a coming commit which will remove
much of the custom codepath for first-hop hints and start using
this common codepath as well.
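
The rule itself is tiny; a hypothetical helper to illustrate it:

```rust
// We never charge ourselves forwarding fees, so a channel whose source is
// our own node contributes zero fee regardless of its advertised policy.
fn effective_fee_msat(source_is_us: bool, advertised_fee_msat: u64) -> u64 {
    if source_is_us { 0 } else { advertised_fee_msat }
}
```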
These tests are a bit annoying to deal with and ultimately work on
almost the same graph subset, so it makes sense to combine their
graph layout logic and then call it twice.

We do that here, combining them and also cleaning up the possible
paths as there actually are paths that the router could select
which don't meet the tests' requirements.
In a coming commit we'll start calling
`add_entries_to_cheapest_to_target_node` without always having a
public-graph node entry in order to process last- and first-hops
via a common codepath. In order to do so, we always need the
`node_counter` for the node, however, and thus we track them in
`RouteGraphNode` and pass them through to
`add_entries_to_cheapest_to_target_node` here.

We also take this opportunity to swap the node preference logic to
look at the counters, which is slightly less computational work,
though it does require some unrelated test changes.
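
A sketch of the preference swap (fields simplified from LDK's `RouteGraphNode`): comparing dense integer counters is cheaper than comparing 33-byte node IDs.

```rust
use std::cmp::Ordering;

#[derive(PartialEq, Eq)]
struct RouteGraphNode {
    node_counter: u32, // dense per-graph index for this node
    score: u64,        // lower is better
}

impl Ord for RouteGraphNode {
    fn cmp(&self, other: &Self) -> Ordering {
        // Reversed so that BinaryHeap (a max-heap) pops the lowest score;
        // ties break on the cheap integer counter.
        other.score.cmp(&self.score)
            .then_with(|| other.node_counter.cmp(&self.node_counter))
    }
}

impl PartialOrd for RouteGraphNode {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
```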
This likely only impacts very rare edge cases, but if we have two
equal-cost paths, we should likely prefer ones which contribute
more value (avoiding cases where we use paths which are
amount-limited but equal fee to higher-amount paths) and then paths
with fewer hops (which may complete faster).

It does make test behavior more robust against router changes,
which comes in handy over the coming commits.
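
The described tie-break, as an illustrative comparator (types invented for the sketch):

```rust
use std::cmp::Ordering;

struct PathCandidate { cost_msat: u64, value_msat: u64, hops: usize }

fn prefer(a: &PathCandidate, b: &PathCandidate) -> Ordering {
    a.cost_msat.cmp(&b.cost_msat)                      // cheaper first
        .then_with(|| b.value_msat.cmp(&a.value_msat)) // then more value
        .then_with(|| a.hops.cmp(&b.hops))             // then fewer hops
}
```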
When we handle the unblinded last-hop route hints from an invoice,
we had a good bit of code dedicated to handling fee propagation
through the (potentially) multiple last-hops and connecting them to
potentially directly-connected first-hops.

This was a good bit of code that was almost never used, and it
turns out was also buggy - we could process a route hint with
multiple hops, committing to one path through nodes A, B, to C,
then process another route hint (or public channel) which changes
our best path from B to C, making the A entry invalid.

Here we remove the whole maze, utilizing the normal hop-processing
logic in `add_entries_to_cheapest_to_target_node` for last-hops as
well. It requires tracking which nodes connect to last-hop hints
similar to the way we do with `is_first_hop_target` in
`PathBuildingHop`, storing the `CandidateRouteHop`s in a new map,
and always calling `add_entries_to_cheapest_to_target_node` on the
payee node, whether it's public or not.
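
A sketch of the new bookkeeping (types simplified; LDK's actual map differs):

```rust
use std::collections::HashMap;

type NodeId = [u8; 33];
struct CandidateRouteHop; // stand-in for LDK's candidate-hop enum

// Unblinded node at the start of a last-hop hint -> hops toward the payee,
// consulted by the shared `add_entries_to_cheapest_to_target_node` loop.
type LastHopCandidates = HashMap<NodeId, Vec<CandidateRouteHop>>;

fn last_hops_from<'a>(
    map: &'a LastHopCandidates, node: &NodeId,
) -> &'a [CandidateRouteHop] {
    map.get(node).map(Vec::as_slice).unwrap_or(&[])
}
```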
When we do pathfinding with blinded paths, we start each
pathfinding iteration by inserting all the blinded paths into our
nodes map as last-hops to the destination. As we do that, we check
if any of the introduction points happen to be nodes we have direct
channels with, as we want to use the local info for such channels
and support finding a path even if that channel is not publicly
announced.

However, as we iterate the blinded paths, we may find a second
blinded path from the same introduction point which we prefer over
the first. If this happens, we would already have added info from
us over the local channel to that intro point and end up with
calculations for the first hop to a blinded path that we no longer
prefer.

This is ultimately fixed here in two ways:
(a) we process the first-hop channels to blinded path introduction
    points in a separate loop after we've processed all blinded
    paths, ensuring we only ever consider a channel to the blinded
    path we will ultimately prefer.
(b) In the next commit, we add a new tracking bool in
    `PathBuildingHop` called `best_path_from_hop_selected` which we
    set when we process a channel backwards from a node, indicating
    that we've committed to the best path to the node and check when
    we add a new path to a node. This would have resulted in a much
    earlier debug-assertion in fuzzing or several tests.
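
A sketch of fix (a)'s two-pass shape (types and helper invented for illustration):

```rust
use std::collections::HashMap;

struct BlindedPathInfo { intro_node_id: u64, cost: u64 }

fn process_blinded_paths(paths: &[BlindedPathInfo]) {
    // Pass 1: settle the preferred blinded path per introduction point.
    let mut best_per_intro: HashMap<u64, u64> = HashMap::new();
    for path in paths {
        let best = best_per_intro.entry(path.intro_node_id).or_insert(u64::MAX);
        *best = (*best).min(path.cost);
    }

    // Pass 2: only now add our local channels to each intro point, so they
    // always reference the blinded path we actually ended up preferring.
    for intro_node_id in best_per_intro.keys() {
        add_first_hop_channels_to(*intro_node_id);
    }
}

fn add_first_hop_channels_to(_intro_node_id: u64) { /* elided */ }
```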
When we process a path backwards from a node during pathfinding, we
implicitly commit to the path up to that node. Any changes to the
preferred path up to that node will make the newly processed path's
state invalid.

In the previous few commits we fixed cases for this in last-hop
paths (both blinded and unblinded).

Here we add assertions to enforce this, tracked in a new bool in
`PathBuildingHop`.
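
A sketch of the assertion (the bool's name comes from the commit above; the surrounding struct is simplified):

```rust
struct PathBuildingHop {
    best_path_from_hop_selected: bool,
    // ... other per-node pathfinding state elided ...
}

fn update_best_path_to(hop: &mut PathBuildingHop) {
    // Once a path has been processed backwards through this node, we've
    // committed to its current best path; changing it now would invalidate
    // the already-processed path's state.
    debug_assert!(
        !hop.best_path_from_hop_selected,
        "Tried to change a node's best path after a path through it was processed"
    );
    // ... update the preferred path here ...
}
```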
The router is a somewhat complicated beast, and though the last few
commits removed some code from it, a complicated beast it remains.
Thus, having `expect`s in it is somewhat risky, so we take this
opportunity to replace some of them with `debug_assert!(false)`s
and an `Err`-return.
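
The pattern, sketched with a simplified error type (LDK returns a `LightningError`):

```rust
use std::collections::HashMap;

fn lookup_hop_state(dist: &HashMap<u32, u64>, node_counter: u32) -> Result<u64, String> {
    match dist.get(&node_counter) {
        Some(&val) => Ok(val),
        None => {
            // Loud in debug/fuzzing builds, a clean error in release builds
            // instead of an `expect` panic.
            debug_assert!(false, "Pathfinding state for node {} should exist", node_counter);
            Err("Internal routing state was inconsistent".to_owned())
        },
    }
}
```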
When we see a channel come into the router as a route-hint, but it's
for a direct channel of ours, we'd like to ignore the route-hint as
we have more information in the first-hop channel info. We do this
by matching SCIDs, but previously only considered outbound SCID aliases.

Here we change to consider both outbound SCID aliases and the full
channel SCID, which some nodes may use in their invoices.
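
A sketch of the widened match (fields simplified from LDK's channel details):

```rust
struct FirstHopChannel {
    outbound_scid_alias: Option<u64>,
    short_channel_id: Option<u64>, // the real, confirmed SCID, if any
}

// Treat a hint as "ours" if its SCID matches either our outbound alias or
// the channel's real SCID, since invoices may carry either form.
fn hint_is_our_channel(hint_scid: u64, chan: &FirstHopChannel) -> bool {
    chan.outbound_scid_alias == Some(hint_scid)
        || chan.short_channel_id == Some(hint_scid)
}
```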
We recently introduced release branches that need to remain backwards
compatible. However, even small changes to item visibility during
backporting fixes might introduce SemVer violations (see
https://doc.rust-lang.org/cargo/reference/semver.html#change-categories
for a list of changes that would be considered major/minor).

To make sure we don't accidentally introduce such changes in the
backports, we here add a new `semver-checks` CI job that utilizes
`cargo-semver-checks`
(https://github.com/obi1kenobi/cargo-semver-checks), and have it run on
any push or pull request targeting anything other than `main`/`master`
(i.e., all feature branches to come).
In adb0afc we started raising
bucket weights to the power four in the historical model. This
improved our model's accuracy greatly, but resulted in a much
larger `total_valid_points_tracked`. In the same commit we
converted `total_valid_points_tracked` to a float, but retained the
64-bit integer math to build it out of integer bucket values.

Sadly, 64 bits are not enough to sum 1024 bucket pairs of 16-bit
integers multiplied together and then squared (we need 16*4 + 10 =
74 bits to avoid overflow). Thus, here we replace the summation
with 128-bit integers.

Straightforward merge conflict (code added in
311a083, which was not included here,
neighbored newly added code) fixed in:
 * lightning/src/routing/scoring.rs
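
A sketch of the overflow math (bucket layout simplified; LDK tracks 32 min and 32 max buckets): each term is (min · max)² with 16-bit buckets, i.e. up to 64 bits, and summing 1024 such terms needs 16·4 + 10 = 74 bits, hence the u128 accumulator.

```rust
fn total_valid_points(min_buckets: &[u16; 32], max_buckets: &[u16; 32]) -> f64 {
    let mut total: u128 = 0;
    for min_bucket in min_buckets.iter() {
        for max_bucket in max_buckets.iter() {
            let product = (*min_bucket as u128) * (*max_bucket as u128);
            // (min * max)^2 can approach 2^64, which already overflows u64
            // once more than one such term is summed.
            total += product * product;
        }
    }
    total as f64
}
```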
@TheBlueMatt added the "weekly goal" (Someone wants to land this this week) label on Feb 27, 2025
@tnull (Contributor) commented Mar 3, 2025

Seems CI is pretty unhappy:

```text
error[E0433]: failed to resolve: use of undeclared type `HistoricalLiquidityTracker`
    --> lightning/src/routing/scoring.rs:2088:22
     |
2088 |             let mut tracker = HistoricalLiquidityTracker::new();
     |                               ^^^^^^^^^^^^^^^^^^^^^^^^^^ use of undeclared type `HistoricalLiquidityTracker`
     |
help: consider importing this struct through its public re-export
     |
2060 +         use crate::routing::scoring::HistoricalLiquidityTracker;
     |

error[E0433]: failed to resolve: use of undeclared type `ProbabilisticScoringFeeParameters`
    --> lightning/src/routing/scoring.rs:2097:25
     |
2097 |             let default_params = ProbabilisticScoringFeeParameters::default();
     |                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ use of undeclared type `ProbabilisticScoringFeeParameters`
     |
help: consider importing this struct
     |
2060 +         use crate::routing::scoring::ProbabilisticScoringFeeParameters;
     |

warning: elided lifetime has a name
   --> lightning/src/sync/debug_sync.rs:348:65
    |
341 | impl<'a, T: 'a> LockTestExt<'a> for Mutex<T> {
    |      -- lifetime `'a` declared here
...
348 |     fn unsafe_well_ordered_double_lock_self(&'a self) -> MutexGuard<T> {
    |                                                                    ^ this elided lifetime gets resolved as `'a`
    |
    = note: `#[warn(elided_named_lifetimes)]` on by default

For more information about this error, try `rustc --explain E0433`.
warning: `lightning` (lib test) generated 1 warning
error: could not compile `lightning` (lib test) due to 2 previous errors; 1 warning emitted
```
