Prioritize removing clients over validators in the heartbeat logic #3504

kaimast · 2025-02-26T00:14:00Z

Motivation

This pull request fixes #3479 and cleans up the heartbeat logic a little. The logic now prioritizes disconnecting clients over validators, to improve connectivity between validators and ensure new blocks propagate to clients.

Test Plan

After talking to Victor, I decided against adding tests for this PR and believe we can rely on code review to ensure the correctness of the changes. The code can only affect liveness of the protocol, not safety, and tests would add a lot of additional code that might need to be refactored later.

Notes

There are a few things that might need to be improved in the future.

remove_oldest_connected_peer does not actually remove the peer with the oldest TCP connection, but the one from which we have not received any message for the longest time.
Some parts of the code use "ip" to refer to a socket address (which is a tuple of IP address and network port). I already changed some variable names from peer_ip to peer_addr to address this.
Virtually all the logic resides in the trait declaration right now. We should either actually specialize the different implementations, or simplify the code by converting Heartbeat to a struct.
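
To make the first note concrete, here is a minimal sketch of how "oldest connection" and "longest silent" can pick different peers. PeerInfo, first_seen, last_seen and both helper functions are hypothetical names for illustration; they are not the types or fields used in snarkOS.

```rust
use std::time::{Duration, Instant};

/// Hypothetical, simplified peer record (not the real `Peer` type).
#[derive(Debug, Clone, PartialEq)]
struct PeerInfo {
    name: &'static str,
    /// When the TCP connection was established (assumed field).
    first_seen: Instant,
    /// When the most recent message was received (assumed field).
    last_seen: Instant,
}

/// The peer whose connection was established earliest.
fn oldest_connection(peers: &[PeerInfo]) -> Option<&PeerInfo> {
    peers.iter().min_by_key(|p| p.first_seen)
}

/// The peer we have not heard from for the longest time.
fn longest_silent(peers: &[PeerInfo]) -> Option<&PeerInfo> {
    peers.iter().min_by_key(|p| p.last_seen)
}

/// Two peers for which the two notions of "oldest" disagree.
fn demo_peers() -> Vec<PeerInfo> {
    let t0 = Instant::now();
    vec![
        // Connected first, but sent a message very recently.
        PeerInfo { name: "a", first_seen: t0, last_seen: t0 + Duration::from_secs(100) },
        // Connected later, but has been silent for longer.
        PeerInfo { name: "b", first_seen: t0 + Duration::from_secs(50), last_seen: t0 + Duration::from_secs(60) },
    ]
}
```

With these two peers, oldest_connection selects "a" while longest_silent selects "b", which is the mismatch between name and behaviour described in the note.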

node/router/src/heartbeat.rs

vicsn · 2025-02-27T14:04:17Z

Logic LGTM! Approving once a deployed devnet passes with this code.

node/router/src/heartbeat.rs

niklaslong · 2025-03-03T12:20:13Z

node/router/src/heartbeat.rs

+        // Disconnect from the oldest connected peer, which is the first entry in the list
+        // of removable peers.
+        // Do nothing, if the list is empty.
+        if let Some(oldest) = self.get_removable_peers().into_iter().map(|peer| peer.ip()).next() {


Really appreciate the addition of the comment here! I don't mean to nit but this might be a little more idiomatic:

Suggested change

-        if let Some(oldest) = self.get_removable_peers().into_iter().map(|peer| peer.ip()).next() {
+        if let Some(oldest) = self.get_removable_peers().first().map(Peer::ip) {

I might also suggest we add a unit test for get_removable_peers as getting the order wrong there would lead to a more unstable network? I'm not sure how involved test writing might be here, we could perhaps include this in future improvements to our test coverage, wdyt?

Well now you're just showing off! 😛

And yes, since get_removable_peers is not influenced by timeouts, it can be unit tested.
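
A rough sketch of what such an ordering test could exercise, under stated assumptions: the Peer and NodeType types below are simplified stand-ins, the free function removable_peers is hypothetical, and the ordering (clients before provers before validators, least recently heard from first) is an assumption based on the PR title, not the actual implementation.

```rust
use std::net::SocketAddr;

/// Variant order doubles as removal priority (assumption): clients go first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum NodeType {
    Client,
    Prover,
    Validator,
}

/// Simplified stand-in for the router's peer type.
#[derive(Debug, Clone)]
struct Peer {
    ip: SocketAddr,
    node_type: NodeType,
    /// Timestamp of the last received message; smaller means longer ago.
    last_seen_secs: u64,
}

impl Peer {
    fn ip(&self) -> SocketAddr {
        self.ip
    }
}

/// Hypothetical free-function version of `get_removable_peers`:
/// drops trusted peers, then sorts by (node type, last seen).
fn removable_peers(mut peers: Vec<Peer>, trusted: &[SocketAddr]) -> Vec<Peer> {
    peers.retain(|p| !trusted.contains(&p.ip()));
    peers.sort_by_key(|p| (p.node_type, p.last_seen_secs));
    peers
}

/// First entry of the list, using the idiom suggested in the review.
fn first_removable(peers: &[Peer]) -> Option<SocketAddr> {
    peers.first().map(Peer::ip)
}
```

Because the function takes plain data and does not depend on timers, the expected order can be asserted directly on a hand-built peer list.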

niklaslong · 2025-03-03T12:59:42Z

node/router/src/heartbeat.rs

                // Disconnect from this peer.
-                self.router().disconnect(peer_ip);
+                self.router().disconnect(peer_addr);


I think peer_ip here was correct? disconnect internally calls resolve_to_ambiguous which maps the listener to the ephemeral address (ip + port).

You might have noticed this already but just in case, the convention is:

peer_ip (granted, a slight misnomer): the peer's listener SocketAddr (ip + port)

peer_addr: the peer's ephemeral SocketAddr (ip + port, though only the port should be different from the listener)

Is this naming convention documented anywhere? If not, where would be a good place?
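
One option, sketched here only as an illustration, would be to record the convention in the type system rather than in variable names. ListenerAddr, EphemeralAddr, Resolver and resolve_to_ephemeral are all hypothetical names; the lookup below is a stand-in and does not reproduce the router's resolve_to_ambiguous.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// The address a peer listens on for inbound connections
/// (what the thread calls `peer_ip`).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ListenerAddr(SocketAddr);

/// The address a live connection actually uses
/// (what the thread calls `peer_addr`); usually only the port differs.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct EphemeralAddr(SocketAddr);

/// Hypothetical listener -> ephemeral lookup table.
struct Resolver {
    connected: HashMap<ListenerAddr, EphemeralAddr>,
}

impl Resolver {
    /// Returns the ephemeral address for a connected listener, if any.
    fn resolve_to_ephemeral(&self, listener: ListenerAddr) -> Option<EphemeralAddr> {
        self.connected.get(&listener).copied()
    }
}
```

With distinct types, passing an ephemeral address where a listener address is expected becomes a compile error, and the doc comments on the two types give the convention a single home.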

niklaslong · 2025-03-03T13:04:30Z

node/router/src/heartbeat.rs

@@ -290,11 +297,11 @@ pub trait Heartbeat<N: Network>: Outbound<N> {
    /// This function attempts to connect to any disconnected trusted peers.
    fn handle_trusted_peers(&self) {
        // Ensure that the trusted nodes are connected.
-        for peer_ip in self.router().trusted_peers() {
+        for peer_addr in self.router().trusted_peers() {


This should also be peer_ip as iirc they are the pre-configured trusted listener addresses the nodes can connect to.

niklaslong

Could you double check the renaming of peer_ip to peer_addr? In most collections we track the (rather unfortunately named) "peer_ip" aka the listener address (ip + port) instead of the ephemeral address when a connection is received 🙏

See:

raychu86 · 2025-03-03T23:05:51Z

node/router/src/heartbeat.rs

+            .filter(|peer| {
+                !trusted.contains(&peer.ip()) // Always keep trusted nodes
+                  && !bootstrap.contains(&peer.ip()) // Always keep bootstrap nodes
+                  && !self.router().cache.contains_inbound_block_request(&peer.ip()) // This peer is currently syncing from us


Can this be exploited such that a malicious validator just needs to keep sending block requests to avoid being disconnected?

Yes, and the same goes for malicious clients; this was already an issue before this PR.

@kaimast want to create a new issue for this? A simple solution as noted here would be to refresh a random selection of syncing peers under some conditions. I bet that with further analysis we can avoid hurting the sync performance of honest nodes too much.
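
A minimal sketch of the "refresh a random selection of syncing peers" idea, assuming the caller supplies the randomness. select_for_refresh and its parameters are hypothetical; the conditions under which it would be called, and how many peers to refresh, are exactly the open questions raised above.

```rust
use std::net::SocketAddr;

/// Picks up to `count` distinct syncing peers to disconnect, using a partial
/// Fisher-Yates shuffle. `next_random` is any source of random u64 values;
/// the modulo below has a slight bias, which is acceptable for a sketch.
fn select_for_refresh(
    syncing: &[SocketAddr],
    count: usize,
    mut next_random: impl FnMut() -> u64,
) -> Vec<SocketAddr> {
    let mut pool = syncing.to_vec();
    let count = count.min(pool.len());
    for i in 0..count {
        // Choose an index in i..pool.len() and move that peer to position i.
        let j = i + (next_random() as usize) % (pool.len() - i);
        pool.swap(i, j);
    }
    pool.truncate(count);
    pool
}
```

Since the selection is random, a peer cannot rely on a steady stream of block requests to stay connected indefinitely, while an honest syncing peer that gets dropped can simply reconnect.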

Kai Mast added 2 commits February 25, 2025 13:46

Prefer disconnecting from clients over validators

0c71ea8

Improve comments

5535901

kaimast requested a review from vicsn February 26, 2025 00:14

Merge remote-tracking branch 'origin/staging' into peer-priority

ed57d9f

kaimast commented Feb 26, 2025

Remove obsolete TODO

fad560b

kaimast commented Feb 26, 2025

kaimast changed the title ~~Peer priority~~ Prioritize removing clients over validators in the heartbeat logic Feb 26, 2025

niklaslong reviewed Feb 26, 2025

vicsn requested changes Feb 26, 2025

Kai Mast added 3 commits February 26, 2025 10:58

Simplify sorting

f2bfd38

Address code review comments

2ab07ec

Improve handling of provers in heartbeat

57d6d67

kaimast requested a review from vicsn February 26, 2025 22:04

niklaslong reviewed Feb 28, 2025

vicsn added the v3.5.0 label Feb 28, 2025

Actually remove the oldest peer

2afe8b6

niklaslong reviewed Mar 3, 2025

niklaslong requested changes Mar 3, 2025

Merge remote-tracking branch 'origin/staging' into peer-priority

832b2dc

raychu86 reviewed Mar 3, 2025
