Delayed RPC Send Using Tokens #5923

ackintosh · 2024-06-13T13:50:14Z

Issue Addressed

Proposed Changes

The diagram below shows the differences in how the receiver (responder) behaves before and after this PR. The following sentences will detail the changes.

flowchart TD

subgraph "*** After ***"
    Start2([START]) --> AA[Receive request]
    AA --> COND1{Is there already an active request <br> with the same protocol?}
    COND1 --> |Yes| CC[Send error response]
    CC --> End2([END])
    %% COND1 --> |No| COND2{Request is too large?}
    %% COND2 --> |Yes| CC
    COND1 --> |No| DD[Process request]
    DD --> EE{Rate limit reached?}
    EE --> |Yes| FF[Wait until tokens are regenerated]
    FF --> EE
    EE --> |No| GG[Send response]
    GG --> End2
end

subgraph "*** Before ***"
    Start([START]) --> A[Receive request]
    A --> B{Rate limit reached <br> or <br> request is too large?}
    B -->|Yes| C[Send error response]
    C --> End([END])
    B -->|No| E[Process request]
    E --> F[Send response]
    F --> End
end

`Is there already an active request with the same protocol?`

This check is not performed in Before. This is taken from the PR in the consensus-spec, which proposes updates regarding rate limiting and response timeout.
https://github.com/ethereum/consensus-specs/pull/3767/files

The requester MUST NOT make more than two concurrent requests with the same ID.

The PR mentions the requester side. In this PR, I introduced the ActiveRequestsLimiter for the responder side to restrict more than two requests from running simultaneously on the same protocol per peer. If the limiter disallows a request, the responder sends a rate-limited error and penalizes the requester.

`Rate limit reached?` and `Wait until tokens are regenerated`

UPDATE: I moved the limiter logic to the behaviour side. #5923 (comment)

The rate limiter is shared between the behaviour and the handler. (Arc<Mutex<RateLimiter>>>) The handler checks the rate limit and queues the response if the limit is reached. The behaviour handles pruning.

I considered not sharing the rate limiter between the behaviour and the handler, and performing all of these either within the behaviour or handler. However, I decided against this for the following reasons:

Regarding performing everything within the behaviour: The behaviour is unable to recognize the response protocol when RPC::send_response() is called, especially when the response is RPCCodedResponse::Error. Therefore, the behaviour can't rate limit responses based on the response protocol.
Regarding performing everything within the handler: When multiple connections are established with a peer, there could be multiple handlers interacting with that peer. Thus, we cannot enforce rate limiting per peer solely within the handler. (Any ideas? 🤔 )

Additional Info

Naming

I have renamed the fields of the behaviour to make them more intuitive:

limiter -> response_limiter
self_limiter -> outbound_request_limiter

Testing

I have run beacon node with this changes for 24hours, it looks work fine.

The rate-limited error has not occurred anymore while running this branch.

# Conflicts: # beacon_node/lighthouse_network/src/rpc/handler.rs

…me protocol per peer

# Conflicts: # beacon_node/lighthouse_network/tests/rpc_tests.rs

…tial timing issues

ackintosh · 2024-12-02T22:14:04Z

https://github.com/sigp/lighthouse/actions/runs/12109786536/job/33759312718?pr=5923

The network-tests failure was not reproduced locally. It might be related to #6646.

ackintosh · 2024-12-09T22:51:58Z

@jxs @pawanjay176 @AgeManning

This is ready for another review. 🙏

I have added a concurrency limit on the self-lmiter. Now, the self-limiter limits outbound requests based on both the number of concurrent requests and tokens (optional). Whether we also need to limit tokens in the self-limiter is still under duscussion. Let me know if you have any ideas.

(FYI)

I also ran lighthouse (this branch) on the testnet for ~24hours. During this time, the LH node responded with 21 RateLimited errors due to the number of active requests. These errors appear in the logs like the example below. Note that this is about inbound requests, not the self-limiter (outbound requests).

Dec 09 13:38:56.806 DEBG There is an active request with the same protocol, protocol: beacon_blocks_by_range, request: Blocks by range: Start Slot: 2738468, Count: 64, Step: 1, peer_id: 16Uiu2HAmERvtCC321A2Nu1pH6QmhXgvqALxgCnrp4qxr2qeVWr2P, service: libp2p_rpc, service: libp2p, module: lighthouse_network::rpc:491

# Conflicts: # beacon_node/lighthouse_network/src/rpc/mod.rs # beacon_node/lighthouse_network/src/rpc/self_limiter.rs

AgeManning · 2025-02-10T23:01:30Z

@pawanjay176 @jxs @dapplion @jimmygchen - If anyone has any spare time, I think this is a good one to get in.

because that such invalid requests are handled in the Handler since the following PRs: sigp#6847 sigp#6986

ackintosh · 2025-02-14T22:14:23Z

I removed RPC::is_request_size_too_large in cfea9d2 because such invalid requests are now handled in the Handler, as implemented in the following PRs:

# Conflicts: # beacon_node/lighthouse_network/src/rpc/mod.rs # beacon_node/lighthouse_network/src/rpc/self_limiter.rs

dapplion · 2025-02-25T19:12:20Z

beacon_node/lighthouse_network/src/rpc/mod.rs

 mod self_limiter;

 static NEXT_REQUEST_ID: AtomicUsize = AtomicUsize::new(1);

+// Maximum number of concurrent requests per protocol ID that a client may issue.
+const MAX_CONCURRENT_REQUESTS: usize = 2;


This means we can have at most two blocks_by_range ReqResp requests per peer?

dapplion · 2025-02-25T19:38:39Z

beacon_node/lighthouse_network/src/rpc/mod.rs

+                        id,
+                        RpcResponse::Error(
+                            RpcErrorResponse::RateLimited,
+                            "Rate limited. There is an active request with the same protocol"


Suggested change

"Rate limited. There is an active request with the same protocol"

format!("Rate limited. There are already {MAX_CONCURRENT_REQUESTS} active requests with the same protocol")

dapplion

Overall architecture looks good! Just some comments.

Could you add a metric to track the time our own requests are idling in the self-rate limiter? It will help inform is sync performance is hindered by this new rate limit policy. We should also track the time outbound responses are idling in the rate limiter.

dapplion · 2025-02-25T19:50:25Z

beacon_node/lighthouse_network/src/rpc/response_limiter.rs

+                    peer_id,
+                    connection_id,
+                    substream_id,
+                    response,


We consider the memory cost of storing responses too low to worry? The worst case is buffering 2 x data_columns_by_range for all possible indices for all peers. For 9 blobs per block, that's 131072*2 * 9 * 100 / 1e6 = 235 MB

Is the point of this PR to not have an additional global rate limiter? It would help to reduce the worst case scenario

dapplion · 2025-02-25T19:59:02Z

beacon_node/lighthouse_network/src/rpc/self_limiter.rs

@@ -139,8 +178,14 @@ impl<Id: ReqId, E: EthSpec> SelfRateLimiter<Id, E> {
        if let Entry::Occupied(mut entry) = self.delayed_requests.entry((peer_id, protocol)) {
            let queued_requests = entry.get_mut();
            while let Some(QueuedRequest { req, request_id }) = queued_requests.pop_front() {


If you remove a QueuedRequest item from queued_requests and the limiter doesn't allow to send, is this request lost?

ackintosh added 6 commits June 10, 2024 07:07

trickle responses

0154359

pruning

d5fe64e

cargo fmt

aab59f5

Test that the receiver delays the responses

e00e679

Add doc comments

670ec96

Fix typo

c0ae632

ackintosh added work-in-progress PR is a work-in-progress Networking skip-ci Don't run the `test-suite` labels Jun 13, 2024

ackintosh force-pushed the delayed-rpc-response branch from 29e3f00 to 90361d6 Compare June 19, 2024 22:04

Add inbound request size limiter

7e0c630

ackintosh force-pushed the delayed-rpc-response branch from 90361d6 to 7e0c630 Compare June 19, 2024 22:44

ackintosh added 10 commits June 21, 2024 07:20

Merge branch 'refs/heads/unstable' into delayed-rpc-response

6322210

# Conflicts: # beacon_node/lighthouse_network/src/rpc/handler.rs

Fix compile error

3947bf6

Add doc comment and rename

933dc00

Extract a function that calculates tau and t from the quota

b62537f

unwrap

86cf8fb

Remove unused limiter

8fd37c5

Restrict more than two requests from running simultaneously on the sa…

6c1015e

…me protocol per peer

Rename from self_limiter to outbound_request_limiter

817ce97

Fix clippy errors

7e42568

Merge branch 'refs/heads/unstable' into delayed-rpc-response

94c2493

# Conflicts: # beacon_node/lighthouse_network/tests/rpc_tests.rs

ackintosh removed the skip-ci Don't run the `test-suite` label Jul 1, 2024

ackintosh added 6 commits July 1, 2024 23:01

Fix import

9ad4eb7

Fix clippy errors

de9d943

Merge branch 'refs/heads/unstable' into delayed-rpc-response

7adb142

Merge branch 'refs/heads/unstable' into delayed-rpc-response

627fd33

# Conflicts: # beacon_node/lighthouse_network/tests/rpc_tests.rs

Update request_id with AppRequestId

b55ffca

Merge branch 'unstable' into delayed-rpc-response

73e9879

ackintosh marked this pull request as ready for review July 14, 2024 23:23

ackintosh added 2 commits November 27, 2024 07:55

Remove active requests belonging to the peer that disconnected

9d2b263

Fix unused variable error

2d7a679

ackintosh force-pushed the delayed-rpc-response branch from fe458ac to 2d7a679 Compare November 27, 2024 21:20

Fix clippy errors

b73a336

ackintosh force-pushed the delayed-rpc-response branch from aa57e8b to b73a336 Compare November 29, 2024 22:16

ackintosh added 4 commits December 1, 2024 16:57

Merge branch 'unstable' into delayed-rpc-response

dfd092d

Fix clippy errors

810c5de

Update test

d46cbe8

Adding a slight margin to the elapsed time check to account for poten…

540436c

…tial timing issues

ackintosh added 2 commits December 5, 2024 07:51

Merge branch 'unstable' into delayed-rpc-response

3d39f2c

Remove an active request when it ends with an error

60c9900

Merge branch 'unstable' into delayed-rpc-response

eec6b4a

ackintosh added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Dec 12, 2024

ackintosh added 3 commits February 7, 2025 07:54

Merge branch 'unstable' into delayed-rpc-response

9a6eb72

# Conflicts: # beacon_node/lighthouse_network/src/rpc/mod.rs # beacon_node/lighthouse_network/src/rpc/self_limiter.rs

Sync with further updates from unstable

0d0f48d

Sync with further updates from unstable

ce7ae4b

AgeManning requested review from jimmygchen, pawanjay176 and jxs February 10, 2025 23:01

Remove RPC::is_request_size_too_large

cfea9d2

because that such invalid requests are handled in the Handler since the following PRs: sigp#6847 sigp#6986

ackintosh added 2 commits February 17, 2025 07:24

Merge branch 'unstable' into delayed-rpc-response

9850bce

Merge branch 'unstable' into delayed-rpc-response

d859dad

# Conflicts: # beacon_node/lighthouse_network/src/rpc/mod.rs # beacon_node/lighthouse_network/src/rpc/self_limiter.rs

dapplion reviewed Feb 25, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Delayed RPC Send Using Tokens #5923

Delayed RPC Send Using Tokens #5923

ackintosh commented Jun 13, 2024 •

edited

Loading

ackintosh commented Dec 2, 2024

ackintosh commented Dec 9, 2024

AgeManning commented Feb 10, 2025 •

edited

Loading

ackintosh commented Feb 14, 2025

dapplion Feb 25, 2025

dapplion Feb 25, 2025

dapplion left a comment

dapplion Feb 25, 2025

dapplion Feb 25, 2025

dapplion Feb 25, 2025

	"Rate limited. There is an active request with the same protocol"
	format!("Rate limited. There are already {MAX_CONCURRENT_REQUESTS} active requests with the same protocol")

Delayed RPC Send Using Tokens #5923

Are you sure you want to change the base?

Delayed RPC Send Using Tokens #5923

Conversation

ackintosh commented Jun 13, 2024 • edited Loading

Issue Addressed

Proposed Changes

Is there already an active request with the same protocol?

Rate limit reached? and Wait until tokens are regenerated

Additional Info

Naming

Testing

ackintosh commented Dec 2, 2024

ackintosh commented Dec 9, 2024

AgeManning commented Feb 10, 2025 • edited Loading

ackintosh commented Feb 14, 2025

dapplion Feb 25, 2025

Choose a reason for hiding this comment

dapplion Feb 25, 2025

Choose a reason for hiding this comment

dapplion left a comment

Choose a reason for hiding this comment

dapplion Feb 25, 2025

Choose a reason for hiding this comment

dapplion Feb 25, 2025

Choose a reason for hiding this comment

dapplion Feb 25, 2025

Choose a reason for hiding this comment

ackintosh commented Jun 13, 2024 •

edited

Loading

`Is there already an active request with the same protocol?`

`Rate limit reached?` and `Wait until tokens are regenerated`

AgeManning commented Feb 10, 2025 •

edited

Loading