P2P controller issues #25

eldargab · 2024-11-12T15:18:53Z

1

worker-rs/src/controller/p2p.rs

Line 156 in d26c626

if !self.logs_storage.is_initialized() {

And the query is silently dropped? No good!

2

worker-rs/src/controller/p2p.rs

Line 234 in d26c626

.unwrap_or_else(|_| error!("Cannot send query result: queue full"));

It is strange to ignore back pressure and continue accepting queries while there is trouble sending results back!

Similar thing happens here:

worker-rs/src/controller/p2p.rs

Line 139 in d26c626

warn!("Queries queue is full. Dropping query from {peer_id}");

When the application is not able to process requests, it should convey that to the transport level and stop wasting resources on accepting and verifying packets that it is about to drop.
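To make the difference concrete, here is a minimal std-only sketch (not the worker's code) contrasting a non-blocking put that drops on a full queue with a blocking put that slows the producer down. `send_dropping` and `send_with_backpressure` are hypothetical names, and `std::sync::mpsc::sync_channel` stands in for whatever bounded async channel the worker actually uses.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;
use std::time::Duration;

/// Pushes `n` items with `try_send` into a queue nobody drains.
/// Returns how many items were dropped because the queue was full.
fn send_dropping(n: usize, capacity: usize) -> usize {
    // `_rx` is kept alive so the channel stays open, but it never receives.
    let (tx, _rx) = sync_channel::<usize>(capacity);
    let mut dropped = 0;
    for i in 0..n {
        if let Err(TrySendError::Full(_)) = tx.try_send(i) {
            dropped += 1; // the "log a warning and move on" path
        }
    }
    dropped
}

/// Pushes `n` items with blocking `send`, so a slow consumer throttles the
/// producer instead of losing items. Returns how many items were received.
fn send_with_backpressure(n: usize, capacity: usize) -> usize {
    let (tx, rx) = sync_channel::<usize>(capacity);
    let consumer = thread::spawn(move || {
        let mut received = 0;
        for _item in rx {
            thread::sleep(Duration::from_millis(1)); // simulate slow sending
            received += 1;
        }
        received
    });
    for i in 0..n {
        // Blocks while the queue is full: this is the back pressure.
        tx.send(i).expect("consumer hung up");
    }
    drop(tx); // close the channel so the consumer loop ends
    consumer.join().expect("consumer panicked")
}

fn main() {
    println!("dropped: {}", send_dropping(10, 4));
    println!("delivered: {}", send_with_backpressure(10, 4));
}
```

In an async setting the blocking `send` would be an awaited send on a bounded channel, which suspends only the task that is waiting.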

However, the problem is not just about queue puts.

I would implement the request processing pipeline roughly as follows:

- No queue in the message receiving loop.
- CU and MAX_PENDING_QUERIES limits are checked, and if they were exceeded, an error is returned to the user immediately (with await on the response queue).
- The query processing procedure is launched with tokio::spawn() and includes:
  1. Parsing
  2. Query execution
  3. Log record formulation and writing
  4. Send queue put with await.
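The pipeline above could look roughly like the following std-only sketch. It is an illustration under assumptions, not the worker's implementation: `thread::spawn` stands in for `tokio::spawn()`, a blocking `send` on a bounded channel stands in for an awaited queue put, and `handle_query`, `Response`, and the limit value are hypothetical. The CU check is not modeled; it would sit next to the pending-queries check.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

// Hypothetical value; the real limit would come from the worker's config.
const MAX_PENDING_QUERIES: usize = 2;

#[derive(Debug, PartialEq)]
enum Response {
    Ok(String),
    TooManyQueries,
    BadQuery,
}

/// Handles one incoming query with no intermediate queue.
/// Returns a join handle if a processing task was spawned.
fn handle_query(
    query: String,
    pending: &Arc<AtomicUsize>,
    responses: &SyncSender<Response>,
) -> Option<JoinHandle<()>> {
    // Reserve a slot; if the limit is exceeded, answer with an error right away.
    if pending.fetch_add(1, Ordering::SeqCst) >= MAX_PENDING_QUERIES {
        pending.fetch_sub(1, Ordering::SeqCst);
        // Blocking put: a full response queue stalls the receiving loop.
        responses
            .send(Response::TooManyQueries)
            .expect("response queue closed");
        return None;
    }
    let pending = Arc::clone(pending);
    let responses = responses.clone();
    Some(thread::spawn(move || {
        // 1. Parsing (stand-in: an empty query is invalid).
        let result = match query.trim() {
            "" => Response::BadQuery,
            // 2. Query execution (stand-in: upper-case the text).
            q => Response::Ok(q.to_uppercase()),
        };
        // 3. Log record formulation and writing would go here.
        // 4. Send queue put, waiting if the queue is full.
        responses.send(result).expect("response queue closed");
        pending.fetch_sub(1, Ordering::SeqCst);
    }))
}

fn main() {
    let pending = Arc::new(AtomicUsize::new(0));
    let (tx, rx) = sync_channel(8);
    let task = handle_query("select 1".to_string(), &pending, &tx);
    if let Some(handle) = task {
        handle.join().expect("query task panicked");
    }
    println!("{:?}", rx.recv().expect("no response"));
}
```

The slot is released only after the result has been put on the send queue, so a stalled send queue eventually makes the limit check reject new queries.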

3

worker-rs/src/controller/worker.rs

Line 104 in 17bbf99

query_str: String,

No need for ownership.
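As a small illustration of the point, a function that only reads the query can borrow it. `run_query` here is a hypothetical stand-in, not the signature in worker.rs.

```rust
/// Hypothetical stand-in for a query entry point that only reads its input.
/// Taking `&str` lets the caller keep the `String` and avoids a clone.
fn run_query(query_str: &str) -> usize {
    // Stand-in for real work: count whitespace-separated tokens.
    query_str.split_whitespace().count()
}

fn main() {
    let raw = String::from("select * from blocks");
    // `&raw` is a `&String`, which coerces to `&str`.
    let tokens = run_query(&raw);
    // `raw` is still owned here and can be reused, e.g. for logging.
    println!("{tokens} tokens in: {raw}");
}
```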

4

The max query size limit in the currently linked version of the transport lib is set to 512 kB.

pub const MAX_QUERY_SIZE: u64 = 512 * 1024;

It should be less.

The limit for the query itself should be set exactly and explicitly to 256 kB.

Transport message size should be adjusted accordingly.

In the future, the allocation check should happen before message arrival and validation.
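An explicit limit on the query itself might be expressed along these lines. This is a sketch: `MAX_QUERY_STR_SIZE`, `QueryError`, and `check_query_size` are hypothetical names, and 256 * 1024 bytes is assumed as the reading of "256 kB", mirroring how the transport constant is written.

```rust
/// Hypothetical explicit limit for the query string, separate from the
/// transport-level message limit.
pub const MAX_QUERY_STR_SIZE: usize = 256 * 1024;

#[derive(Debug, PartialEq)]
pub enum QueryError {
    TooLarge { size: usize, limit: usize },
}

/// Rejects queries whose byte length exceeds the explicit limit.
pub fn check_query_size(query: &str) -> Result<(), QueryError> {
    let size = query.len(); // length in bytes, not characters
    if size > MAX_QUERY_STR_SIZE {
        Err(QueryError::TooLarge {
            size,
            limit: MAX_QUERY_STR_SIZE,
        })
    } else {
        Ok(())
    }
}

fn main() {
    let oversized = "a".repeat(MAX_QUERY_STR_SIZE + 1);
    println!("{:?}", check_query_size("select 1"));
    println!("{:?}", check_query_size(&oversized));
}
```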

kalabukdima · 2024-11-13T06:02:23Z

1. This code is left over from the old logs collection approach. It's fixed in Implement pull-based logs collection #23.
2. - The first point is about the TransportHandle. Yes, it has a poor design and it caused a lot of trouble in the portal. I'm trying another approach in the logs collector, and if it works well, I'll do the same with all other actors and get back with the results. Just note that this queue only sends messages to an internal coroutine that puts them into another queue. So it's even worse: if we have trouble sending results back, the worker's code won't even know about it.
   - Regarding the event processing, I believe it was your suggestion to not block in the event handling procedure and process it as fast as possible. Do you suggest blocking on sending an error response now? But then problems with queries will prevent other transport messages (like logs requests) from being processed.
3. Good point!
4. Restricting queries to 256 kB is something we've agreed on just recently. We're not even sure yet that it would be enough. Do you think allocating 2x space is an issue? The Vec implementation itself assumes it's fine to use 2x memory, so we should also go through all Vec usages and reserve the capacity in advance if this is the goal.
   For the query string itself, I'll add the explicit limit.
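For reference, reserving capacity in advance looks like the sketch below. `collect_preallocated` is a hypothetical helper, not code from the worker. It relies only on the documented guarantee that a `Vec` does not reallocate while its length stays within its capacity.

```rust
/// Copies `chunks` into one buffer whose capacity is reserved up front.
/// Returns the buffer and the capacity observed right after the reservation.
fn collect_preallocated(chunks: &[&[u8]]) -> (Vec<u8>, usize) {
    let total: usize = chunks.iter().map(|c| c.len()).sum();
    // One allocation of at least `total` bytes, instead of amortized growth.
    let mut buf = Vec::with_capacity(total);
    let reserved = buf.capacity();
    for chunk in chunks {
        buf.extend_from_slice(chunk);
    }
    (buf, reserved)
}

fn main() {
    let chunks: [&[u8]; 3] = [b"abc", b"defg", b"hi"];
    let (buf, reserved) = collect_preallocated(&chunks);
    // The capacity has not changed, so no reallocation happened while filling.
    println!(
        "len = {}, reserved = {}, capacity now = {}",
        buf.len(),
        reserved,
        buf.capacity()
    );
}
```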

kalabukdima mentioned this issue Jan 20, 2025

Transport layer hardening subsquid/sqd-network#138

Open

7 tasks

kalabukdima self-assigned this Jan 20, 2025
