Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sozo): auto chunk blocks fetching world data #2929

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

okhaimie-dev
Copy link

@okhaimie-dev okhaimie-dev commented Jan 20, 2025

Description

Related issue

#2825

Tests

  • Yes
  • No, because they aren't needed
  • No, because I need help

Added to documentation?

  • README.md
  • Dojo Book
  • No documentation needed

Checklist

  • I've formatted my code (scripts/prettier.sh, scripts/rust_fmt.sh, scripts/cairo_fmt.sh)
  • I've linted my code (scripts/clippy.sh, scripts/docs.sh)
  • I've commented my code
  • I've requested a review after addressing the comments

Summary by CodeRabbit

  • New Features

    • Improved event retrieval process with support for large block ranges.
    • Added chunked event processing to enhance performance and reliability.
  • Performance

    • Optimized event retrieval by implementing block range segmentation.
    • Introduced a maximum block range limit to prevent potential retrieval issues.

Copy link

coderabbitai bot commented Jan 20, 2025

Walkthrough

Ohayo, sensei! The changes in the events command focus on enhancing event retrieval efficiency by introducing a maximum block range of 50,000 blocks. The previous continuation_token field has been replaced with the max_block_range, allowing for chunked processing of events. The logic for determining from_block and to_block has been updated, enabling a looped approach to fetch events in manageable sizes, improving the overall event processing mechanism.

Changes

File Change Summary
bin/sozo/src/commands/events.rs - Added pub max_block_range: u64 field
- Removed pub continuation_token: Option<String> field
- Implemented chunked event retrieval logic based on max_block_range
crates/dojo/world/src/remote/events_to_remote.rs - Updated from_events method to fetch events in chunks
- Adjusted initialization of from_block and determination of to_block
- Enhanced continuation token handling with a nested loop for event fetching

Possibly related PRs

  • feat(sozo): add events command back and event better #2619: The changes in this PR also involve modifications to the events.rs file, specifically enhancing event retrieval logic and introducing chunked processing, which aligns with the chunked approach introduced in the main PR.
  • fix: remove dbg #2757: This PR modifies the from_events method in the same file as the main PR, focusing on event fetching and filtering logic, which is relevant to the changes made in the main PR regarding event retrieval in chunks.
  • feat(katana): forked events #2594: The changes in this PR implement a chunked fetching mechanism for events in the from_events method, indicating a shared focus on improving event retrieval efficiency, directly related to the main PR.

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad20dfc and b61c04d.

📒 Files selected for processing (1)
  • bin/sozo/src/commands/events.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • bin/sozo/src/commands/events.rs

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
bin/sozo/src/commands/events.rs (3)

20-20: Consider making MAX_BLOCK_RANGE configurable

Defining MAX_BLOCK_RANGE as a constant is fine, but making it a configurable parameter would enhance flexibility. This allows users to adjust the block range without modifying the code, improving usability in different scenarios.


68-73: Simplify from_block assignment

The nested match statements for assigning from_block can be simplified since BlockId::Number(block) will always match BlockId::Number(num). Here's a streamlined version:

-let from_block = match self.from_block {
-    Some(block) => match BlockId::Number(block) {
-        BlockId::Number(num) => num,
-        _ => latest_block.saturating_sub(1000),
-    },
-    None => latest_block.saturating_sub(1000),
-};
+let from_block = self.from_block.unwrap_or_else(|| latest_block.saturating_sub(1000));

75-80: Simplify to_block assignment

Similarly, the assignment of to_block can be simplified for clarity:

-let to_block = match self.to_block {
-    Some(block) => match BlockId::Number(block) {
-        BlockId::Number(num) => num,
-        _ => latest_block,
-    },
-    None => latest_block,
-};
+let to_block = self.to_block.unwrap_or(latest_block);
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2dd3ead and 5074ac5.

📒 Files selected for processing (1)
  • bin/sozo/src/commands/events.rs (2 hunks)
🔇 Additional comments (1)
bin/sozo/src/commands/events.rs (1)

85-131: Verify block chunking logic for off-by-one errors

Please verify that the calculation of chunk_end and the update of current_start correctly process all blocks without overlap or gaps. Specifically, ensure that all blocks from from_block to to_block are processed exactly once.

Here's a script to check the block ranges:

✅ Verification successful

Ohayo! Block chunking logic is implemented correctly sensei!

The verification confirms that:

  • All blocks are processed exactly once
  • No gaps or overlaps exist between chunks
  • Edge cases, including the final block, are handled properly
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for overlaps or gaps in block ranges.

START_BLOCK=1000
END_BLOCK=2000
MAX_BLOCK_RANGE=500

current_start=$START_BLOCK

while [ $current_start -le $END_BLOCK ]; do
  chunk_end=$((current_start + MAX_BLOCK_RANGE - 1))
  if [ $chunk_end -gt $END_BLOCK ]; then
    chunk_end=$END_BLOCK
  fi
  echo "Processing blocks from $current_start to $chunk_end"
  current_start=$((chunk_end + 1))
done

Length of output: 620

Comment on lines 126 to 131
if let Some(continuation_token) = res.continuation_token {
println!("Continuation token: {:?}", continuation_token);
println!("----------------------------------------------");
}

current_start = chunk_end + 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle continuation_token for paginated event retrieval

When retrieving events, if a continuation_token is returned, it indicates more events are available within the current block range. Currently, the code prints the continuation_token but doesn't use it to fetch additional events. To ensure all events are retrieved, loop until res.continuation_token is None, using the token in subsequent get_events calls.

Would you like me to provide a revised implementation that correctly handles the continuation_token?

@glihm glihm changed the title feat ( sozo ): removed world block feat(sozo): removed world block Jan 21, 2025
@glihm glihm changed the title feat(sozo): removed world block feat(sozo): auto chunk blocks fetching world data Jan 21, 2025
Copy link
Collaborator

@glihm glihm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@okhaimie-dev thanks for the work here. Also, I'll update the issue, since we have other places where the world_block is actually used.

@@ -17,6 +17,8 @@ use super::options::starknet::StarknetOptions;
use super::options::world::WorldOptions;
use crate::utils;

const MAX_BLOCK_RANGE: u64 = 50_000;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment mentioning that it's an arbitrary value for now that seems to work (and we need to validate that too).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@okhaimie-dev for the command here, could you add this as a parameter? Would be interesting to have this quickly accessible to test the ranges while working on event.

sozo events --max-block-range 5000

} else {
self.from_block.map(BlockId::Number)
};
// Get latest block number
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Get latest block number

Don't use comment when it's self explanatory. 👍

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright

Comment on lines 67 to 71
let from_block = match self.from_block {
Some(block) => match BlockId::Number(block) {
BlockId::Number(num) => num,
_ => latest_block.saturating_sub(1000),
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep using the world_block from the config, since we still want to start at a later block to avoid indexing the whole blockchain from block 0.

BlockId::Number(num) => num,
_ => latest_block.saturating_sub(1000),
},
None => latest_block.saturating_sub(1000),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to have a reasonable default window to start fetching event when the event is used. And since you recommend to remove the world_block I figured starting from around 24hr from the latest block would be reasonable.

What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, so if there's a world block, we take it. If not, you would start 1000 blocks before the latest?

In this case, it may be better starting to 0 actually. Like so, the default behavior is something easy to grasp for user. WDYT?

Copy link
Author

@okhaimie-dev okhaimie-dev Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it will be better to start from block 0 because of how long it will take if they are fetching events from mainnet or sepolia, but I see what you are saying. Maybe it will be worth to leave a note in the sozo event doc to let user know the default behavior of starting to fetch from blocks produced 24hrs prior if a from_block is not provided

#[derive(Debug, Args)]
pub struct EventsArgs {
    #[arg(help = "List of specific events to be filtered")]
    #[arg(value_delimiter = ',')]
    pub events: Option<Vec<String>>,

    #[arg(short, long)]
    #[arg(help = "Block number from where to look for events")]
    pub from_block: Option<u64>,

    #[arg(short, long)]
    #[arg(help = "Block number until where to look for events")]
    pub to_block: Option<u64>,

If not, I will go with your recommendation and switch it 0 instead.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
bin/sozo/src/commands/events.rs (1)

110-114: ⚠️ Potential issue

Handle continuation token for complete event retrieval.

The code prints the continuation token but doesn't use it to fetch remaining events within the current block range.

For complete event retrieval, we should process all events within each chunk before moving to the next block range. Would you like me to provide an implementation that correctly handles the continuation token?

🧹 Nitpick comments (1)
bin/sozo/src/commands/events.rs (1)

82-85: Ohayo sensei! Add retry mechanism for RPC calls.

The event fetching could fail due to transient network issues. Consider implementing a retry mechanism with exponential backoff for better reliability.

Here's a suggested implementation using tokio-retry:

+use tokio_retry::strategy::{ExponentialBackoff, jitter};
+use tokio_retry::Retry;
+
+            let retry_strategy = ExponentialBackoff::from_millis(100)
+                .map(jitter)
+                .take(3);
+
-                let res = provider
+                let res = Retry::spawn(retry_strategy, || provider
                     .get_events(event_filter, self.continuation_token.clone(), self.chunk_size)
-                    .await?;
+                    .boxed())
+                    .await?;
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 050c2ee and 77bee69.

📒 Files selected for processing (1)
  • bin/sozo/src/commands/events.rs (2 hunks)
🔇 Additional comments (1)
bin/sozo/src/commands/events.rs (1)

19-21: Ohayo sensei! Let's validate the MAX_BLOCK_RANGE through benchmarking.

The constant value of 50,000 blocks needs validation through benchmarking to ensure optimal performance across different network conditions and load scenarios.

Run this script to analyze historical event fetching performance:

✅ Verification successful

Ohayo sensei! The MAX_BLOCK_RANGE value of 50,000 is reasonable as a default.

After analyzing similar implementations in Katana RPC and Saya, this value appears to be a good balance between performance and node load. The chunking mechanism allows for efficient pagination while preventing overwhelming the node.

  • Consider making this configurable via CLI or config file for different deployment scenarios, similar to Saya's approach
  • Keep the "pending benchmarking" comment as it correctly indicates this value might need adjustment based on specific network conditions
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Analyze event fetching performance with different block ranges
# Test ranges: 10k, 25k, 50k, 75k, 100k blocks

for range in 10000 25000 50000 75000 100000; do
    echo "Testing block range: $range"
    time cargo run --bin sozo events \
        --from-block $(($(cat .block_number) - range)) \
        --to-block $(cat .block_number) 2>&1 | \
        grep -E "real|Continuation token"
    echo "----------------------------------------"
done

Length of output: 2357


Script:

#!/bin/bash
# Search for benchmark files and similar constants
echo "=== Searching for benchmark files ==="
fd bench
echo -e "\n=== Searching for similar block range constants ==="
rg "BLOCK_RANGE|block_range" -A 2
echo -e "\n=== Looking for event-related test files ==="
fd "test.*events"

Length of output: 5128


Script:

#!/bin/bash
# Check contents of benchmark files
echo "=== Checking sozo benchmarks ==="
cat bin/sozo/benches/* 2>/dev/null || echo "No benchmark files found"
echo -e "\n=== Checking if events.rs has tests ==="
find bin/sozo/src -type f -name "*test*.rs" -exec grep -l "events" {} \;

Length of output: 1881

Comment on lines 65 to 68
let latest_block = provider.block_number().await?;
let from_block = self.from_block.unwrap_or_else(|| latest_block.saturating_sub(1000));
let to_block = self.to_block.unwrap_or(latest_block);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider world_block configuration for initial block range.

Based on previous discussions, we should consider using world_block from the configuration when available, before falling back to the 1000-block window. This ensures consistency with the world configuration while maintaining the reasonable default window.

Apply this diff to incorporate world_block:

-            let from_block = self.from_block.unwrap_or_else(|| latest_block.saturating_sub(1000));
+            let from_block = self.from_block.unwrap_or_else(|| {
+                world_diff.world_info
+                    .world_block
+                    .unwrap_or_else(|| latest_block.saturating_sub(1000))
+            });

Committable suggestion skipped: line range outside the PR's diff.

Copy link

codecov bot commented Jan 22, 2025

Codecov Report

Attention: Patch coverage is 32.43243% with 50 lines in your changes missing coverage. Please review.

Project coverage is 57.12%. Comparing base (687334d) to head (ad20dfc).
Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
bin/sozo/src/commands/events.rs 0.00% 44 Missing ⚠️
crates/dojo/world/src/remote/events_to_remote.rs 80.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2929      +/-   ##
==========================================
- Coverage   57.13%   57.12%   -0.02%     
==========================================
  Files         424      424              
  Lines       56198    56227      +29     
==========================================
+ Hits        32107    32117      +10     
- Misses      24091    24110      +19     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

keys,
};
let latest_block = provider.block_number().await?;
let from_block = self.from_block.unwrap_or_else(|| latest_block.saturating_sub(1000));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As edited in the issue, please consider using the world_block from the config, if it exists. 👍

Comment on lines 110 to 115
if let Some(continuation_token) = res.continuation_token {
println!("Continuation token: {:?}", continuation_token);
println!("----------------------------------------------");
}

current_start = chunk_end + 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't just continue here, since the continuation token will indicate that there are more events to process. However in the new logic, since we're not stopping, those events would be lost.

The logic here could change to automatically fetch all the pages (as we do in the world_from_events function), and then display all the events.
Doing so, the continuation token is no more required to be exposed to the user. 👍

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
crates/dojo/world/src/remote/events_to_remote.rs (3)

69-71: Ohayo sensei! Consider using a configurable approach.

The MAX_BLOCK_RANGE value is fixed at 50,000, but the code already hints at a pending benchmarking effort. Making it configurable through an environment variable or a command-line argument can help future-proof performance tuning without code changes.


74-76: Ohayo sensei! Watch out for large data fetches.

Defaulting from_block to 0 can trigger massive queries (e.g., on mainnet). Consider letting users or config specify a safe default range to reduce wait times.


79-112: Nice chunked retrieval logic, sensei!

The double loop handles pagination well, ensuring no events are missed. However, storing all events in memory (events.extend) might be risky for large ranges—some form of batching or streaming to a storage solution could be considered.

bin/sozo/src/commands/events.rs (2)

20-23: Ohayo sensei! Keep the comment short and sweet.

The TODO already states the benchmark goal; consider removing or reducing the preceding comment lines to avoid redundancy:

-// Maximum blocks per query
-// TODO: initial value pending benchmarking to determine optimal range
+// TODO: Benchmark the optimal max block range

68-125: Leverage a fallback default range, sensei!

You currently require from_block if world_block is absent. For streamlined usage, a fallback default (e.g., latest_block - 1000) could be less error-prone for casual usage than forcing an explicit parameter.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 70532e8 and ad20dfc.

📒 Files selected for processing (2)
  • bin/sozo/src/commands/events.rs (2 hunks)
  • crates/dojo/world/src/remote/events_to_remote.rs (1 hunks)

Copy link
Collaborator

@glihm glihm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @okhaimie-dev and sorry for the delay.

The logic is good for events_to_remote, some comments to ensure the sozo events command will still work. 👍

@@ -17,6 +17,8 @@ use super::options::starknet::StarknetOptions;
use super::options::world::WorldOptions;
use crate::utils;

const MAX_BLOCK_RANGE: u64 = 50_000;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@okhaimie-dev for the command here, could you add this as a parameter? Would be interesting to have this quickly accessible to test the ranges while working on event.

sozo events --max-block-range 5000

to_block,
address: Some(world_diff.world_info.address),
keys,
self.from_block.expect("from_block is required when not specified in environment")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid expect here, since this command returns a Result, you could use anyhow to bail an error that will more integrated in the output.

Comment on lines 120 to 122
if let Some(token) = res.continuation_token {
continuation_token = Some(token);
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we need the same logic as we have in the events_to_remote. Because currently you are not fetching all the pages for the given EventFilter. Hence we will loose some events.

You will need to loop while the continuation token is None. Then you can update the current_start.

* fix: implimented glihm's feedback

* fix: added short for max_block_range
@okhaimie-dev okhaimie-dev requested a review from glihm February 14, 2025 02:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants