Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log segment header when archiver instantiation fails #3378

Merged
merged 2 commits into from
Feb 11, 2025
Merged

Conversation

teor2345
Copy link
Member

We had a report of archiver errors on a timekeeper:

I noticed a timekeeper in an endless restart loop and always throwing this panic
archival-node-1 | 2025-02-07T17:05:49.143589Z INFO Consensus: sc_consensus_subspace::archiver: Resuming archiver from last archived block last_archived_block_number=2358
archival-node-1 | thread 'tokio-runtime-worker' panicked at /code/crates/sc-consensus-subspace/src/archiver.rs:713:14:
archival-node-1 | Incorrect parameters for archiver: InvalidBlockSmallSize { block_bytes: 874, archived_block_bytes: 1751094 }
archival-node-1 | note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Things i did:

  • I switched cores for the timekeeper thread to run on (no luck),
  • I then removed the timekeeper flags and tried to start it but no luck.
  • I then removed the timekeeper flags and wiped the node and it started.

Looking at the code, there's nowhere that changes the segment header (last archived block) or block data. So this seems like corruption in the segment or block stores due to overclocking. This PR prints the segment header on archiver instantiation errors, to help diagnose similar errors in future.

This PR also changes an incorrect (but harmless) BlockNumber type in archiver instantiation errors to u32.

Code contributor checklist:

@teor2345 teor2345 added the improvement it is already working, but can be better label Feb 10, 2025
@teor2345 teor2345 self-assigned this Feb 10, 2025
@teor2345 teor2345 requested a review from nazar-pc as a code owner February 10, 2025 09:22
nazar-pc
nazar-pc previously approved these changes Feb 10, 2025
Comment on lines 697 to 699
.unwrap_or_else(|error| {
panic!("Incorrect parameters for archiver: {error:?} {last_segment_header:?}")
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but I suspect one of the historical reasons it was panicking is that function didn't return an error, but now it does, so might be a good thing to return it from here as well (in a simpler form, while using error!() macro to print last_segment_header).

Copy link
Member Author

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, this is ready for review again

@teor2345 teor2345 added this pull request to the merge queue Feb 11, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 11, 2025
@teor2345 teor2345 added this pull request to the merge queue Feb 11, 2025
@teor2345 teor2345 removed this pull request from the merge queue due to a manual request Feb 11, 2025
@teor2345 teor2345 added this pull request to the merge queue Feb 11, 2025
Merged via the queue into main with commit 12c2529 Feb 11, 2025
8 checks passed
@teor2345 teor2345 deleted the archiver-debug-2 branch February 11, 2025 23:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement it is already working, but can be better
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants