Attempt to Reduce Flakiness of BoLD Virtual Block System Tests #2905

Merged
merged 3 commits into master from reduce-flakiness
Jan 29, 2025

Conversation

rauljordan
Contributor

@rauljordan rauljordan commented Jan 28, 2025

This PR attempts to reduce the number of ERROR log messages from the BoLD system tests by using the standard l2stateprovider.ErrChainCatchingUp error in the implementation of the BoLD state provider. Before this change, we were seeing the following message with a ~20% occurrence rate:

25-01-21T02:08:08.1359719Z ERROR[01-21|01:22:36.272] Could not submit latest assertion        err="could not get execution state at batch count 1 with parent block hash 0x0000000000000000000000000000000000000000000000000000000000000000: chain catching up" validatorName=HonestAsserter

The problem was that poster.go and sync.go in the bold/assertions package catch the l2stateprovider.ErrChainCatchingUp error and log it at a lower severity, INFO, but the BoLD state provider implementation was not returning that error, opting instead to define and return its own.

This should result in a couple hundred fewer ERROR-level log messages. However, it may not actually reduce the flakiness of these challenge tests.

tsahee
tsahee previously approved these changes Jan 28, 2025
Collaborator

@tsahee tsahee left a comment


LGTM

When the state provider isn't able to do its job because the chain is lagging
behind head, it's supposed to return a predefined error from the l2stateprovider
package. Instead, the nitro implementation of the provider was using a
redefinition of the error.

While this coding error easily accounts for the many ERROR logs about the chain
not keeping up with the latest changes, it does not explain why the test is
sometimes flaky. I suspect that the original commits in this PR were chasing the
ERROR logs in the flaky tests, but that those logs were not the root cause of
the test failures.
@rauljordan rauljordan merged commit df5ca16 into master Jan 29, 2025
15 checks passed
@rauljordan rauljordan deleted the reduce-flakiness branch January 29, 2025 12:55
3 participants