Fantom Sonic Mainnet Archive node gets corrupted DB #167
Comments
We had the same problem too.
Same issue with 1.2.1-d
The error message implies that your node cannot start because its database is corrupted. It probably crashed or was killed at some point, and systemd restarted it automatically. The message you describe is produced by that subsequent run, which fails to start because the DB is already corrupted. Can you provide logs from the original crash? They are necessary to understand what happened. Thanks!
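To separate the original crash from the later failed start, the log can be split at the first "dirty state" message. This is a minimal sketch, assuming the node's log is available as a list of plain text lines; the function name `split_log_at_dirty_state` is hypothetical and not part of sonicd.

```python
from typing import List, Tuple

# Substring taken from the error message quoted in this issue.
DIRTY_MARKER = "dirty state"


def split_log_at_dirty_state(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Split a log at the first line that reports a dirty state.

    Returns (before, from_marker). The original crash, if it was logged
    at all, should be somewhere in `before`.
    """
    for i, line in enumerate(lines):
        if DIRTY_MARKER in line:
            return lines[:i], lines[i:]
    # No dirty-state failure found: everything counts as "before".
    return lines, []
```

The lines at the end of the first list are the ones worth attaching to the issue, since everything from the marker onward only shows the failed restart.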
It looks like an OOM kill.
This is from a freshly synced node started 2 days ago from scratch. This time I used Each parameter used for the service is tuned based on the server specs (128 GB RAM and 32 CPUs):
Any advice here? Should I set the values for the limit and the cache lower? And if so, what would be the suitable numbers for this spec? Thanks in advance!
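The OOM-kill suspicion can be checked against the kernel log. This is a rough sketch, assuming a recent Linux kernel that logs lines of the form `Out of memory: Killed process <pid> (<name>)`; the exact wording differs between kernel versions, so the pattern is an assumption, and `find_oom_kills` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Assumed message shape; adjust the pattern if your kernel words it differently.
OOM_RE = re.compile(r"Out of memory: Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log_lines: List[str], process_name: str) -> List[Tuple[int, str]]:
    """Return (pid, line) for every OOM kill of `process_name` in the log."""
    hits = []
    for line in kernel_log_lines:
        match = OOM_RE.search(line)
        if match and match.group(2) == process_name:
            hits.append((int(match.group(1)), line))
    return hits
```

Called with the kernel log lines and `"sonicd"`, an empty result suggests the process was stopped by something other than the kernel OOM killer.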
I'm sorry, I'm running in a container and the logs got flushed. But it means SIGTERM, and if the process didn't stop within 10 seconds, then SIGKILL. Is there any way to fix the corruption? Unclean shutdowns happen even in production environments, and having to wait multiple days for the archive genesis to be processed is really painful...
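The stop sequence described above (SIGTERM first, SIGKILL once a grace period expires) can be sketched as follows. This only illustrates the sequence on a POSIX system; it does not reproduce any particular container runtime, and `stop_with_grace_period` is a hypothetical name.

```python
import signal
import subprocess


def stop_with_grace_period(proc: subprocess.Popen, grace_seconds: float) -> str:
    """Ask a process to stop, then force-kill it if it does not exit in time.

    Returns "terminated" if the process exited within the grace period
    after SIGTERM, or "killed" if SIGKILL was needed.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        # Grace period expired: SIGKILL cannot be caught or ignored.
        proc.kill()
        proc.wait()
        return "killed"
```

A result of `"killed"` means the process had no chance to finish its shutdown work, which is the unclean-shutdown case discussed in this thread.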
Just for reference, updated to Link to docs - https://docs.fantom.foundation/node/tutorials/sonic-client/run-an-api-node This was on a freshly installed machine, and the systemd service is configured to not restart automatically in case of any crash, and has a timeout set to 600 seconds (which should be more than enough for the service to stop gracefully) - the DB still gets corrupted. If I stop the service manually, using We brought up 4 nodes, they all crashed due to the same reason, but at different points in time. |
I did some testing and can confirm that both SIGTERM and SIGKILL leave the database corrupted. So the main questions stand:
@janzhanal Is there a Docker image to run, or do you build your own? (I didn't find a Docker image for the Sonic chain, only for Opera.)
Building my own.
Hello all,
Would also really appreciate a way to recover a dirty state DB.
@janzhanal The
Describe the bug
Fantom Sonic Mainnet Archive node gets corrupted DB.
To Reproduce
Steps to reproduce the behavior:
sonicd[288420]: failed to initialize the node: failed to make consensus engine: failed to open existing databases: dirty state: gossip: DE
Expected behavior
Node is able to sync properly, without getting its DB corrupted.
Desktop (please complete the following information):
Additional context
Not quite sure how to mitigate this. It's the second time we're running into such issues on two different nodes - changed the machines as well thinking it would be a local storage problem.
We are using systemd, here is the service file:
Any feedback is highly appreciated!