Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(zksync_cli): Health checkpoint improvements #3193

Open
wants to merge 55 commits into
base: main
Choose a base branch
from

Conversation

manuelmauro
Copy link
Contributor

@manuelmauro manuelmauro commented Oct 29, 2024

What ❔

Add three new components to node's healthcheck:

  • General (i.e., version, last migration)
  • State Keeper
  • Eth Sender

Why ❔

Checklist

  • PR title corresponds to the body of PR (we generate changelog entries from PRs).
  • Tests for the changes have been added / updated.
  • Documentation comments have been added / updated.
  • Code has been formatted via zkstack dev fmt and zkstack dev lint.

@manuelmauro manuelmauro marked this pull request as ready for review November 4, 2024 14:30
core/lib/dal/src/system_dal.rs Outdated Show resolved Hide resolved
core/lib/git_version_macro/src/lib.rs Outdated Show resolved Hide resolved
core/node/house_keeper/src/database.rs Outdated Show resolved Hide resolved
core/node/house_keeper/src/eth_sender.rs Outdated Show resolved Hide resolved
core/node/house_keeper/src/eth_sender.rs Outdated Show resolved Hide resolved
etc/env/file_based/general.yaml Outdated Show resolved Hide resolved
Copy link
Contributor

@slowli slowli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍 Still have a couple of comments, but overall the PR looks significantly better than previously.

core/node/house_keeper/src/version.rs Outdated Show resolved Hide resolved
core/node/house_keeper/src/database.rs Outdated Show resolved Hide resolved
core/node/state_keeper/src/keeper.rs Outdated Show resolved Hide resolved
core/node/state_keeper/src/lib.rs Outdated Show resolved Hide resolved
Comment on lines +10 to +11
pub prev_l2_block_hash: H256,
pub prev_l2_block_timestamp: u64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshedding: Do these 2 fields provide valuable info? IMO, it'd make more sense to provide info like the number of txs in the current block.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the number of txs in the current block

I don't feel this info is somehow interesting. We have normally 2-3 tps so we have to update it pretty frequent

Copy link
Contributor

@Deniallugo Deniallugo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but wanna approve from @slowli

Copy link
Contributor

@slowli slowli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good. If this functionality isn't urgent, I'd still prefer at least non-nit / bikeshedding comments be addressed in this PR.

pub installed_on: DateTime<chrono::Utc>,
pub success: bool,
pub checksum: String,
pub execution_time: i64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The type of this field looks weird. If it's a duration, can it be converted to std::time::Duration? Alternatively, it may make sense to add a suffix (like _ms) to the field name and convert it correspondingly.

#[derive(Debug, Serialize, Deserialize)]
pub struct EthTxManagerHealthDetails {
pub last_mined_tx: EthTxDetails,
pub tx_status: TxStatus,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that last_mined_tx_status? If so, please rename the field (or combine it with last_mined_tx by adding). Also, just to be clear: Is it intentional the health for EthTxManager is updated in a loop for each transaction in inflight_txs? Is the ordering of inflight_txs correct to talk about "last mined transaction"?

const SCRAPE_INTERVAL: Duration = Duration::from_secs(60);

#[derive(Debug, Deserialize, Clone, PartialEq)]
pub struct Config {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This config looks redundant; it's not used and is always initialized to a default value. Also, you can probably reuse SCRAPE_INTERVAL for the new task.

Comment on lines +10 to +11
pub use values::BIN_METADATA;
use zksync_health_check::{CheckHealth, Health, HealthStatus};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Please don't split uses with mod declarations; it'll lead to bogus import grouping (like here zksync_health_check belongs to external imports).
  • Please prepend self:: for internal imports so that they are easy to distinguish from external ones.

/// Tries to run the command, only returns `Some` if the command
/// succeeded and the output was valid utf8.
fn run_cmd(cmd: &str, args: &[&str]) -> String {
run_cmd_opt(cmd, args).unwrap_or("unknown".to_string())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshedding: Is "unknown" a better value than no value at all (i.e., None)? IMO, the latter is more idiomatic, and it fits better at least for metrics.


impl From<&BinMetadata> for Health {
fn from(details: &BinMetadata) -> Self {
Self::from(HealthStatus::Ready).with_details(details)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future work (?): It's weird seeing this reported as a component, since this data clearly doesn't have its own lifecycle (not even a sort of surrogate one like Postgres does). It'd probably make sense to add a way to set details for AppHealth (note that it contains Health already).

Comment on lines +114 to +115
self.health_updater
.update(Health::from(HealthStatus::Ready));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Why not update health from cursor immediately, like you do below?
  • To be really nitpicky, one could argue that the state keeper is affected until it processes the pending batch, since it doesn't respond to environment signals (mempool txs and timeouts) until then. This is subjective though, so feel free to ignore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants