Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

increase tendermint_p2p::secret_connection::DATA_MAX_SIZE #1356

Closed
mkaczanowski opened this issue Sep 18, 2023 · 5 comments
Closed

increase tendermint_p2p::secret_connection::DATA_MAX_SIZE #1356

mkaczanowski opened this issue Sep 18, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@mkaczanowski
Copy link
Contributor

What went wrong?

Link to the issue: iqlusioninc/tmkms#729 (comment)

TMKMS (signer) use:

tendermint_p2p::secret_connection::DATA_MAX_SIZE

to define the buffer size for incoming (signing / privval) requests. It happened that SEI network pushes the block proposals that are up to 6 times bigger than 1024 bytes (current value of DATA_MAX_SIZE) and that makes the tmkms to fail

@tony-iqlusion suggested this might be the issue of tendermint-p2p. I would like to understand if the value should be bumped, or maybe tmkms uses it incorrectly?

Steps to reproduce

  1. setup SEI validator node
  2. setup tmkms
  3. see tmkms fail on the underflow
  4. change the tmkms buffer size (tendermint_p2p::secret_connection::DATA_MAX_SIZE)
  5. see tmkms succeed

Definition of "done"

either the DATA_MAX_SIZE is bumped or there is a clear explanation of how and where things should be defined

@tony-iqlusion
Copy link
Collaborator

Perhaps we could raise DATA_MAX_SIZE to something much larger like 64kB.

I'm not sure how it was originally chosen, but it's not like the systems we're running it on are hurting for memory.

@qezz
Copy link

qezz commented Jan 19, 2024

Following up on iqlusioninc/tmkms#729 (comment)

Simply bumping the DATA_MAX_SIZE doesn't seem to help. Unix sock connection still works fine.

But TCP connection gets interrupted at

sc.read_exact(&mut buf)?;

with

Jan 17 17:25:06 tmkms[1191569]: 2024-01-17T17:25:06.025779Z  INFO tmkms::connection::tcp: KMS node ID: ...
Jan 17 17:25:08 tmkms[1191569]: 2024-01-17T17:25:08.899507Z ERROR tmkms::client: [atlantic-2@tcp://....:...] protocol error:
Jan 17 17:25:08 tmkms[1191569]:    0: io error
Jan 17 17:25:08 tmkms[1191569]:    1: Connection reset by peer (os error 104)
Jan 17 17:25:08 tmkms[1191569]:   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Jan 17 17:25:08 tmkms[1191569]:                                 ⋮ 5 frames hidden ⋮
Jan 17 17:25:08 tmkms[1191569]:    6: flex_error::tracer_impl::eyre::<impl flex_error::tracer::ErrorMessageTracer for eyre::Report>::new_message::h0b8460f0e4a979f4
Jan 17 17:25:08 tmkms[1191569]:       at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/flex-error-0.4.4/src/tracer_impl/eyre.rs:10
Jan 17 17:25:08 tmkms[1191569]:    7: <flex_error::source::DisplayOnly<E> as flex_error::source::ErrorSource<Tracer>>::error_details::h57d5877ceb76caa7
Jan 17 17:25:08 tmkms[1191569]:       at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/flex-error-0.4.4/src/source.rs:142
Jan 17 17:25:08 tmkms[1191569]:    8: tendermint_p2p::error::Error::trace_from::ha6080d371e5d66bf
Jan 17 17:25:08 tmkms[1191569]:       at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/flex-error-0.4.4/src/macros.rs:547
Jan 17 17:25:08 tmkms[1191569]:    9: tendermint_p2p::error::Error::io::h426e6ce68aac4f61
Jan 17 17:25:08 tmkms[1191569]:       at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/flex-error-0.4.4/src/macros.rs:863
Jan 17 17:25:08 tmkms[1191569]:   10: <tendermint_p2p::error::Error as core::convert::From<std::io::error::Error>>::from::hbbb4244039ac05b8
Jan 17 17:25:08 tmkms[1191569]:       at /root/.cargo/git/checkouts/tendermint-rs-8ec10096483ea4be/823e79c/p2p/src/error.rs:67
Jan 17 17:25:08 tmkms[1191569]:   11: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual::h3f5e03ebe27c2693
Jan 17 17:25:08 tmkms[1191569]:       at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/result.rs:1962
Jan 17 17:25:08 tmkms[1191569]:   12: tendermint_p2p::secret_connection::share_auth_signature::h6159957cf21c45a1
Jan 17 17:25:08 tmkms[1191569]:       at /root/.cargo/git/checkouts/tendermint-rs-8ec10096483ea4be/823e79c/p2p/src/secret_connection.rs:483
Jan 17 17:25:08 tmkms[1191569]:   13: tendermint_p2p::secret_connection::SecretConnection<IoHandler>::new::h7573f282a1fcd921
Jan 17 17:25:08 tmkms[1191569]:       at /root/.cargo/git/checkouts/tendermint-rs-8ec10096483ea4be/823e79c/p2p/src/secret_connection.rs:315
Jan 17 17:25:08 tmkms[1191569]:   14: tmkms::connection::tcp::open_secret_connection::h9c5cb53469e7f11a
Jan 17 17:25:08 tmkms[1191569]:       at /checkout/src/connection/tcp.rs:45
Jan 17 17:25:08 tmkms[1191569]:   15: tmkms::session::Session::open::ha505adff198f368f
Jan 17 17:25:08 tmkms[1191569]:       at /checkout/src/session.rs:40
Jan 17 17:25:08 tmkms[1191569]:   16: tmkms::client::run_client::{{closure}}::hcd7575440f34a02d
Jan 17 17:25:08 tmkms[1191569]:       at /checkout/src/client.rs:104
Jan 17 17:25:08 tmkms[1191569]:   17: std::panicking::try::do_call::heeb3168f3784dd06
Jan 17 17:25:08 tmkms[1191569]:       at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:502
Jan 17 17:25:08 tmkms[1191569]:   18: __rust_try<unknown>
Jan 17 17:25:08 tmkms[1191569]:       at <unknown source file>:<unknown line>
Jan 17 17:25:08 tmkms[1191569]:   19: std::panicking::try::hc37a7f7faff34a19
Jan 17 17:25:08 tmkms[1191569]:       at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:466
Jan 17 17:25:08 tmkms[1191569]:   20: std::panic::catch_unwind::h4db7d0408a4313cc
Jan 17 17:25:08 tmkms[1191569]:       at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panic.rs:142
Jan 17 17:25:08 tmkms[1191569]:   21: tmkms::client::run_client::ha91ae47579a3f2a2
Jan 17 17:25:08 tmkms[1191569]:       at /checkout/src/client.rs:104
Jan 17 17:25:08 tmkms[1191569]:   22: tmkms::client::main_loop::h13c002e8a7fbb6e0
Jan 17 17:25:08 tmkms[1191569]:       at /checkout/src/client.rs:68
Jan 17 17:25:08 tmkms[1191569]:   23: tmkms::client::Client::spawn::{{closure}}::h1859ca7e2f68f87b
Jan 17 17:25:08 tmkms[1191569]:       at /checkout/src/client.rs:46
Jan 17 17:25:08 tmkms[1191569]:                                 ⋮ 14 frames hidden ⋮

On the node's side

Error: error creating node: error="error with private validator socket client: can't get pubkey: send: endpoint connection timed out" closerError=""

To overwrite the tmkms deps, I used

[patch.crates-io]
tendermint = { git = "https://github.com/qezz/tendermint-rs", tag = "v0.34.0-tcp-conn-patch.4", package = "tendermint", features = ["secp256k1"] }
tendermint-config = { git = "https://github.com/qezz/tendermint-rs", tag = "v0.34.0-tcp-conn-patch.4", package = "tendermint-config" }
tendermint-p2p = { git = "https://github.com/qezz/tendermint-rs", tag = "v0.34.0-tcp-conn-patch.4", package = "tendermint-p2p" }
tendermint-proto = { git = "https://github.com/qezz/tendermint-rs", tag = "v0.34.0-tcp-conn-patch.4", package = "tendermint-proto" }

tried these values

pub const DATA_MAX_SIZE: usize = 10240;
pub const DATA_MAX_SIZE: usize = 1024000;
pub const DATA_MAX_SIZE: usize = 65536;

I didn't have a chance to look deeper, unfortunately

@tony-iqlusion
Copy link
Collaborator

This seems to be some sort of bug in the tendermint-p2p crate.

It's certainly not helped by the API: where Secret Connection is a message-oriented protocol, tendermint-p2p uses io::{Read, Write} for its API, which are intended for a stream-oriented protocol like TCP/TLS.

@mzabaluev
Copy link
Contributor

mzabaluev commented Feb 16, 2024

Let's try with #1393 and re-consider if that does not help. This proposal looks like suppressing the symptom to me.

@mzabaluev mzabaluev reopened this Feb 16, 2024
@mzabaluev mzabaluev closed this as not planned Won't fix, can't repro, duplicate, stale Feb 16, 2024
@qezz
Copy link

qezz commented Feb 16, 2024

shall we try the new version on a testnet?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants