Add extra API for offline/online phases in ECDSA signing #1610

RandomLattice · 2024-09-30T15:15:06Z

We'd love to split the ECDSA signing operation into two steps:

an "offline" step that is independent on the message,
an "online" step that depends on the message.

The advantage here is that the offline step can be precomputed arbitrarily early and do "most of the computational work". This is very useful for signers that have a lot of idle time before the signing request comes in, but little time to compute and return the signature. For example, hardware wallets.

In ECDSA, the offline step can essentially compute nonce generation + scalar multiplication + nonce modular inversion. These values ("precomputed material") are input to the online part. The online part just finishes the ECDSA computation. The online part is extremely fast (we're seeing around 5000x faster than the offline phase in a non-libsecp256k1 proof-of-concept).

The precomputed material is secret, can only be used once and needs to be wiped after usage.

API-wise this would probably mean adding two functions:

secp256k1_ecdsa_sign_split_phase_precompute()
secp256k1_ecdsa_sign_split_phase_online()

The composition of the two functions compute secp256k1_ecdsa_sign(). The input/output behavior does not change.

Is there any appetite for this? Happy to write the code for this.

The text was updated successfully, but these errors were encountered:

real-or-random · 2024-10-03T15:48:25Z

The precomputed material is secret, can only be used once and needs to be wiped after usage.

Indeed, and this is the drawback of this. It's a major footgun, and I'm sure this is the reason why it's currently implemented this way.

Personally, I'm a bit on the fence. We're about to add a MuSig2 module, which is also prone to nonce-reuse failures -- but in that case, it is unavoidable to split nonce generation and the actual signing into two separate steps. In the end, I don't think I'm against giving users advanced and dangerous features, but they should be clearly marked as such.

I'd love to hear your use case and scenario here. Noone has asked for this before, and I doubt many users will benefit. (And anyway, ECDSA can be considered somewhat deprecated in Bitcoin, but okay, the same could be done for Schnorr signing). Do you personally need/want this, or is this is just a suggestion for improvement?

It would be interesting to know what other contributors and maintainers think.

RandomLattice · 2024-10-04T08:34:01Z

this is the drawback of this. It's a major footgun

I think we have ways to make this less of a footgun. For example, here goes a simple mitigation. We can add to the precomputed material data struct a flag has_been_used meant to detect reuse. The consumer function of the struct uses this flag as follows:

checks on function enter that has_been_used == NOT_USED_YET. Refuses to continue if this doesn't check.
sets has_been_used to ALREADY_USED
proceeds to perform the actual computation (possibly following sad paths)

Like this, a user has to go out of their way to use this unsafely (by modifying fields in an opaque struct). This doesn't cover the contrived case where the user is reusing precomputed material that got persisted before the flag change; but I think that's a narrow corner case that can be dealt with a clear warning: do not reuse nonce material if you're using this API.

I imagine you've considered similar things for MuSig2, curious what you think.

I don't think I'm against giving users advanced and dangerous features, but they should be clearly marked as such

I think this is the right tradeoff. We can clearly mark the precomputed material as secret and that it cannot be re-used.

I'd love to hear your use case and scenario here. [...] Do you personally need/want this, or is this is just a suggestion for improvement?

I'm asking others to chime in and give better context. Long story short is that we're hitting a performance bottleneck in Bitkey hardware wallet (https://github.com/proto-at-block/bitkey). We need to speed up (by a lot) the signing operation for large transactions if we want to use libsecp256k1.

real-or-random · 2024-10-04T11:48:43Z

this is the drawback of this. It's a major footgun

I think we have ways to make this less of a footgun. For example, here goes a simple mitigation. We can add to the precomputed material data struct a flag has_been_used meant to detect reuse. [...]

Like this, a user has to go out of their way to use this unsafely (by modifying fields in an opaque struct).

This doesn't cover the contrived case where the user is reusing precomputed material that got persisted before the flag change; but I think that's a narrow corner case that can be dealt with a clear warning: do not reuse nonce material if you're using this API.

I don't think it's contrived. What if you run inside a VM and the VM is reset? What if the caller decides to memcpy the nonce, or write it to disk? (We can document that one must not do this, but this doesn't necessarily mean that people will adhere to the rules.)

The second large concern is that one cannot use deterministic nonce generation when the message is not known. So you'll need a good source of randomness during signing.

But yeah, we're apparently willing to accept that risk in the MuSig2 API, so it's not entirely crazy to add a similar API also for ordinary signatures.

I imagine you've considered similar things for MuSig2, curious what you think.

In the MuSig2 PR, we simply overwrite the nonce with zeros after using it. That's a tiny bit safer and doesn't need a flag.

Long story short is that we're hitting a performance bottleneck in Bitkey hardware wallet (proto-at-block/bitkey). We need to speed up (by a lot) the signing operation for large transactions if we want to use libsecp256k1.

I'm curious why you use ECDSA at all in a modern wallet. Schnorr signing is 27% faster on my machine, I'd be curious about the difference on the hardware wallet. The difference is probably mostly due to the RFC6979 hashing in ECDSA, which is overkill. So switching to a faster nonce derivation function should already speed up things a lot without introducing risks.

Also, have you tried with a recent version (>=0.5.0), and have you tried all precomputation table sizes on your platform? (See https://github.com/bitcoin-core/secp256k1/blob/master/CHANGELOG.md#changed-1, note that we changed the default in 0.5.1)

RandomLattice · 2024-10-04T13:12:22Z

Thanks for the great discussion.

I don't think it's contrived. What if you run inside a VM and the VM is reset? What if the caller decides to memcpy the nonce, or write it to disk? (We can document that one must not do this, but this doesn't necessarily mean that people will adhere to the rules.)

I agree rollback is bad. I think in the VM case, one would not use the new proposed API. One would just use the existing API. The new proposed API is suitable for a specific use case we're hitting "in real life": an embedded device (=constrained computing device) that a) needs to sign "very fast" and b) has plenty of time for precomputation before the signing request comes in.

The second large concern is that one cannot use deterministic nonce generation when the message is not known.

We cannot use RFC6979 to generate the secret nonce when the message is not known. But that doesn't mean we can use other approaches for deterministic nonce generation (see next paragraph).

So you'll need a good source of randomness during signing.

Yes, that is one way, but not the only one. Alternatively, one can use a monotonic counter / unique (public) value as input to a keyed hash to construct the secret nonce. I'm not necessarily advocating for this; it all depends on the specific use case and definitely we're in the territory of "you need to know what you're doing". I think the new API should be generic enough to allow the user to do this kind of choices.

But yeah, we're apparently willing to accept that risk in the MuSig2 API, so it's not entirely crazy to add a similar API also for ordinary signatures.

Sounds reasonable, thanks.

In the MuSig2 PR, we simply overwrite the nonce with zeros after using it. That's a tiny bit safer and doesn't need a flag.

Sure, that would also work. Happy to take this approach too.

have you tried with a recent version (>=0.5.0), and have you tried all precomputation table sizes on your platform?

I'll let others chime in, I don't have the numbers right now with me. Thanks for all the suggestions!

jmecom · 2024-10-04T16:54:59Z

Hopping in here -- this would be useful for Bitkey's firmware, which does transaction signing over NFC using libsecp256k1. The chip we're using is an EFR32MG24, so it's not particularly fast, and we've recently had some performance issues when users try to sign many inputs back-to-back in one NFC session on their phone.

We're considering using our chip's hardware accelerated ECDSA, but we prefer libsecp256k1 for various reasons; and the hardware doesn't support Schnorr signatures naturally, so we're a bit stuck there without a different optimization.

I'm curious why you use ECDSA at all in a modern wallet. Schnorr signing is 27% faster on my machine, I'd be curious about the difference on the hardware wallet.

Not all exchanges support sending to bech32m, so it's a bit difficult to eschew ECDSA completely. On our chip, ECDSA takes ~60ms and Schnorr takes ~55ms. We can't use large precomputation tables due to limited RAM.

I do think the optimization being proposed here is mostly useful for resource constrained embedded systems, but it could be very useful in those settings.

apoelstra · 2024-10-04T17:20:11Z

I think it's reasonable to expose such an API for both ECDSA and Schnorr. I would suggest doing it in much the same way that we expose custom nonce functions -- provide a second API that takes a callback and auxiliary data, whose role it is to produce a nonce keypair and inverted nonce.

This would be symmetric with our existing ability to override the RFC6979, which is similarly niche and dangerous. (Though the "typical" use of overriding the nonce would be to use a faster PRNG, which is perfectly safe, while the "typical" use of overriding the nonce computation would be to use precomputed nonce values, which is fraught.)

I would not suggest exposing an easy-to-use or first-class API for this. It's much more difficult to obtain a monotonic counter (or strong RNG) and to ensure your software is never forked or reset, than it is to use an awkward C API. So if we were to provide a friendly API, we'd just be making the "easy" part easier and potentially encourage people to use it without carefully considering whether they can do so safely.

sipa · 2024-10-04T17:49:01Z

@jmecom Sorry for the slight tangent here, but:

On our chip, ECDSA takes ~60ms and Schnorr takes ~55ms. We can't use large precomputation tables due to limited RAM.

Since #1058 landed in 0.5.0, the supported table sizes (for the key generation/signing code) are 2 KiB, 22 KiB, and 86 KiB. How much is acceptable to you? It's easy to add more configurations if there is a need for them.

jmecom · 2024-10-04T20:29:15Z

@jmecom Sorry for the slight tangent here, but:

On our chip, ECDSA takes ~60ms and Schnorr takes ~55ms. We can't use large precomputation tables due to limited RAM.

Since #1058 landed in 0.5.0, the supported table sizes (for the key generation/signing code) are 2 KiB, 22 KiB, and 86 KiB. How much is acceptable to you? It's easy to add more configurations if there is a need for them.

I just pulled in 0.5.1 and experimented with this. I didn't see a huge impact on the table size with a window size of two:

    62 ms @ 2kb
    58 ms @ 22kb
	56 ms @ 86kb

Window size of five took about 50ms @ 22kb. So, I don't think more fine-grained options are really necessary for our usecase at least.

sipa · 2024-10-04T20:33:11Z

@jmecom By "window size", you mean the ECMULT_WINDOW_SIZE setting? That should have no impact on key generation/signing speed, only on verification speed.

jmecom · 2024-10-04T22:21:03Z

@jmecom By "window size", you mean the ECMULT_WINDOW_SIZE setting? That should have no impact on key generation/signing speed, only on verification speed.

Ah, woops. Due to the way I'm profiling, there is a little variation across each run, so I mistakenly thought that lowered the duration, but it was just an anomaly. Ignore that, but the rest is accurate.

RandomLattice · 2024-10-07T10:40:44Z

I think it's reasonable to expose such an API for both ECDSA and Schnorr. I would suggest doing it in much the same way that we expose custom nonce functions -- provide a second API that takes a callback and auxiliary data, whose role it is to produce a nonce keypair and inverted nonce.

Here goes an sketch of the proposed changes. Let me know what you think:

The new functionality can live in a separate module module_precomputationapi. This module can be enabled at compile time and is disabled by default.
Introduce an opaque data structure that holds the "precomputed material" in include/secp256k1_precomputedapi.h:

    typedef struct secp256k1_precomputed_struct secp256k1_precomputed

Define the struct holding nonce keypair + inverted nonce in src/modules/precomputedapi/main_impl.h. This data struct is not guaranteed to be portable across platforms or versions.
Introduce these new API functions in src/modules/precomputedapi/main_impl.h:

int secp256k1_ecdsa_precomputatedapi_prepare(
    const secp256k1_context *ctx,
    secp256k1_precomputed *precomp,
    const unsigned char *seckey,
    secp256k1_nonce_function noncefp,
    const void *ndata

int secp256k1_ecdsa_precomputatedapi_sign(
    const secp256k1_context *ctx,
    secp256k1_ecdsa_signature *sig,
    const unsigned char *msghash32,
    const unsigned char *seckey,
    secp256k1_precomputed *precomp,

Testing:

test_precomputation_api: test all NULLs are detected.
test_precomputation_identical: test side-by-side against secp256k1_ecdsa_sign.
test_precomputation_roundtrip: test all signatures generated with secp256k1_ecdsa_precomputatedapi_sign pass. This test may be redundant given test_precomputation_identical.

Benchmark: would be useful to see what is the speed up attained, but lower priority than point 5.

Some points for discussion:

The term "precomputationapi" isn't great for describing this. secp256k1 already uses "precomputation" in a different context. Is there a better term? The alternatives I have thought of aren't any better: "precomputednonce", "offlineonline", "deattachednonce".
We can take first the ECDSA path, and extend in the future for Schnorr if there's a need.

I would not suggest exposing an easy-to-use or first-class API for this. It's much more difficult to obtain a monotonic counter (or strong RNG) and to ensure your software is never forked or reset, than it is to use an awkward C API. So if we were to provide a friendly API, we'd just be making the "easy" part easier and potentially encourage people to use it without carefully considering whether they can do so safely.

I hope the API above follows this advice.

apoelstra · 2024-10-07T15:22:33Z

The above proposal sounds reasonable to me (other than the name -- I agree "precomputednonce" or "detachednonce" sound better).

But it does open a question about whether we would want an "off by default" module. Currently we have two categories of modules:

"experimental" modules where we do not guarantee API stability, but hope to eventually get there
non-experimental modules, which are all turned on for releases, since we don't have a good semver-compatible way for distributions to pick and choose which modules they compile in

Originally I think we had imagined that users would be able to pick and choose what modules they need, but that's not really compatible with shared library distribution; and we also hoped that users would be able to write their own custom modules, but that doesn't really work with C tooling (though I suppose it could be made to work with nix stdenv and the patches field maybe..).

Anyway, this module would presumably be "experimental", but unlike the other experimental modules, I don't think we would ever be willing to promote it to a "real" module.

RandomLattice · 2024-10-07T15:26:44Z

Thanks! I like "precomputednonce" better.

this module would presumably be "experimental", but unlike the other experimental modules, I don't think we would ever be willing to promote it to a "real" module

That sounds reasonable. Sounds like it would be better to mark this module as experimental.

RandomLattice · 2024-10-09T14:37:05Z

@real-or-random @sipa : I'd love to get some feedback before writing code. Do you think the proposal in #1610 (comment) is reasonable? Should we discuss a specific point? Would you be OK if I put up a PR adding this API?

From #1610 (comment) it looks like this proposal is reasonable to @apoelstra.

real-or-random added feature performance labels Oct 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add extra API for offline/online phases in ECDSA signing #1610

Add extra API for offline/online phases in ECDSA signing #1610

RandomLattice commented Sep 30, 2024

real-or-random commented Oct 3, 2024

RandomLattice commented Oct 4, 2024

real-or-random commented Oct 4, 2024

RandomLattice commented Oct 4, 2024

jmecom commented Oct 4, 2024

apoelstra commented Oct 4, 2024

sipa commented Oct 4, 2024 •

edited

Loading

jmecom commented Oct 4, 2024

sipa commented Oct 4, 2024

jmecom commented Oct 4, 2024

RandomLattice commented Oct 7, 2024

apoelstra commented Oct 7, 2024

RandomLattice commented Oct 7, 2024

RandomLattice commented Oct 9, 2024

Add extra API for offline/online phases in ECDSA signing #1610

Add extra API for offline/online phases in ECDSA signing #1610

Comments

RandomLattice commented Sep 30, 2024

real-or-random commented Oct 3, 2024

RandomLattice commented Oct 4, 2024

real-or-random commented Oct 4, 2024

RandomLattice commented Oct 4, 2024

jmecom commented Oct 4, 2024

apoelstra commented Oct 4, 2024

sipa commented Oct 4, 2024 • edited Loading

jmecom commented Oct 4, 2024

sipa commented Oct 4, 2024

jmecom commented Oct 4, 2024

RandomLattice commented Oct 7, 2024

apoelstra commented Oct 7, 2024

RandomLattice commented Oct 7, 2024

RandomLattice commented Oct 9, 2024

sipa commented Oct 4, 2024 •

edited

Loading