Insert MSM and FFT code and their benchmarks. #86

Merged

Conversation

@einar-taiko einar-taiko commented Sep 8, 2023

This PR moves MSM and FFT to halo2curves as suggested in #84.

Adds benchmarks for FFT and MSM to serve as a comparison baseline in the future: the current situation vs privacy-scaling-explorations/halo2#40 vs #29 vs post-#163.

Note: This might require moving privacy-scaling-explorations/halo2#202 to halo2curves cc @jonathanpwang

Resolves taikoxyz/zkevm-circuits#150.

@einar-taiko einar-taiko marked this pull request as ready for review September 8, 2023 13:18

mratsim commented Sep 8, 2023

Hi team,

Rationale for this PR is mentioned in privacy-scaling-explorations/halo2#84.

Following this I'd like to:

  • Rename best_multiexp to best_msm
  • Standardize an API for MSM / NTT so that accelerator providers can easily be integrated
    • or alternative backends, for easy comparison or differential fuzzing (see the sketch after this list)
  • Implement the rest of Towards state-of-the-art multi-scalar-muls #163
    • Note: we're hitting the Rust orphan rule when adding the "Jacobian Extended" coordinate system. As with the bn and bls12-381 curves, it might be worthwhile to consider integrating pasta instead of depending on upstream (see also the discussion around the upstream serialization default: fix: Improve serialization for prime fields #85 (comment))
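
For illustration, a minimal sketch of what a standardized, pluggable MSM backend could look like. The trait name MsmBackend and the module path halo2curves::msm::best_multiexp are assumptions for this sketch, not an existing or agreed-upon API.

// Hypothetical sketch of a pluggable MSM backend trait; these names do not exist
// in halo2curves. An accelerator (GPU, FPGA) or an alternative CPU backend would
// implement the trait, and callers stay generic over it, which also makes
// differential fuzzing of one backend against another straightforward.
use halo2curves::CurveAffine;

pub trait MsmBackend<C: CurveAffine> {
    /// Computes sum_i coeffs[i] * bases[i]; both slices must have equal length.
    fn msm(&self, coeffs: &[C::Scalar], bases: &[C]) -> C::Curve;
}

/// Reference backend deferring to the existing CPU implementation.
pub struct CpuBackend;

impl<C: CurveAffine> MsmBackend<C> for CpuBackend {
    fn msm(&self, coeffs: &[C::Scalar], bases: &[C]) -> C::Curve {
        halo2curves::msm::best_multiexp(coeffs, bases)
    }
}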

Depending on which is merged first, either this PR or privacy-scaling-explorations/halo2#202 will need to be updated.

@CPerezz CPerezz self-requested a review September 11, 2023 11:00
@kilic kilic self-requested a review September 12, 2023 07:58

@CPerezz CPerezz left a comment

This looks good! I'd like to run benchmarks with and without this commit on our server cc: @AronisAt79 @ed255, so that we can also see whether we really benefit from multithreading without exploding RAM consumption.

Once we see no RAM explosions, I'm happy to merge this!
If you have any benchmark results on other machines or memory profiling, @mratsim, it would be nice if you could share them.

I'll try to run this locally and get it. But I only have 16 threads available.

@han0110 han0110 left a comment

LGTM! Checked locally that fft.rs and msm.rs are ported from halo2_proofs with only 2 lines of non-logic changes.

Also got a question about the MSM benchmark.

…ltiexp`.

Split into `singlecore` and `multicore` benchmarks so Criterion's result
caching and comparison over multiple runs makes sense.

Rewrite point and scalar generation.
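
For context, a minimal sketch of what such a split Criterion harness can look like (not the exact code added in this PR); the module path halo2curves::msm::best_multiexp, the use of rand_core's OsRng, and the k range are assumptions, and a singlecore variant would be analogous under its own benchmark group.

// Sketch of a Criterion bench kept in its own benchmark group so Criterion's
// cached baselines and run-over-run comparisons stay meaningful.
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use halo2curves::bn256::{Fr, G1, G1Affine};
use halo2curves::ff::Field;
use halo2curves::group::{Curve, Group};
use halo2curves::msm::best_multiexp;
use rand_core::OsRng;

fn generate_inputs(k: u32) -> (Vec<Fr>, Vec<G1Affine>) {
    let n = 1usize << k;
    let coeffs: Vec<Fr> = (0..n).map(|_| Fr::random(OsRng)).collect();
    let bases: Vec<G1Affine> = (0..n).map(|_| G1::random(OsRng).to_affine()).collect();
    (coeffs, bases)
}

fn multicore(c: &mut Criterion) {
    let mut group = c.benchmark_group("msm/multicore");
    group.sample_size(10); // the larger sizes are expensive, keep sampling light
    for k in 10..=16u32 {
        let (coeffs, bases) = generate_inputs(k);
        group.bench_function(BenchmarkId::from_parameter(k), |b| {
            b.iter(|| best_multiexp(&coeffs, &bases))
        });
    }
    group.finish();
}

criterion_group!(benches, multicore);
criterion_main!(benches);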

mratsim commented Sep 15, 2023

The EC point generation for the benchmark should be hoisted so it is not repeated, and it should be parallelized; running the 2^18 to 2^22 part of the benches takes almost 30 min.

Looking at the code for Point::random, two things can be slow when creating a random point: sqrt and clear_cofactor.

if let Some(y) = Option::<$base>::from(y2.sqrt()) {
    // Pick the square root whose sign bit matches the requested one.
    let sign = y.to_bytes()[0] & 1;
    let y = if ysign ^ sign == 0 { y } else { -y };
    let p = $name_affine { x, y };
    // Clear the cofactor to land in the prime-order subgroup
    // (a no-op for cofactor-1 curves such as BN254).
    use $crate::group::cofactor::CofactorGroup;
    let p = p.to_curve();
    return p.clear_cofactor().to_affine();
}

But BN254 has a cofactor of 1, so only sqrt is slow. In my optimized library (using addition chains rather than Tonelli-Shanks) a sqrt takes 6877 ns.

Creating 2^22 points at that rate would take almost 30 s.
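
For reference, a minimal sketch of the kind of parallel point generation discussed here, assuming rayon, halo2curves' bn256 types, and rand_core's OsRng; the helper name random_points is hypothetical and this is not the exact code that ended up in the bench.

// Sample points on separate rayon tasks so the per-point sqrt cost is spread
// across cores, then convert to affine in a single batch instead of paying one
// field inversion per point.
use halo2curves::bn256::{G1, G1Affine};
use halo2curves::group::prime::PrimeCurveAffine;
use halo2curves::group::{Curve, Group};
use rand_core::OsRng;
use rayon::prelude::*;

fn random_points(k: u32) -> Vec<G1Affine> {
    let projective: Vec<G1> = (0..1u64 << k)
        .into_par_iter()
        .map(|_| G1::random(OsRng))
        .collect();
    let mut affine = vec![G1Affine::identity(); projective.len()];
    G1::batch_normalize(&projective, &mut affine);
    affine
}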

CPerezz commented Sep 16, 2023

We could file an issue for that!
I've been wanting to add addition chains for some time. Maybe using https://github.com/kwantam/addchain from Riad S. Wahby.

mratsim commented Sep 17, 2023

The bench came from my library, which already uses addition chains ;), so in halo2curves it might even take a minute to generate the points.

@han0110 han0110 left a comment

LGTM!

@CPerezz CPerezz left a comment

Nice work!

mratsim commented Sep 18, 2023

Hold off on merging; @einar-taiko is looking into parallelizing and caching the generation of the test (coeffs, points) pairs.

CPerezz commented Sep 18, 2023

Hold off on merging; @einar-taiko is looking into parallelizing and caching the generation of the test (coeffs, points) pairs.

Didn't do it as per your last comment :)

Laptop measurements:
k=22: 109 sec
k=16:   1 sec

@mratsim mratsim left a comment

The new bench with parallel point generation now completes in under a minute.

@han0110 han0110 left a comment

LGTM! Just a small suggestion, could be ignored.

mratsim commented Sep 19, 2023

Here are my benchmarks on my laptop (i9-11980HK, 8 cores). Unfortunately I have to leave for zkSummit right after, so benchmarks on my 18-core workstation will have to wait until the end of the week.

For 2^16 there is a factor of 3-3.5x between halo2curves (90 ms) and Gnark (30 ms) / Constantine (22.5 ms).

(Benchmark screenshots for Halo2curves, Gnark, and Constantine omitted.)

@han0110 han0110 left a comment

I think we still need to add required-features = ["multicore"] for msm.rs to pass the CI build without default features.
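
For reference, roughly what such a manifest entry looks like (a sketch; the actual Cargo.toml in the repository may differ):

# Gate the MSM bench on the multicore feature so builds without default
# features don't try to compile it.
[[bench]]
name = "msm"
harness = false
required-features = ["multicore"]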

@han0110 han0110 added this pull request to the merge queue Sep 22, 2023
Merged via the queue into privacy-scaling-explorations:main with commit ee7cb86 Sep 22, 2023
7 checks passed
jonathanpwang added a commit to axiom-crypto/halo2curves that referenced this pull request Sep 23, 2023
* Add field conversion to/from `[u64;4]` (privacy-scaling-explorations#80)

* feat: add field conversion to/from `[u64;4]`

* Added conversion tests
* Added `montgomery_reduce_short` for no-asm
* For bn256, uses assembly conversion when asm feature is on

* fix: remove conflict for asm

* chore: bump rust-toolchain to 1.67.0

* Compute Legendre symbol for `hash_to_curve` (privacy-scaling-explorations#77)

* Add `Legendre` trait and macro

 - Add Legendre macro with norm and legendre symbol computation
 - Add macro for automatic implementation in prime fields

* Add legendre macro call for prime fields

* Remove unused imports

* Remove leftover

* Add `is_quadratic_non_residue` for hash_to_curve

* Add `legendre` function

* Compute modulus separately

* Substitute division for shift

* Update modulus computation

* Add quadratic residue check func

* Add quadratic residue tests

* Add hash_to_curve bench

* Implement Legendre trait for all curves

* Move misplaced comment

* Add all curves to hash bench

* fix: add suggestion for legendre_exp

* fix: imports after rebase

* Add simplified SWU method (privacy-scaling-explorations#81)

* Fix broken link

* Add simple SWU algorithm

* Add simplified SWU hash_to_curve for secp256r1

* add: sswu z reference

* update MAP_ID identifier

Co-authored-by: Han <[email protected]>

---------

Co-authored-by: Han <[email protected]>

* Bring back curve algorithms for `a = 0` (privacy-scaling-explorations#82)

* refactor: bring back curve algorithms for `a = 0`

* fix: clippy warning

* fix: Improve serialization for prime fields (privacy-scaling-explorations#85)

* fix: Improve serialization for prime fields

Summary: 256-bit field serialization is currently 4x u64, i.e., the native format. This implements the standard byte-serialization (corresponding to PrimeField::{to,from}_repr), and a hex-encoded variant of that for (de)serializers that are human-readable (concretely, JSON).

- Added a new macro `serialize_deserialize_32_byte_primefield!` for custom serialization and deserialization of 32-byte prime fields in the different structs (Fq, Fp, Fr) across the secp256r1, bn256, and derive libraries.
- Implemented the new macro for serialization and deserialization in various structs, replacing the previous `serde::{Deserialize, Serialize}` direct use.
- Enhanced error checking in the custom serialization methods to ensure valid field elements.
- Updated the test function in the tests/field.rs file to include JSON serialization and deserialization tests for object integrity checking.

* fixup! fix: Improve serialization for prime fields

---------

Co-authored-by: Carlos Pérez <[email protected]>

* refactor: (De)Serialization of points using `GroupEncoding` (privacy-scaling-explorations#88)

* refactor: implement (De)Serialization of points using the `GroupEncoding` trait

- Updated curve point (de)serialization logic from the internal representation to the
  representation offered by the implementation of the `GroupEncoding` trait.

* fix: add explicit json serde tests

* Insert MSM and FFT code and their benchmarks. (privacy-scaling-explorations#86)

* Insert MSM and FFT code and their benchmarks.

Resolves taikoxyz/zkevm-circuits#150.

* feedback

* Add instructions

* feedback

* Implement feedback: Actually supply the correct arguments to `best_multiexp`.

Split into `singlecore` and `multicore` benchmarks so Criterion's result
caching and comparison over multiple runs makes sense.

Rewrite point and scalar generation.

* Use slicing and parallelism to decrease running time.

Laptop measurements:
k=22: 109 sec
k=16:   1 sec

* Refactor msm

* Refactor fft

* Update module comments

* Fix formatting

* Implement suggestion for fixing CI

---------

Co-authored-by: David Nevado <[email protected]>
Co-authored-by: Han <[email protected]>
Co-authored-by: François Garillot <[email protected]>
Co-authored-by: Carlos Pérez <[email protected]>
Co-authored-by: einar-taiko <[email protected]>
@mratsim mratsim mentioned this pull request Oct 23, 2023
@huitseeker huitseeker mentioned this pull request Oct 23, 2023
@mratsim mratsim mentioned this pull request Oct 25, 2023
jonathanpwang added a commit to axiom-crypto/halo2curves that referenced this pull request Nov 13, 2023
* Re-export also mod `pairing` and remove flag `reexport` to always re-export (privacy-scaling-explorations#93)

fix: re-export also mod `pairing` and remove flag `reexport` to always re-export

* fix regression in privacy-scaling-explorations#93 reexport field benches aren't run (privacy-scaling-explorations#94)

fix regression in privacy-scaling-explorations#93, field benches aren't run

* Fast modular inverse - 9.4x acceleration (privacy-scaling-explorations#83)

* Bernstein-Yang modular multiplicative inverter (#2)

* rename similar to privacy-scaling-explorations#95

---------

Co-authored-by: Aleksei Vambol <[email protected]>

* Fast isSquare / Legendre symbol / Jacobi symbol - 16.8x acceleration (privacy-scaling-explorations#95)

* Derivatives of Pornin's method (taikoxyz#3)

* renaming file

* make cargo fmt happy

* clarifications from privacy-scaling-explorations#95 (comment) [skip ci]

* Formatting and slightly changing a comment

---------

Co-authored-by: Aleksei Vambol <[email protected]>

* chore: delete bernsteinyang module (replaced by ff_inverse)

* Bump version to 0.4.1

---------

Co-authored-by: David Nevado <[email protected]>
Co-authored-by: Han <[email protected]>
Co-authored-by: François Garillot <[email protected]>
Co-authored-by: Carlos Pérez <[email protected]>
Co-authored-by: einar-taiko <[email protected]>
Co-authored-by: Mamy Ratsimbazafy <[email protected]>
Co-authored-by: Aleksei Vambol <[email protected]>

Successfully merging this pull request may close these issues.

Duplicate MSM and FFT implementations and benchmarks from halo2 into halo2curves