Insert MSM and FFT code and their benchmarks. #86

einar-taiko · 2023-09-08T05:24:09Z

This PR moves MSM and FFT to halo2curves as suggested in #84.
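As a rough illustration of what an MSM computes (the sum of k_i · P_i over many scalar/point pairs), here is a std-only sketch of the bucket (Pippenger-style) method, checked against the naive sum. It assumes a toy group (integers modulo the prime 2^31 - 1 under addition) as a stand-in for elliptic-curve points. `msm_naive`, `msm_buckets` and the window size `c` are hypothetical names and choices; this is not the halo2curves `best_multiexp` code.

```rust
// Toy additive group: integers modulo the prime P under addition, standing in
// for elliptic-curve points. Both MSMs below only use group addition/doubling.
const P: u64 = 2_147_483_647; // 2^31 - 1

fn add(a: u64, b: u64) -> u64 {
    (a + b) % P
}

/// Naive MSM: computes each k_i * x_i by double-and-add, then sums the terms.
fn msm_naive(scalars: &[u64], points: &[u64]) -> u64 {
    let mut acc = 0;
    for (&k, &x) in scalars.iter().zip(points) {
        let (mut k, mut base, mut term) = (k, x, 0);
        while k > 0 {
            if k & 1 == 1 {
                term = add(term, base);
            }
            base = add(base, base);
            k >>= 1;
        }
        acc = add(acc, term);
    }
    acc
}

/// Bucket method with `c`-bit windows over scalars smaller than 2^bits (bits <= 64).
fn msm_buckets(scalars: &[u64], points: &[u64], c: u32, bits: u32) -> u64 {
    let windows = (bits + c - 1) / c;
    let mask = (1u64 << c) - 1;
    let mut acc = 0;
    // Horner over windows, most significant first.
    for w in (0..windows).rev() {
        for _ in 0..c {
            acc = add(acc, acc); // shift the accumulator left by c bits
        }
        // Bucket d-1 collects the points whose current window digit is d (d >= 1).
        let mut buckets = vec![0u64; mask as usize];
        for (&k, &x) in scalars.iter().zip(points) {
            let digit = ((k >> (w * c)) & mask) as usize;
            if digit != 0 {
                buckets[digit - 1] = add(buckets[digit - 1], x);
            }
        }
        // sum_d d * bucket_d using only additions (running-sum trick).
        let (mut running, mut window_sum) = (0, 0);
        for b in buckets.iter().rev() {
            running = add(running, *b);
            window_sum = add(window_sum, running);
        }
        acc = add(acc, window_sum);
    }
    acc
}

fn main() {
    let scalars = [5u64, 123_456, 65_535, 1, 0, 40_000];
    let points = [7u64, 11, 1_000_003, 42, 99, 2_000_000_000];
    let naive = msm_naive(&scalars, &points);
    assert_eq!(naive, msm_buckets(&scalars, &points, 4, 32));
    println!("msm = {naive}");
}
```

Real implementations tune `c` to the input size, since the number of buckets (2^c - 1) trades off against the number of windows.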

Adds benchmarks for FFT and MSM to serve as a baseline for future comparisons: the current implementation vs privacy-scaling-explorations/halo2#40 vs #29 vs the state after #163.
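A minimal sketch of the kind of transform the FFT benchmarks exercise: an iterative radix-2 Cooley-Tukey NTT, checked against an O(n²) DFT. It assumes the NTT-friendly toy prime 998244353 = 119 · 2^23 + 1 (primitive root 3) instead of a curve's scalar field, and `ntt` / `dft_naive` are hypothetical names; this is not the ported `fft.rs`.

```rust
// Toy NTT-friendly prime: P = 119 * 2^23 + 1, with primitive root 3.
const P: u64 = 998_244_353;

fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut r = 1;
    b %= P;
    while e > 0 {
        if e & 1 == 1 {
            r = r * b % P;
        }
        b = b * b % P;
        e >>= 1;
    }
    r
}

/// In-place radix-2 NTT: a[k] <- sum_j a[j] * omega^(j*k), omega = 3^((P-1)/n).
fn ntt(a: &mut [u64]) {
    let n = a.len();
    assert!(n.is_power_of_two());
    if n <= 1 {
        return;
    }
    // Bit-reversal permutation of the input.
    let log_n = n.trailing_zeros();
    for i in 0..n {
        let j = i.reverse_bits() >> (usize::BITS - log_n);
        if i < j {
            a.swap(i, j);
        }
    }
    // Butterfly stages of size 2, 4, ..., n.
    let mut len = 2;
    while len <= n {
        let w_len = pow_mod(3, (P - 1) / len as u64); // primitive len-th root of unity
        for start in (0..n).step_by(len) {
            let mut w = 1;
            for k in 0..len / 2 {
                let u = a[start + k];
                let v = a[start + k + len / 2] * w % P;
                a[start + k] = (u + v) % P;
                a[start + k + len / 2] = (u + P - v) % P;
                w = w * w_len % P;
            }
        }
        len <<= 1;
    }
}

/// O(n^2) reference DFT with the same omega.
fn dft_naive(a: &[u64]) -> Vec<u64> {
    let n = a.len();
    let omega = pow_mod(3, (P - 1) / n as u64);
    (0..n)
        .map(|k| {
            a.iter().enumerate().fold(0, |acc, (j, &x)| {
                (acc + x * pow_mod(omega, (j * k) as u64)) % P
            })
        })
        .collect()
}

fn main() {
    let input: Vec<u64> = (1..=16).collect();
    let mut fast = input.clone();
    ntt(&mut fast);
    assert_eq!(fast, dft_naive(&input));
}
```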

Note: this might require moving privacy-scaling-explorations/halo2#202 to halo2curves. cc @jonathanpwang

Resolves taikoxyz/zkevm-circuits#150.

benches/msm.rs

mratsim · 2023-09-08T13:45:54Z

Hi team,

The rationale for this PR is laid out in privacy-scaling-explorations/halo2#84.

Following this, I'd like to:

- Rename `best_multiexp` to `best_msm`
- Standardize an API for MSM / NTT so that accelerator providers can be integrated easily
  - or alternative backends, for easy comparison or differential fuzzing
- Implement the rest of "Towards state-of-the-art multi-scalar-muls" #163
  - Note: we're hitting the Rust orphan rule when adding the "Jacobian Extended" coordinate system. As with the bn and bls12-381 curves, it might be worthwhile to consider integrating pasta instead of depending on upstream (see also the discussion around the upstream serialization default: "fix: Improve serialization for prime fields" #85 (comment)).
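One hypothetical shape for such a standardized interface: a trait that every MSM backend implements, plus a differential check that runs all backends on the same pseudo-random inputs and compares their results. `MsmBackend`, `Reference`, `DoubleAndAdd` and `differential_check` are illustrative names, not an existing halo2curves API, and the toy group (integers modulo 2^31 - 1) stands in for curve points.

```rust
// Hypothetical backend interface; not an existing halo2curves API.
// Toy group: integers modulo the prime P under addition (stand-in for points).
const P: u64 = 2_147_483_647; // 2^31 - 1

trait MsmBackend {
    fn name(&self) -> &'static str;
    /// Computes sum_i scalars[i] * points[i] in the toy group.
    fn msm(&self, scalars: &[u64], points: &[u64]) -> u64;
}

/// Reference backend: direct modular multiply-accumulate.
struct Reference;
/// Alternative backend: double-and-add using only group additions.
struct DoubleAndAdd;

impl MsmBackend for Reference {
    fn name(&self) -> &'static str {
        "reference"
    }
    fn msm(&self, scalars: &[u64], points: &[u64]) -> u64 {
        let p = P as u128;
        scalars
            .iter()
            .zip(points)
            .fold(0u128, |acc, (&k, &x)| (acc + k as u128 * x as u128) % p) as u64
    }
}

impl MsmBackend for DoubleAndAdd {
    fn name(&self) -> &'static str {
        "double-and-add"
    }
    fn msm(&self, scalars: &[u64], points: &[u64]) -> u64 {
        let add = |a: u64, b: u64| (a + b) % P;
        let mut acc = 0;
        for (&k, &x) in scalars.iter().zip(points) {
            let (mut k, mut base) = (k, x);
            while k > 0 {
                if k & 1 == 1 {
                    acc = add(acc, base);
                }
                base = add(base, base);
                k >>= 1;
            }
        }
        acc
    }
}

/// Runs every backend on the same xorshift-generated inputs; the first backend is the oracle.
fn differential_check(backends: &[&dyn MsmBackend], rounds: usize, n: usize) -> Result<(), String> {
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    let mut next = move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    };
    for _ in 0..rounds {
        let scalars: Vec<u64> = (0..n).map(|_| next() >> 32).collect();
        let points: Vec<u64> = (0..n).map(|_| next() % P).collect();
        let expected = backends[0].msm(&scalars, &points);
        for b in &backends[1..] {
            if b.msm(&scalars, &points) != expected {
                return Err(format!("{} disagrees with {}", b.name(), backends[0].name()));
            }
        }
    }
    Ok(())
}

fn main() {
    let backends: [&dyn MsmBackend; 2] = [&Reference, &DoubleAndAdd];
    match differential_check(&backends, 100, 64) {
        Ok(()) => println!("all backends agree"),
        Err(e) => panic!("{e}"),
    }
}
```

An accelerator backend would then only need to implement the trait to be benchmarked and fuzzed against the CPU reference.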

Depending on which is merged first, either this PR or privacy-scaling-explorations/halo2#202 will need to be updated.

CPerezz

This looks good! I'd like to run benchmarks with and without this commit on our server (cc: @AronisAt79 @ed255), so that we can also see whether we really benefit from multithreading without exploding RAM consumption.

Once we see no RAM explosions, I'm happy to merge this!
@mratsim, if you have any benchmark results from other machines or any memory profiling, it would be nice if you could share them.

I'll try to run this locally and get some numbers, but I only have 16 threads available.
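For context on the multithreading/RAM trade-off, a minimal sketch of one common way to parallelize an MSM: split the inputs into chunks, compute a partial MSM per chunk on its own scoped thread (`std::thread::scope`, Rust 1.63+), then add the partial results. In a bucket-based MSM each thread would also hold its own bucket array, so that part of the memory grows with the thread count. The toy group (integers modulo 2^31 - 1) and the names `msm_serial` / `msm_parallel` are assumptions; this does not claim to be how halo2curves splits the work.

```rust
use std::thread;

// Toy group: integers modulo the prime P under addition (stand-in for points).
const P: u64 = 2_147_483_647; // 2^31 - 1

/// Single-threaded toy MSM: sum_i scalars[i] * points[i] mod P.
fn msm_serial(scalars: &[u64], points: &[u64]) -> u64 {
    scalars.iter().zip(points).fold(0u64, |acc, (&k, &x)| {
        ((acc as u128 + k as u128 * x as u128) % (P as u128)) as u64
    })
}

/// Splits the inputs into `threads` chunks, runs one partial MSM per chunk on
/// a scoped thread, then adds the partial results.
fn msm_parallel(scalars: &[u64], points: &[u64], threads: usize) -> u64 {
    assert_eq!(scalars.len(), points.len());
    let threads = threads.max(1);
    let chunk = ((scalars.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = scalars
            .chunks(chunk)
            .zip(points.chunks(chunk))
            .map(|(ks, xs)| s.spawn(move || msm_serial(ks, xs)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .fold(0, |acc, part| (acc + part) % P)
    })
}

fn main() {
    let n = 1 << 12;
    let scalars: Vec<u64> = (0..n as u64).map(|i| i * 7919 + 1).collect();
    let points: Vec<u64> = (0..n as u64).map(|i| (i * 104_729 + 3) % P).collect();
    let threads = thread::available_parallelism().map(|t| t.get()).unwrap_or(4);
    assert_eq!(msm_serial(&scalars, &points), msm_parallel(&scalars, &points, threads));
}
```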

han0110

LGTM! I checked locally that fft.rs and msm.rs are ported from halo2_proofs with only 2 lines of non-logic changes.

I also have a question about the MSM benchmark.

benches/msm.rs

mratsim · 2023-09-15T15:45:56Z

The EC point generation for the benchmark should be moved so that it's not repeated, and it should be parallelized: running the 2^18 to 2^22 part of the benches takes almost 30 min.

Looking at the code for `Point::random`, two things can be slow when creating a random point: `sqrt` and `clear_cofactor`.

halo2curves/src/derive/curve.rs

Lines 365 to 377 in 6e2ff38

```rust
if let Some(y) = Option::<$base>::from(y2.sqrt()) {
    let sign = y.to_bytes()[0] & 1;
    let y = if ysign ^ sign == 0 { y } else { -y };
    let p = $name_affine {
        x,
        y,
    };
    use $crate::group::cofactor::CofactorGroup;
    let p = p.to_curve();
    return p.clear_cofactor().to_affine()
```

But BN254 has a cofactor of 1, so only `sqrt` is slow. In my optimized library (which uses addition chains, not Tonelli-Shanks), `sqrt` takes 6877 ns.

Creating 2^22 points would therefore take almost 30 s (2^22 × 6877 ns ≈ 28.8 s).
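To illustrate why a fixed-exponent approach can beat Tonelli-Shanks: for a prime p ≡ 3 (mod 4), a square root of a quadratic residue a is simply a^((p+1)/4), a single exponentiation by a constant, which is exactly the kind of exponent an addition chain can be precomputed for. The sketch assumes the toy prime 2^31 - 1 (which is ≡ 3 mod 4) instead of the BN254 base field and uses plain square-and-multiply rather than an addition chain; `sqrt_3mod4` is a hypothetical name.

```rust
// Toy prime with P % 4 == 3 (2^31 - 1), standing in for a curve base field.
const P: u64 = 2_147_483_647;

fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut r = 1;
    b %= P;
    while e > 0 {
        if e & 1 == 1 {
            r = r * b % P;
        }
        b = b * b % P;
        e >>= 1;
    }
    r
}

/// Square root for P ≡ 3 (mod 4): the candidate y = a^((P+1)/4) is a root iff y^2 == a.
fn sqrt_3mod4(a: u64) -> Option<u64> {
    let a = a % P;
    let y = pow_mod(a, (P + 1) / 4);
    if y * y % P == a { Some(y) } else { None }
}

fn main() {
    for a in [0u64, 1, 2, 4, 9, 123_456_789] {
        match sqrt_3mod4(a) {
            Some(y) => println!("sqrt({a}) = {y}"),
            None => println!("{a} is not a square mod P"),
        }
    }
}
```

Roughly half of all x-coordinates have no square root, which is why random point generation has to retry and calls `sqrt` more than once per point on average.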

CPerezz · 2023-09-16T07:43:24Z

We could file an issue for that!
I've been wanting to add addition chains for some time. Maybe using https://github.com/kwantam/addchain by Riad S. Wahby.
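A small worked example of what an addition chain buys: an addition chain for e is a sequence 1 = c_0 < c_1 < ... < c_r = e in which every c_i is the sum of two earlier entries, so x^e costs r multiplications. For e = 15, left-to-right square-and-multiply needs 6 multiplications, while the chain 1, 2, 3, 6, 12, 15 needs 5. Tools such as addchain search for short chains for large fixed exponents; the sketch below simply hard-codes the chain for 15 over the toy prime 2^31 - 1, and `pow_chain` / `pow_binary` are hypothetical helpers.

```rust
// Toy prime modulus (2^31 - 1).
const P: u64 = 2_147_483_647;

fn mul(a: u64, b: u64) -> u64 {
    a * b % P
}

/// Evaluates x^e along an addition chain given as (j, k) index pairs: step i
/// appends exponent c[j] + c[k] (c[0] = 1). Returns (x^e, multiplications).
fn pow_chain(x: u64, steps: &[(usize, usize)]) -> (u64, usize) {
    let mut vals = vec![x % P]; // vals[i] = x^(c[i])
    for &(j, k) in steps {
        let v = mul(vals[j], vals[k]);
        vals.push(v);
    }
    (*vals.last().unwrap(), steps.len())
}

/// Left-to-right square-and-multiply for e >= 1, counting multiplications.
fn pow_binary(x: u64, e: u64) -> (u64, usize) {
    let mut r = x % P;
    let mut count = 0;
    for bit in (0..63 - e.leading_zeros()).rev() {
        r = mul(r, r);
        count += 1;
        if (e >> bit) & 1 == 1 {
            r = mul(r, x % P);
            count += 1;
        }
    }
    (r, count)
}

fn main() {
    let x = 123_456_789;
    // Chain 1, 2, 3, 6, 12, 15: 2=1+1, 3=2+1, 6=3+3, 12=6+6, 15=12+3.
    let chain = [(0, 0), (1, 0), (2, 2), (3, 3), (4, 2)];
    let (via_chain, chain_muls) = pow_chain(x, &chain);
    let (via_binary, binary_muls) = pow_binary(x, 15);
    assert_eq!(via_chain, via_binary);
    // prints "chain: 5 mults, binary: 6 mults"
    println!("chain: {chain_muls} mults, binary: {binary_muls} mults");
}
```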

mratsim · 2023-09-17T08:05:02Z

That figure came from my library, which already uses addition chains ;), so in halo2curves it might even take a minute to generate the points.

han0110

LGTM!

CPerezz

Nice work!

mratsim · 2023-09-18T12:13:10Z

Hold off on merging, @einar-taiko is looking into parallelizing and caching the generation of the test (coeffs, points) pairs.
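A minimal sketch of the caching-plus-slicing idea: generate the largest set of (coeff, point) pairs once, cache it in a `std::sync::OnceLock` (stable since Rust 1.70), and hand each benchmark size 2^k a prefix slice instead of regenerating inputs. `MAX_K`, `generate` and `bench_inputs` are hypothetical, and the toy `u64` values stand in for real scalars and curve points.

```rust
use std::sync::OnceLock;

/// Hypothetical largest benchmark size (the PR discussion goes up to 2^22).
const MAX_K: u32 = 16;

/// Toy (coeff, point) generator standing in for expensive random scalars/points.
fn generate(n: usize) -> (Vec<u64>, Vec<u64>) {
    let coeffs: Vec<u64> = (0..n as u64).map(|i| i.wrapping_mul(0x9E37_79B9_7F4A_7C15)).collect();
    let points: Vec<u64> = (0..n as u64).map(|i| i.rotate_left(17) ^ 0xDEAD_BEEF).collect();
    (coeffs, points)
}

static INPUTS: OnceLock<(Vec<u64>, Vec<u64>)> = OnceLock::new();

/// Returns the first 2^k pairs, generating the full 2^MAX_K set only on first use.
fn bench_inputs(k: u32) -> (&'static [u64], &'static [u64]) {
    assert!(k <= MAX_K);
    let (coeffs, points) = INPUTS.get_or_init(|| generate(1 << MAX_K));
    let n = 1usize << k;
    (&coeffs[..n], &points[..n])
}

fn main() {
    for k in [10, 12, 14, 16] {
        let (coeffs, points) = bench_inputs(k);
        // A real benchmark would time the MSM on (coeffs, points) here.
        println!("k = {k}: {} pairs", coeffs.len().min(points.len()));
    }
}
```

Because every size shares one buffer, the generation cost is paid once per process rather than once per benchmark size.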

CPerezz · 2023-09-18T12:29:33Z

> Hold off on merging, @einar-taiko is looking into parallelizing and caching the generation of the test (coeffs, points) pairs.

Didn't merge it, as per your last comment :)

Laptop measurements: k=22: 109 sec k=16: 1 sec

mratsim

The new bench with parallel point generation now completes in under a minute.
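A sketch of parallel input generation with `std::thread::scope`: the output vector is split into chunks, each filled on its own thread, and each element is derived only from its index, so the result is identical for any thread count (convenient when Criterion compares runs). `expensive_point` is a hypothetical stand-in for a slow `Point::random`, and SplitMix64 is used only as a cheap deterministic mixer.

```rust
use std::thread;

/// SplitMix64 step: a small deterministic mixer used to derive values from an index.
fn splitmix64(mut z: u64) -> u64 {
    z = z.wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical stand-in for an expensive random point (e.g. one needing a sqrt).
fn expensive_point(index: usize) -> u64 {
    let mut x = splitmix64(index as u64);
    for _ in 0..100 {
        x = splitmix64(x); // simulate work
    }
    x
}

/// Fills a vector of n values in parallel; element i depends only on i.
fn generate_points(n: usize, threads: usize) -> Vec<u64> {
    let threads = threads.max(1);
    let chunk = ((n + threads - 1) / threads).max(1);
    let mut out = vec![0u64; n];
    thread::scope(|s| {
        for (c, slot) in out.chunks_mut(chunk).enumerate() {
            s.spawn(move || {
                for (i, p) in slot.iter_mut().enumerate() {
                    *p = expensive_point(c * chunk + i);
                }
            });
        }
    }); // all scoped threads are joined here
    out
}

fn main() {
    let n = 1 << 16;
    let par = generate_points(n, 8);
    let seq: Vec<u64> = (0..n).map(expensive_point).collect();
    assert_eq!(par, seq);
}
```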

han0110

LGTM! Just a small suggestion, which can be ignored.

benches/msm.rs

mratsim · 2023-09-19T12:11:14Z

Here are my benchmarks on my laptop (i9-11980HK, 8 cores). Unfortunately I have to leave for zkSummit right after, so benchmarks on my 18-core workstation will have to wait until the end of the week.

For 2^16 there is a factor of 3-4x between halo2curves (90 ms) and Gnark (30 ms) / Constantine (22.5 ms).

[Benchmark screenshots: Halo2curves, Gnark, Constantine]

han0110

I think we still need to add `required-features = ["multicore"]` for `msm.rs` to pass the CI build without default features.
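Sketched as a `Cargo.toml` entry, assuming the bench target is named `msm` and, as is usual for Criterion benchmarks, disables the default harness; the exact entry in the repository may differ:

```toml
[[bench]]
name = "msm"
harness = false
required-features = ["multicore"]
```

With this, a build with `--no-default-features` skips the `msm` bench instead of failing to compile it.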

Cargo.toml

* Add field conversion to/from `[u64;4]` (privacy-scaling-explorations#80)
* feat: add field conversion to/from `[u64;4]`
* Added conversion tests
* Added `montgomery_reduce_short` for no-asm
* For bn256, uses assembly conversion when asm feature is on
* fix: remove conflict for asm
* chore: bump rust-toolchain to 1.67.0
* Compute Legendre symbol for `hash_to_curve` (privacy-scaling-explorations#77)
* Add `Legendre` trait and macro
  - Add Legendre macro with norm and legendre symbol computation
  - Add macro for automatic implementation in prime fields
* Add legendre macro call for prime fields
* Remove unused imports
* Remove leftover
* Add `is_quadratic_non_residue` for hash_to_curve
* Add `legendre` function
* Compute modulus separately
* Substitute division for shift
* Update modulus computation
* Add quadratic residue check func
* Add quadratic residue tests
* Add hash_to_curve bench
* Implement Legendre trait for all curves
* Move misplaced comment
* Add all curves to hash bench
* fix: add suggestion for legendre_exp
* fix: imports after rebase
* Add simplified SWU method (privacy-scaling-explorations#81)
* Fix broken link
* Add simple SWU algorithm
* Add simplified SWU hash_to_curve for secp256r1
* add: sswu z reference
* update MAP_ID identifier
  Co-authored-by: Han <[email protected]>
  ---------
  Co-authored-by: Han <[email protected]>
* Bring back curve algorithms for `a = 0` (privacy-scaling-explorations#82)
* refactor: bring back curve algorithms for `a = 0`
* fix: clippy warning
* fix: Improve serialization for prime fields (privacy-scaling-explorations#85)
* fix: Improve serialization for prime fields
  Summary: 256-bit field serialization is currently 4x u64, ie. the native format. This implements the standard of byte-serialization (corresponding to the PrimeField::{to,from}_repr), and an hex-encoded variant of that for (de)serializers that are human-readable (concretely, json).
  - Added a new macro `serialize_deserialize_32_byte_primefield!` for custom serialization and deserialization of 32-byte prime field in different struct (Fq, Fp, Fr) across the secp256r, bn256, and derive libraries.
  - Implemented the new macro for serialization and deserialization in various structs, replacing the previous `serde::{Deserialize, Serialize}` direct use.
  - Enhanced error checking in the custom serialization methods to ensure valid field elements.
  - Updated the test function in the tests/field.rs file to include JSON serialization and deserialization tests for object integrity checking.
* fixup! fix: Improve serialization for prime fields
  ---------
  Co-authored-by: Carlos Pérez <[email protected]>
* refactor: (De)Serialization of points using `GroupEncoding` (privacy-scaling-explorations#88)
* refactor: implement (De)Serialization of points using the `GroupEncoding` trait
  - Updated curve point (de)serialization logic from the internal representation to the representation offered by the implementation of the `GroupEncoding` trait.
* fix: add explicit json serde tests
* Insert MSM and FFT code and their benchmarks. (privacy-scaling-explorations#86)
* Insert MSM and FFT code and their benchmarks.
  Resolves taikoxyz/zkevm-circuits#150.
* feedback
* Add instructions
* feeback
* Implement feedback: Actually supply the correct arguments to `best_multiexp`. Split into `singlecore` and `multicore` benchmarks so Criterion's result caching and comparison over multiple runs makes sense. Rewrite point and scalar generation.
* Use slicing and parallelism to to decrease running time.
  Laptop measurements: k=22: 109 sec k=16: 1 sec
* Refactor msm
* Refactor fft
* Update module comments
* Fix formatting
* Implement suggestion for fixing CI

---------

Co-authored-by: David Nevado <[email protected]>
Co-authored-by: Han <[email protected]>
Co-authored-by: François Garillot <[email protected]>
Co-authored-by: Carlos Pérez <[email protected]>
Co-authored-by: einar-taiko <[email protected]>

* Add field conversion to/from `[u64;4]` (privacy-scaling-explorations#80)
* feat: add field conversion to/from `[u64;4]`
* Added conversion tests
* Added `montgomery_reduce_short` for no-asm
* For bn256, uses assembly conversion when asm feature is on
* fix: remove conflict for asm
* chore: bump rust-toolchain to 1.67.0
* Compute Legendre symbol for `hash_to_curve` (privacy-scaling-explorations#77)
* Add `Legendre` trait and macro
  - Add Legendre macro with norm and legendre symbol computation
  - Add macro for automatic implementation in prime fields
* Add legendre macro call for prime fields
* Remove unused imports
* Remove leftover
* Add `is_quadratic_non_residue` for hash_to_curve
* Add `legendre` function
* Compute modulus separately
* Substitute division for shift
* Update modulus computation
* Add quadratic residue check func
* Add quadratic residue tests
* Add hash_to_curve bench
* Implement Legendre trait for all curves
* Move misplaced comment
* Add all curves to hash bench
* fix: add suggestion for legendre_exp
* fix: imports after rebase
* Add simplified SWU method (privacy-scaling-explorations#81)
* Fix broken link
* Add simple SWU algorithm
* Add simplified SWU hash_to_curve for secp256r1
* add: sswu z reference
* update MAP_ID identifier
  Co-authored-by: Han <[email protected]>
  ---------
  Co-authored-by: Han <[email protected]>
* Bring back curve algorithms for `a = 0` (privacy-scaling-explorations#82)
* refactor: bring back curve algorithms for `a = 0`
* fix: clippy warning
* fix: Improve serialization for prime fields (privacy-scaling-explorations#85)
* fix: Improve serialization for prime fields
  Summary: 256-bit field serialization is currently 4x u64, ie. the native format. This implements the standard of byte-serialization (corresponding to the PrimeField::{to,from}_repr), and an hex-encoded variant of that for (de)serializers that are human-readable (concretely, json).
  - Added a new macro `serialize_deserialize_32_byte_primefield!` for custom serialization and deserialization of 32-byte prime field in different struct (Fq, Fp, Fr) across the secp256r, bn256, and derive libraries.
  - Implemented the new macro for serialization and deserialization in various structs, replacing the previous `serde::{Deserialize, Serialize}` direct use.
  - Enhanced error checking in the custom serialization methods to ensure valid field elements.
  - Updated the test function in the tests/field.rs file to include JSON serialization and deserialization tests for object integrity checking.
* fixup! fix: Improve serialization for prime fields
  ---------
  Co-authored-by: Carlos Pérez <[email protected]>
* refactor: (De)Serialization of points using `GroupEncoding` (privacy-scaling-explorations#88)
* refactor: implement (De)Serialization of points using the `GroupEncoding` trait
  - Updated curve point (de)serialization logic from the internal representation to the representation offered by the implementation of the `GroupEncoding` trait.
* fix: add explicit json serde tests
* Insert MSM and FFT code and their benchmarks. (privacy-scaling-explorations#86)
* Insert MSM and FFT code and their benchmarks.
  Resolves taikoxyz/zkevm-circuits#150.
* feedback
* Add instructions
* feeback
* Implement feedback: Actually supply the correct arguments to `best_multiexp`. Split into `singlecore` and `multicore` benchmarks so Criterion's result caching and comparison over multiple runs makes sense. Rewrite point and scalar generation.
* Use slicing and parallelism to to decrease running time.
  Laptop measurements: k=22: 109 sec k=16: 1 sec
* Refactor msm
* Refactor fft
* Update module comments
* Fix formatting
* Implement suggestion for fixing CI
* Re-export also mod `pairing` and remove flag `reexport` to alwasy re-export (privacy-scaling-explorations#93)
  fix: re-export also mod `pairing` and remove flag `reexport` to alwasy re-export
* fix regression in privacy-scaling-explorations#93 reexport field benches aren't run (privacy-scaling-explorations#94)
  fix regression in privacy-scaling-explorations#93, field benches aren't run
* Fast modular inverse - 9.4x acceleration (privacy-scaling-explorations#83)
* Bernstein yang modular multiplicative inverter (#2)
* rename similar to privacy-scaling-explorations#95
  ---------
  Co-authored-by: Aleksei Vambol <[email protected]>
* Fast isSquare / Legendre symbol / Jacobi symbol - 16.8x acceleration (privacy-scaling-explorations#95)
* Derivatives of the Pornin's method (taikoxyz#3)
* renaming file
* make cargo fmt happy
* clarifications from privacy-scaling-explorations#95 (comment) [skip ci]
* Formatting and slightly changing a comment
  ---------
  Co-authored-by: Aleksei Vambol <[email protected]>
* chore: delete bernsteinyang module (replaced by ff_inverse)
* Bump version to 0.4.1

---------

Co-authored-by: David Nevado <[email protected]>
Co-authored-by: Han <[email protected]>
Co-authored-by: François Garillot <[email protected]>
Co-authored-by: Carlos Pérez <[email protected]>
Co-authored-by: einar-taiko <[email protected]>
Co-authored-by: Mamy Ratsimbazafy <[email protected]>
Co-authored-by: Aleksei Vambol <[email protected]>

Insert MSM and FFT code and their benchmarks.

8797915

Resolves taikoxyz/zkevm-circuits#150.

einar-taiko force-pushed the einar/pr/msm.fft branch from ae74f12 to 8797915 on September 8, 2023 05:44

einar-taiko added 2 commits September 8, 2023 20:23

feedback

77b98f2

Add instructions

2b26984

mratsim approved these changes Sep 8, 2023

benches/msm.rs (outdated, resolved)

feeback

1977dc0

einar-taiko marked this pull request as ready for review September 8, 2023 13:18

CPerezz self-requested a review September 11, 2023 11:00

kilic self-requested a review September 12, 2023 07:58

CPerezz reviewed Sep 12, 2023

han0110 reviewed Sep 13, 2023

benches/msm.rs (outdated, resolved)

Implement feedback: Actually supply the correct arguments to `best_multiexp`. Split into `singlecore` and `multicore` benchmarks so Criterion's result caching and comparison over multiple runs makes sense. Rewrite point and scalar generation.

68f41d3

han0110 approved these changes Sep 18, 2023

CPerezz approved these changes Sep 18, 2023

Use slicing and parallelism to to decrease running time.

2bc3c17

Laptop measurements: k=22: 109 sec k=16: 1 sec

mratsim approved these changes Sep 19, 2023

han0110 approved these changes Sep 19, 2023

benches/msm.rs (outdated, resolved)

einar-taiko added 4 commits September 20, 2023 14:06

Refactor msm

2621efe

Refactor fft

16ae146

Update module comments

a5eab13

Fix formatting

714e164

kilic approved these changes Sep 20, 2023

han0110 reviewed Sep 21, 2023

Cargo.toml (resolved)

Implement suggestion for fixing CI

7092451

han0110 added this pull request to the merge queue Sep 22, 2023

Merged via the queue into privacy-scaling-explorations:main with commit ee7cb86 Sep 22, 2023
7 checks passed

mratsim mentioned this pull request Sep 28, 2023

[RFC] Blackboxing MSM and FFT - Hardware Accel API privacy-scaling-explorations/halo2#216

Closed

mratsim mentioned this pull request Oct 13, 2023

RFC: Move MSM and FFT in this repo and offer a standard interface #84

Closed

3 tasks

mratsim mentioned this pull request Oct 23, 2023

Msm bench #42

Closed

huitseeker mentioned this pull request Oct 23, 2023

New crate release #96

Closed

mratsim mentioned this pull request Oct 25, 2023

Msm optimization #29

Closed

3 tasks

CPerezz mentioned this pull request Dec 18, 2023

chore: Bump to 0.5.0 for release #114

Merged

huitseeker mentioned this pull request Dec 18, 2023

Implement a fancier msm for halo2curves-related crates argumentcomputer/arecibo#193

Closed
