perf: add avx512 fft for koalabear #612

Open. gbotrel wants to merge 9 commits into master.

gbotrel commented Jan 21, 2025

Description

This PR adds optimized FFT kernels, written in AVX512 assembly, for the koalabear and babybear fields. Only the DIF (decimation-in-frequency) direction is covered for now.
This is a prerequisite for faster SIS hashing, which will have its own dedicated PR.

There is still some room to reduce the numbers below. Right now only the kernel of size 128 runs fully in AVX512; it could be extended to cover the full size of 512 (matching the SIS hashing parameters) if needed.
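For reviewers less familiar with the DIF variant, here is a minimal scalar sketch of a radix-2 decimation-in-frequency FFT over the koalabear prime p = 2^31 - 2^24 + 1. It is not the AVX512 kernel from this PR and does not use the gnark-crypto API; every function name below (`fftDIF`, `rootOfUnity`, `naiveDFT`, `bitReverse`) is hypothetical and chosen for illustration only. It assumes the input length is a power of two no larger than 2^24, and it derives the root of unity from 3, which is a quadratic non-residue modulo p.

```go
package main

import "fmt"

// p is the koalabear prime modulus: 2^31 - 2^24 + 1.
const p uint64 = 2130706433

// Field operations on canonical values in [0, p). Products fit in a uint64
// because both operands are below 2^31.
func add(a, b uint64) uint64 { return (a + b) % p }
func sub(a, b uint64) uint64 { return (a + p - b) % p }
func mul(a, b uint64) uint64 { return a * b % p }

// pow computes base^exp mod p by square-and-multiply.
func pow(base, exp uint64) uint64 {
	result := uint64(1)
	base %= p
	for ; exp > 0; exp >>= 1 {
		if exp&1 == 1 {
			result = mul(result, base)
		}
		base = mul(base, base)
	}
	return result
}

// rootOfUnity returns a primitive n-th root of unity for n a power of two
// dividing 2^24. Since 3 is a quadratic non-residue mod p, 3^((p-1)/n)
// has order exactly n.
func rootOfUnity(n uint64) uint64 { return pow(3, (p-1)/n) }

// fftDIF runs an in-place radix-2 DIF FFT. Each butterfly maps (u, v) to
// (u+v, (u-v)*twiddle). The output is left in bit-reversed order.
func fftDIF(a []uint64) {
	n := len(a)
	if n < 2 {
		return
	}
	w := rootOfUnity(uint64(n))
	for half := n / 2; half >= 1; half /= 2 {
		for start := 0; start < n; start += 2 * half {
			tw := uint64(1)
			for j := 0; j < half; j++ {
				u, v := a[start+j], a[start+j+half]
				a[start+j] = add(u, v)
				a[start+j+half] = mul(sub(u, v), tw)
				tw = mul(tw, w)
			}
		}
		w = mul(w, w) // the sub-transforms of half the size use w^2
	}
}

// naiveDFT is the quadratic-time reference, with output in natural order.
func naiveDFT(a []uint64) []uint64 {
	n := uint64(len(a))
	w := rootOfUnity(n)
	out := make([]uint64, n)
	for k := uint64(0); k < n; k++ {
		wk, x := pow(w, k), uint64(1)
		for _, c := range a {
			out[k] = add(out[k], mul(c, x))
			x = mul(x, wk)
		}
	}
	return out
}

// bitReverse reverses the lowest logN bits of i.
func bitReverse(i, logN int) int {
	r := 0
	for b := 0; b < logN; b++ {
		r = r<<1 | (i>>b)&1
	}
	return r
}

func main() {
	in := []uint64{1, 2, 3, 4, 5, 6, 7, 8}
	want := naiveDFT(in)
	got := append([]uint64(nil), in...)
	fftDIF(got)
	ok := true
	for k := range want {
		ok = ok && got[bitReverse(k, 3)] == want[k]
	}
	fmt.Println("DIF output matches naive DFT after bit reversal:", ok)
}
```

The inner `j` loop is the part a vectorized kernel would process several elements at a time, since the butterflies within one stage are independent of each other.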

Benchmark on koalabear

FFT DIF with domain cardinality == 512:

benchmark                            old ns/op     new ns/op     delta
BenchmarkFFTDIFReferenceSmall-16     11604         2352          -79.73%
BenchmarkFFTDIFReferenceSmall-16     11668         2398          -79.45%
BenchmarkFFTDIFReferenceSmall-16     12207         2280          -81.32%

FFT DIF with domain cardinality == 1 << 20:

benchmark                       old ns/op     new ns/op     delta
BenchmarkFFTDIFReference-16     2950281       934681        -68.32%
BenchmarkFFTDIFReference-16     3422121       937623        -72.60%
BenchmarkFFTDIFReference-16     3442894       938829        -72.73%

RingSIS with log2(bound) == 16 and degree == 512 (the secure parameters used in the Linea prover):

benchmark                                                             old ns/op     new ns/op     delta
BenchmarkSIS/ring-sis/inputs=65536/log2-bound=16/log2-degree=9-16     3315941       407467        -87.71%
BenchmarkSIS/ring-sis/inputs=65536/log2-bound=16/log2-degree=9-16     3306674       407898        -87.66%
BenchmarkSIS/ring-sis/inputs=65536/log2-bound=16/log2-degree=9-16     3296878       409011        -87.59%

benchmark                                                             old allocs     new allocs     delta
BenchmarkSIS/ring-sis/inputs=65536/log2-bound=16/log2-degree=9-16     260            3              -98.85%
BenchmarkSIS/ring-sis/inputs=65536/log2-bound=16/log2-degree=9-16     260            3              -98.85%
BenchmarkSIS/ring-sis/inputs=65536/log2-bound=16/log2-degree=9-16     260            3              -98.85%

benchmark                                                             old bytes     new bytes     delta
BenchmarkSIS/ring-sis/inputs=65536/log2-bound=16/log2-degree=9-16     14464         4144          -71.35%
BenchmarkSIS/ring-sis/inputs=65536/log2-bound=16/log2-degree=9-16     14464         4144          -71.35%
BenchmarkSIS/ring-sis/inputs=65536/log2-bound=16/log2-degree=9-16     14464         4144          -71.35%
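
As a reading aid for the delta column, the sketch below shows the arithmetic it appears to follow: the relative change from the old to the new measurement, in percent. The helper names `delta` and `speedup` are hypothetical, and the speedup factor is an extra view of the same numbers rather than something the benchmark tool reports.

```go
package main

import "fmt"

// delta returns the relative change from oldV to newV in percent.
// A negative value means the new measurement is lower.
func delta(oldV, newV float64) float64 {
	return (newV - oldV) / oldV * 100
}

// speedup returns how many times smaller newV is than oldV.
func speedup(oldV, newV float64) float64 {
	return oldV / newV
}

func main() {
	// First row of each ns/op table above.
	rows := []struct {
		name       string
		oldV, newV float64
	}{
		{"FFT DIF, cardinality 512", 11604, 2352},
		{"FFT DIF, cardinality 1 << 20", 2950281, 934681},
		{"RingSIS, degree 512", 3315941, 407467},
	}
	for _, r := range rows {
		fmt.Printf("%-30s delta %+.2f%%  speedup %.2fx\n",
			r.name, delta(r.oldV, r.newV), speedup(r.oldV, r.newV))
	}
}
```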

gbotrel added the perf label Jan 21, 2025
gbotrel requested review from ivokub and Tabaie January 21, 2025 03:33