We observe that to calculate
Based on the same idea, we can further combine the process of
In this library, we optimize multi-output exponentiations till 4 numbers. For larger values, combining more exponents becomes harder
For example, if we want to find all the common bits of 4 integers, the sum of all the combinations is
The most advanced method to calculate the modular exponent is Montgomery modular multiplication with a binary expression of exponents. We modify the Montgomery modular multiplication based on Golang's Big library. Besides, we also implement precomputation tables for even better speed-up.
We generate a random base
Time(ms) for # of bits=2,000 | Time(ms) for # of bits=20,000 | Time(ms) for # of bits=200,000 | Time(ms) for # of bits=2,000,000 | |
---|---|---|---|---|
2×NaiveExponent | 11.1 | 94.9 | 911.9 | 9443.7 |
DoubleExponent | 7.6 | 66.9 | 646.4 | 6640.3 |
4×NaiveExponent | 22.3 | 187.5 | 1827.6 | 18830.5 |
FourfoldExponent | 9.9 | 77.1 | 733.9 | 7482.0 |
precomputeFourfoldExponent | 5.5 | 36.9 | 359.7 | 3596.5 |
Looking at the specific functions, first, we calculate
We observe that with our optimizations DoubleExponent is around 30% faster and FourfoldExponent around 60% faster than their original counterparts. Additionally, we report the performance of FourfoldExponent when combined with a precomputation table that includes a precomputation of every single bit; we refer to this as the precomputeFourfoldExponent function. Precomputation tables with more elaborate settings (e.g., including four combinations of every two precomputed bits) lead to faster calculation and larger table sizes. We list the precomputation table size with respect to the maximum exponent bits supported.
Max exponent bits | |||||
---|---|---|---|---|---|
Table Size (MB) | 8 | 32 | 128 | 512 | 2048 |
You can run the benchmark using Golang's official benchmark framework by:
go test -bench .
The default bit-length of exponent is set to be 20000 by the following parameter in benchmark_test.go
const numTestBits = 20000
The benchmark for BenchmarkExpParallel2 denotes the benchmark for calculating exponent with a pre-computation table using 2 threads, BenchmarkExpParallel4 denotes the same experiment with 4 threads, and so on.