Right now the backing type of `Block` is `[u8; 16]`, since that was simple to start with. However, core arrays cannot always take advantage of auto-vectorization/SIMD for important operations.
For example, on my machine it currently takes ~3 ns to compute `BitXor` between two blocks. Using `std::arch::x86_64::__m128i`, this would be <1 ns. We could of course convert the `[u8; 16]` to/from `__m128i` for each operation, but the overhead of doing so erases any performance gain.
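For illustration, the convert-per-operation approach looks roughly like the sketch below: the `[u8; 16]` stays the backing type, and each XOR loads both arrays into `__m128i`, XORs them, and stores the result back. The helper name `xor_bytes` is hypothetical, and how much the load/store round-trip actually costs depends on how the compiler lowers it on a given target.

```rust
/// Hypothetical `Block` keeping the current `[u8; 16]` backing type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Block([u8; 16]);

/// x86_64 path: convert to `__m128i`, XOR, convert back, on every call.
#[cfg(target_arch = "x86_64")]
fn xor_bytes(a: &[u8; 16], b: &[u8; 16]) -> [u8; 16] {
    use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_storeu_si128, _mm_xor_si128};

    let mut out = [0u8; 16];
    // SAFETY: every pointer covers exactly 16 bytes, and the `loadu`/`storeu`
    // variants have no alignment requirement. SSE2 is baseline on x86_64.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_xor_si128(va, vb));
    }
    out
}

/// Portable fallback for every other target: byte-wise XOR.
#[cfg(not(target_arch = "x86_64"))]
fn xor_bytes(a: &[u8; 16], b: &[u8; 16]) -> [u8; 16] {
    std::array::from_fn(|i| a[i] ^ b[i])
}

impl std::ops::BitXor for Block {
    type Output = Block;

    #[inline]
    fn bitxor(self, rhs: Block) -> Block {
        // The conversion to and from `__m128i` happens inside `xor_bytes`
        // on each call, which is the overhead described above.
        Block(xor_bytes(&self.0, &rhs.0))
    }
}
```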
Ideally, `Block` would use a `u8x16` SIMD backing type chosen per target architecture. Unfortunately, `std::simd` is not stable yet, so we will have to provide implementations for the various architectures.
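One way to provide per-architecture implementations without `std::simd` is to select the backing type with `cfg(target_arch)`. The sketch below assumes x86_64 stores `Block` directly as `__m128i` (so `BitXor` is a single `_mm_xor_si128` with no conversion) and falls back to `[u8; 16]` elsewhere; the `from_bytes`/`to_bytes` names are hypothetical, and a real version would add further modules (e.g. aarch64 NEON) rather than relying on the byte-array fallback.

```rust
#[cfg(target_arch = "x86_64")]
mod imp {
    use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_storeu_si128, _mm_xor_si128};

    /// On x86_64 the block is stored directly as an SSE register value.
    #[derive(Clone, Copy)]
    pub struct Block(__m128i);

    impl Block {
        pub fn from_bytes(bytes: [u8; 16]) -> Self {
            // SAFETY: `bytes` is exactly 16 bytes; `loadu` has no alignment requirement.
            Block(unsafe { _mm_loadu_si128(bytes.as_ptr() as *const __m128i) })
        }

        pub fn to_bytes(self) -> [u8; 16] {
            let mut out = [0u8; 16];
            // SAFETY: `out` is exactly 16 bytes; `storeu` has no alignment requirement.
            unsafe { _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, self.0) };
            out
        }
    }

    impl std::ops::BitXor for Block {
        type Output = Block;

        // Newer compilers treat this intrinsic as safe when SSE2 is enabled.
        #[allow(unused_unsafe)]
        #[inline]
        fn bitxor(self, rhs: Block) -> Block {
            // Both operands are already `__m128i`: no conversion, just one XOR.
            // SSE2 is part of the x86_64 baseline, so no runtime detection is needed.
            Block(unsafe { _mm_xor_si128(self.0, rhs.0) })
        }
    }
}

#[cfg(not(target_arch = "x86_64"))]
mod imp {
    /// Portable fallback: a plain byte array.
    #[derive(Clone, Copy)]
    pub struct Block([u8; 16]);

    impl Block {
        pub fn from_bytes(bytes: [u8; 16]) -> Self {
            Block(bytes)
        }

        pub fn to_bytes(self) -> [u8; 16] {
            self.0
        }
    }

    impl std::ops::BitXor for Block {
        type Output = Block;

        #[inline]
        fn bitxor(self, rhs: Block) -> Block {
            Block(std::array::from_fn(|i| self.0[i] ^ rhs.0[i]))
        }
    }
}

pub use imp::Block;

// Equality goes through the byte representation, because `__m128i`
// does not implement `PartialEq`.
impl PartialEq for Block {
    fn eq(&self, other: &Self) -> bool {
        self.to_bytes() == other.to_bytes()
    }
}

impl Eq for Block {}

// Print the bytes on every target so debug output is comparable.
impl std::fmt::Debug for Block {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_tuple("Block").field(&self.to_bytes()).finish()
    }
}
```

With this layout, conversions happen only at the edges (e.g. when reading wire labels or hashing), while the hot XOR path stays in the native register type.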
To briefly motivate this: after some quick profiling and napkin math, it seems that garbling a circuit is currently bottlenecked on `BitXor`, not on the CCR hashing!
For example, the AES circuit has 6,500 AND gates and 30,163 XOR gates. Garbling this circuit currently takes ~350 microseconds on my machine. Garbling an AND gate requires up to 9 XOR ops, so in total that is 6,500 × 9 + 30,163 = 88,663 XOR ops to garble AES. At ~3 ns per op, that adds up to ~266 microseconds, or roughly 75% of CPU time.
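The napkin math can be reproduced as a quick check. The inputs (up to 9 XORs per AND gate, one XOR per XOR gate, ~3 ns per `BitXor`, ~350 microseconds total) are the figures quoted above; the function and constant names are only illustrative.

```rust
/// Upper bound on XOR ops per garbled AND gate, as quoted in the issue.
const XORS_PER_AND: u64 = 9;
/// Measured cost of one `BitXor` on the author's machine, in nanoseconds.
const XOR_NS: f64 = 3.0;

/// Total XOR ops: each AND gate costs up to 9, each XOR gate costs 1.
fn xor_ops(and_gates: u64, xor_gates: u64) -> u64 {
    and_gates * XORS_PER_AND + xor_gates
}

/// Estimated time spent in XOR, in microseconds.
fn xor_time_us(ops: u64) -> f64 {
    ops as f64 * XOR_NS / 1_000.0
}

/// Fraction of the total garbling time spent in XOR.
fn xor_fraction(ops: u64, total_us: f64) -> f64 {
    xor_time_us(ops) / total_us
}
```

For the AES circuit, `xor_ops(6_500, 30_163)` gives 88,663 ops, about 266 microseconds, which is about 76% of 350 microseconds.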
I quickly hacked together something locally for x86 and found only an 8% improvement when garbling AES, so I suspect my napkin math/profiling above was incorrect. Nonetheless, this feature should be pursued.