split_scalars_kernel kernel function #45

TalDerei · 2023-03-17T22:59:22Z

The for loop in this kernel could be eliminated by using cooperative groups. Instead of a single thread looping over all the limbs of a scalar, multiple threads could access different limbs (or sub-parts of the same limb) of the same scalar in parallel. This would require refactoring the arithmetic to support multi-threaded field operations. It is a longer-term optimization that may be worth looking into if it suits the codebase.
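To make the idea concrete, here is a minimal sketch of one possible shape, not the repository's actual kernel. It assumes a scalar is stored as 8 little-endian 32-bit limbs and that the kernel's job is to cut each scalar into `c`-bit windows; `LIMBS`, `TILE`, `get_window`, `split_scalars_tiled` and `split_on_gpu` are all hypothetical names. A cooperative-groups tile of `TILE` threads shares one scalar, and each lane extracts a different window. This read-only case needs no communication between lanes; genuinely multi-threaded field arithmetic (carry propagation across limbs) would additionally need tile shuffles or syncs and is not shown.

```cuda
#include <cooperative_groups.h>
#include <cassert>
#include <cstdint>
#include <vector>

namespace cg = cooperative_groups;

constexpr unsigned LIMBS = 8;  // assumption: 8 x 32-bit limbs = 256-bit scalar
constexpr unsigned TILE = 8;   // assumption: 8 threads cooperate on one scalar

// Return the w-th c-bit window of a little-endian limb array (1 <= c <= 32).
__host__ __device__ inline uint32_t get_window(const uint32_t* limbs, unsigned w, unsigned c)
{
  unsigned bit = w * c;
  unsigned limb = bit / 32, shift = bit % 32;
  uint64_t lo = limbs[limb];
  uint64_t hi = (limb + 1 < LIMBS) ? limbs[limb + 1] : 0;  // window may straddle two limbs
  uint64_t v = (lo | (hi << 32)) >> shift;
  return static_cast<uint32_t>(v & ((1ull << c) - 1));
}

// One tile per scalar; each lane handles the windows w = rank, rank + TILE, ...
// When nof_windows <= TILE the loop body runs at most once per thread.
__global__ void split_scalars_tiled(
  const uint32_t* scalars, uint32_t* windows, unsigned n, unsigned nof_windows, unsigned c)
{
  cg::thread_block_tile<TILE> tile = cg::tiled_partition<TILE>(cg::this_thread_block());
  unsigned gtid = blockIdx.x * blockDim.x + threadIdx.x;
  unsigned scalar_idx = gtid / TILE;   // all lanes of a tile share this index
  if (scalar_idx >= n) return;         // the whole tile exits together
  const uint32_t* limbs = scalars + scalar_idx * LIMBS;
  for (unsigned w = tile.thread_rank(); w < nof_windows; w += TILE)
    windows[scalar_idx * nof_windows + w] = get_window(limbs, w, c);
}

// Host helper: copies scalars to the device, runs the kernel, returns all windows.
std::vector<uint32_t> split_on_gpu(const std::vector<uint32_t>& scalars, unsigned c)
{
  unsigned n = static_cast<unsigned>(scalars.size() / LIMBS);
  unsigned nof_windows = (LIMBS * 32 + c - 1) / c;
  std::vector<uint32_t> out(static_cast<size_t>(n) * nof_windows);
  if (n == 0) return out;

  uint32_t *d_scalars = nullptr, *d_windows = nullptr;
  cudaMalloc(&d_scalars, scalars.size() * sizeof(uint32_t));
  cudaMalloc(&d_windows, out.size() * sizeof(uint32_t));
  cudaMemcpy(d_scalars, scalars.data(), scalars.size() * sizeof(uint32_t), cudaMemcpyHostToDevice);

  unsigned threads = 128;  // a multiple of TILE, so tiles never straddle scalars
  unsigned blocks = (n * TILE + threads - 1) / threads;
  split_scalars_tiled<<<blocks, threads>>>(d_scalars, d_windows, n, nof_windows, c);
  cudaError_t err = cudaDeviceSynchronize();
  assert(err == cudaSuccess);
  (void)err;

  cudaMemcpy(out.data(), d_windows, out.size() * sizeof(uint32_t), cudaMemcpyDeviceToHost);
  cudaFree(d_scalars);
  cudaFree(d_windows);
  return out;
}
```

Calling `split_on_gpu(scalars, 16)` on a flat vector of limbs returns 16 windows per scalar, in scalar-major order. The trade-off is occupancy versus latency: each scalar now costs `TILE` threads instead of one, so this only pays off when per-scalar latency, rather than the total number of scalars, is the bottleneck.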

LeonHibnik assigned DmytroTym Apr 19, 2023

Otsar-Raikou added the backlog label Jan 7, 2024

DmytroTym added area:msm lang:cuda/cpp labels Mar 6, 2024
