The `for` loop in this kernel can be eliminated by using cooperative groups. Instead of a single thread looping over all the limbs of a scalar, multiple threads can each access a different limb (or a sub-part of the same limb) of that scalar in parallel. This would require refactoring the arithmetic to support multi-threaded field operations, so it is a longer-term optimization: worth looking into if it suits your codebase.
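A minimal sketch of the idea follows. It is not the kernel's actual field arithmetic, which isn't shown here. It assumes scalars are stored as 8 little-endian 32-bit limbs (256 bits) and uses plain multi-precision addition modulo 2^256 as a stand-in; a real field add would also need a conditional subtraction of the modulus. `LIMBS`, `carry_in_mask`, and `add_mod_2_256` are hypothetical names. Each `thread_block_tile<LIMBS>` handles one scalar, with thread `r` owning limb `r`. The carries between limbs are resolved with two tile-wide ballots rather than a sequential loop.

```cuda
#include <cooperative_groups.h>
#include <cstdint>

namespace cg = cooperative_groups;

// Assumption: 256-bit scalars stored as 8 little-endian 32-bit limbs.
constexpr int LIMBS = 8;

// G: bit i set if limb i's sum overflowed (it generates a carry).
// P: bit i set if limb i's sum is 0xFFFFFFFF (it passes an incoming carry on).
// G and P are mutually exclusive. Adding (G | P) + G as bit vectors performs the
// same ripple-carry logic as c[i+1] = G[i] | (P[i] & c[i]); XOR with P then
// isolates the carries. Bit i of the result is the carry INTO limb i, and bit
// LIMBS is the carry out of the top limb.
__host__ __device__ constexpr uint32_t carry_in_mask(uint32_t G, uint32_t P) {
    return ((G | P) + G) ^ P;
}

// Hypothetical kernel: out = (a + b) mod 2^256 for n_scalars pairs.
// One tile of LIMBS threads per scalar, so no thread loops over the limbs.
__global__ void add_mod_2_256(const uint32_t* a, const uint32_t* b,
                              uint32_t* out, int n_scalars) {
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<LIMBS> tile = cg::tiled_partition<LIMBS>(block);

    // With blockDim.x a multiple of LIMBS, all threads of a tile share this
    // index, so a whole tile exits together and the ballots below stay valid.
    int scalar = (blockIdx.x * blockDim.x + threadIdx.x) / LIMBS;
    if (scalar >= n_scalars) return;

    unsigned int limb = static_cast<unsigned int>(tile.thread_rank());
    uint32_t x = a[scalar * LIMBS + limb];
    uint32_t y = b[scalar * LIMBS + limb];
    uint32_t s = x + y;  // wraps modulo 2^32

    // Each ballot gathers one bit per limb; bit r comes from tile rank r.
    uint32_t G = tile.ballot(s < x);            // this limb overflowed
    uint32_t P = tile.ballot(s == 0xFFFFFFFFu); // this limb would pass a carry on
    uint32_t carries = carry_in_mask(G, P);

    // Add the incoming carry. The carry out of the top limb is dropped,
    // so the result is reduced modulo 2^256.
    out[scalar * LIMBS + limb] = s + ((carries >> limb) & 1u);
}
```

Launch with `blockDim.x` a multiple of `LIMBS` so that every tile maps to exactly one scalar, for example `add_mod_2_256<<<(n * LIMBS + 255) / 256, 256>>>(a, b, out, n);`. Addition is the easy case because the limbs interact only through a single carry bit. Multiplication and modular reduction need considerably more cross-thread communication, which is where most of the refactoring effort would go.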