Your idea is excellent and I have starred your repo. I want to check whether my understanding is correct:
The paper does not modify the kernel implementation. Instead, it exploits the fact that different rows along the sequence dimension of Q are independent, so it computes from attention all the way through the FFN in one pass per chunk. That consumes the intermediate results quickly and makes it possible to handle larger sequence lengths.
Is that correct?
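To make sure we're talking about the same thing, here is a minimal sketch of what I understand the scheme to be, in plain PyTorch. The names (`blockwise_attention_ffn`, `chunk_size`) are mine, not from the paper, and this is just an illustration of the idea, not the actual implementation:

```python
import torch
import torch.nn.functional as F

def blockwise_attention_ffn(q, k, v, ffn, chunk_size=1024):
    """Process query rows in chunks: each chunk runs attention and then
    the FFN before the next chunk starts, so full-sequence-length
    intermediates (scores, attention output) never coexist in memory."""
    outputs = []
    for q_chunk in q.split(chunk_size, dim=-2):  # rows of Q are independent
        scores = q_chunk @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
        attn_out = F.softmax(scores, dim=-1) @ v  # (batch, chunk, d)
        outputs.append(ffn(attn_out))             # consume immediately
    return torch.cat(outputs, dim=-2)

# Illustrative usage: each chunk's score matrix is (chunk_size, seq_len)
# rather than (seq_len, seq_len), which is where the memory saving comes from.
d = 64
q = k = v = torch.randn(1, 8192, d)
ffn = torch.nn.Sequential(torch.nn.Linear(d, 4 * d), torch.nn.GELU(),
                          torch.nn.Linear(4 * d, d))
out = blockwise_attention_ffn(q, k, v, ffn, chunk_size=512)
```

Is this roughly what the paper is doing?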