Mamba2 Causality #700
Comments
How large is the difference?
About 1e-7 to 1e-6. I suspect it is due to floating-point precision rather than the model itself.
Why would these be the same? The hidden states should be different after processing 10 items in the sequence. It is not a linear time-invariant system.
Yes, but I'm comparing the model's output rather than its hidden state. Since the model is causal, the first 10 outputs should remain the same regardless of sequence length, as no future information is used. That said, the difference I observed is extremely small, which makes me inclined to believe the assertion failed due to inherent GPU precision variations rather than a fundamental issue with the model.
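A gap of 1e-7 to 1e-6 is on the order of typical float32 accumulation error, so an exact-equality assertion is too strict; a tolerance-based comparison is the usual fix. A minimal numpy sketch (a stand-in, not the Mamba2 code from this issue) showing why reordering the same float operations breaks exact equality:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float32)

# Sum the same values in two different orders: mathematically identical,
# but float32 rounding can make the results differ in the last bits.
s_forward = np.sum(x)
s_reversed = np.sum(x[::-1])

print(abs(float(s_forward) - float(s_reversed)))  # tiny, often nonzero

# Exact equality (s_forward == s_reversed) may fail; a tolerance-based
# check is the appropriate assertion for float outputs.
assert np.allclose(s_forward, s_reversed, atol=1e-3)
```

The same reasoning applies to comparing the model outputs here: `np.allclose(out_short, out_long[:10], atol=1e-5)` (or `torch.allclose` for tensors) rather than exact equality.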
Yes, sorry, I misread your code. That's interesting. Have you tried setting the chunk_size parameter to 1? Mamba splits the input into chunks, processes them in parallel, then recombines them, so there may be some numerical noise depending on the chunking.
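The chunking effect can be reproduced on a plain linear recurrence h_t = a_t * h_{t-1} + b_t * x_t: computing it in one sequential pass versus chunk-by-chunk with a carried state is algebraically identical but rounds differently in float32. A hypothetical sketch (the function names are illustrative, not the mamba_ssm API):

```python
import numpy as np

def sequential_scan(a, bx):
    # h_t = a_t * h_{t-1} + bx_t, one step at a time from h_{-1} = 0.
    h = np.float32(0.0)
    out = np.empty_like(bx)
    for t in range(len(bx)):
        h = a[t] * h + bx[t]
        out[t] = h
    return out

def chunked_scan(a, bx, chunk_size):
    # Same recurrence, processed chunk by chunk. The carry-in state is
    # folded in via a cumulative product of the decays -- a different
    # ordering of float operations than the sequential pass.
    out = np.empty_like(bx)
    h = np.float32(0.0)
    for s in range(0, len(bx), chunk_size):
        ac, bc = a[s:s + chunk_size], bx[s:s + chunk_size]
        local = sequential_scan(ac, bc)   # intra-chunk contribution
        decay = np.cumprod(ac)            # carries h across the chunk
        out[s:s + len(ac)] = local + decay * h
        h = out[s + len(ac) - 1]
    return out

rng = np.random.default_rng(1)
T = 64
a = rng.uniform(0.5, 1.0, T).astype(np.float32)
bx = rng.standard_normal(T).astype(np.float32)

full = sequential_scan(a, bx)
chunked = chunked_scan(a, bx, chunk_size=16)

# The gap is at float-precision level, not an actual causality leak.
print(np.max(np.abs(full - chunked)))
assert np.allclose(full, chunked, atol=1e-5)
```

With `chunk_size=1` the chunked path degenerates to the sequential one, which is why that setting is a useful diagnostic for isolating chunking-related noise.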
That's probably fine. |
Hi. Thank you for your wonderful work! I would like to ask about the causality of Mamba2. In theory it should be causal; however, when I run the code below:
I get
AssertionError: Outputs are not equal
I have already ruled out randomness as a factor, since when running
The assertion passed.
Is there any extra argument I need to add to make it causal?
Thank you for your help.
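The causality check described above (the actual snippets are elided from this page) amounts to: run the model on a length-L prefix and on a longer sequence sharing that prefix, then compare the first L outputs. A generic sketch with a simple causal recurrence standing in for Mamba2 (which requires the mamba_ssm CUDA kernels):

```python
import numpy as np

def causal_model(x, a=0.9):
    # Stand-in for a causal sequence model: y_t depends only on x_0..x_t
    # through the recurrence h_t = a * h_{t-1} + x_t.
    h = np.float32(0.0)
    y = np.empty_like(x)
    for t in range(len(x)):
        h = np.float32(a) * h + x[t]
        y[t] = h
    return y

rng = np.random.default_rng(42)
x = rng.standard_normal(20).astype(np.float32)

out_short = causal_model(x[:10])  # first 10 tokens only
out_full = causal_model(x)        # all 20 tokens

# For a causal model, appending future tokens cannot change past outputs.
# Compare with a tolerance: real GPU kernels (unlike this toy) may order
# float operations differently between the two runs.
assert np.allclose(out_short, out_full[:10], atol=1e-5)
```

If this tolerance-based assertion fails by a large margin, causality is genuinely broken; a failure only at the 1e-7 to 1e-6 level points to floating-point noise, as discussed in the comments.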