Rope2d #75

Open
bhack opened this issue Nov 11, 2024 · 6 comments

Comments

@bhack commented Nov 11, 2024

Can you add an example of RoPE2d, as in Meta's SAM2? https://github.com/facebookresearch/sam2/blob/main/sam2/modeling/sam/transformer.py#L289
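
For context, the SAM2 pattern is axial 2D RoPE: half of each head dimension is rotated by the token's x coordinate and half by its y coordinate, and the rotated q/k then go through a standard fused attention kernel. A minimal sketch of that pattern (illustrative helper names and shapes, not SAM2's exact code):

```python
import torch
import torch.nn.functional as F

def axial_freqs_cis(head_dim: int, h: int, w: int, theta: float = 10000.0) -> torch.Tensor:
    # head_dim must be divisible by 4: an x half and a y half, each built from complex pairs.
    freqs = 1.0 / (theta ** (torch.arange(0, head_dim, 4)[: head_dim // 4].float() / head_dim))
    ys, xs = torch.meshgrid(torch.arange(h).float(), torch.arange(w).float(), indexing="ij")
    fx = torch.outer(xs.flatten(), freqs)  # (h*w, head_dim//4)
    fy = torch.outer(ys.flatten(), freqs)  # (h*w, head_dim//4)
    # unit complex numbers e^{i * pos * freq} for the x half and the y half
    return torch.cat([torch.polar(torch.ones_like(fx), fx),
                      torch.polar(torch.ones_like(fy), fy)], dim=-1)  # (h*w, head_dim//2)

def apply_rope2d(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # x: (batch, heads, h*w, head_dim) -- rotate consecutive channel pairs as complex numbers
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    return torch.view_as_real(x_c * freqs_cis).flatten(-2).type_as(x)

B, heads, H, W, D = 2, 4, 8, 8, 64
q, k, v = (torch.randn(B, heads, H * W, D) for _ in range(3))
freqs_cis = axial_freqs_cis(D, H, W)
q, k = apply_rope2d(q, freqs_cis), apply_rope2d(k, freqs_cis)
out = F.scaled_dot_product_attention(q, k, v)  # rotation happens before the fused kernel
```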

@drisspg (Contributor) commented Nov 16, 2024

Just to confirm: do you mean an example where RoPE is fused into FlashAttention, as opposed to how it's done in SAM2, where q and k are rotated beforehand and then run through Flash?

@bhack (Author) commented Nov 16, 2024

Yes. Does this fit in the Flex API or not?

@drisspg (Contributor) commented Nov 18, 2024

This currently does not fit within the Flex API: RoPE is typically implemented by pre-mutating Q and K, and we don't provide any way to mutate Q or K before the dot-product operation.
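
In practice, the pattern today is therefore to rotate Q and K first and then hand them to flex_attention; score_mod can still add a bias on top of the score, but it cannot reproduce the rotation itself. A sketch of that workaround, reusing the illustrative axial_freqs_cis / apply_rope2d helpers from the first comment and assuming a PyTorch build that ships torch.nn.attention.flex_attention (2.5+):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

B, heads, H, W, D = 2, 4, 8, 8, 64            # 8x8 grid flattened to 64 tokens
q, k, v = (torch.randn(B, heads, H * W, D) for _ in range(3))

# Pre-rotate outside the kernel: flex_attention never sees the un-rotated Q/K.
freqs_cis = axial_freqs_cis(D, H, W)           # illustrative helper from the sketch above
q, k = apply_rope2d(q, freqs_cis), apply_rope2d(k, freqs_cis)

# score_mod/block_mask can still be passed on top of the rotated tensors;
# in practice flex_attention is usually wrapped in torch.compile for speed.
out = flex_attention(q, k, v)
```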

@bhack (Author) commented Nov 18, 2024

Is it on the roadmap?

@drisspg (Contributor) commented Nov 18, 2024

Not currently. From what I know, fusion ends up not being beneficial in training, though it can be beneficial in memory-bound cases during decoding.

I will leave this open, though. We have a few other higher-priority items, like the learnable biases I am working on, but I will think about how this could be supported.

@bhack (Author) commented Nov 18, 2024

Do you have some alternative SOTA 2D learnable bias on the roadmap?
