
How to reason about efficiency of different score/mask mod functions #63

Open
alex-hh opened this issue Oct 22, 2024 · 3 comments

alex-hh commented Oct 22, 2024

Hi,

The fact that it's possible to create arbitrary score mod / mask mod patterns is really powerful!

I'm wondering whether there is any way to reason about the efficiency of different masking patterns (if this is even a relevant consideration)?

For example, is a 'full' score_mod, e.g. one returning bias[b, h, i, j] where bias is an explicitly materialised attention-bias tensor, going to yield any efficiency gain over manually adding the bias to the attention logits? And what are the relative efficiencies of, say, structured versus random sparsity patterns in mask_mod?
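
For concreteness, here's a minimal sketch of the 'full' score_mod pattern I have in mind (assuming the standard flex_attention API in torch >= 2.5, with illustrative shapes):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 2, 4, 256, 64
q, k, v = (torch.randn(B, H, S, D, device="cuda") for _ in range(3))
# Explicitly materialised attention bias: O(S^2) memory.
bias = torch.randn(B, H, S, S, device="cuda")

def full_bias_mod(score, b, h, q_idx, kv_idx):
    # Loads one element of the dense bias for every (query, key) pair.
    return score + bias[b, h, q_idx, kv_idx]

# In practice you'd wrap flex_attention in torch.compile for performance.
out = flex_attention(q, k, v, score_mod=full_bias_mod)
```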

Thanks

@alex-hh alex-hh changed the title How to reason about efficiency How to reason about efficiency of different score/mask mod functions Oct 22, 2024

Chillee commented Oct 25, 2024

@alex-hh Generally speaking, the less memory you have to access from outside the kernel, the better. So loading from a full bias (i.e. size S^2) is going to be slower than loading from a 1d bias (i.e. size S), which in turn is going to be slower than loading nothing at all and computing the modification purely from the indices.
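
Concretely, something like this (names and shapes are illustrative):

```python
import torch

S, H = 1024, 8
dense_bias = torch.randn(H, S, S, device="cuda")  # O(S^2) elements
slopes = torch.randn(H, device="cuda")            # O(H) elements (ALiBi-style)

def mod_2d(score, b, h, q_idx, kv_idx):
    # Slowest: every (q, kv) pair loads from an S^2-sized tensor.
    return score + dense_bias[h, q_idx, kv_idx]

def mod_1d(score, b, h, q_idx, kv_idx):
    # Faster: only a tiny per-head tensor is loaded; the rest is computed.
    return score + slopes[h] * (kv_idx - q_idx)

def mod_no_load(score, b, h, q_idx, kv_idx):
    # Fastest: no extra memory traffic at all, purely index arithmetic.
    return score - (q_idx - kv_idx).abs()
```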

For sparsity, FlexAttention is fundamentally block-sparse: it can only skip computation at the granularity of whole blocks. So pure random sparsity, where nearly every block still contains at least one unmasked element, is unlikely to help much.
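
For intuition, compare how much a block mask can actually skip for a structured vs. a random pattern (a sketch using create_block_mask; the 90% sparsity figure is arbitrary):

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask

S = 1024

def sliding_window(b, h, q_idx, kv_idx):
    # Structured: blocks far from the diagonal are entirely masked
    # and can be skipped wholesale.
    return (q_idx - kv_idx).abs() <= 128

keep = torch.rand(S, S, device="cuda") > 0.9  # ~90% element-wise sparsity

def random_mask(b, h, q_idx, kv_idx):
    # Unstructured: nearly every block still contains unmasked elements,
    # so almost nothing can be skipped despite the high sparsity.
    return keep[q_idx, kv_idx]

structured = create_block_mask(sliding_window, B=None, H=None, Q_LEN=S, KV_LEN=S)
unstructured = create_block_mask(random_mask, B=None, H=None, Q_LEN=S, KV_LEN=S)
# structured.sparsity() will be high; unstructured.sparsity() near zero.
```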


alex-hh commented Oct 25, 2024

Thanks for the reply! Got it re the memory.

Regarding block sparsity - does this mean that given a particular mask_mod pattern, there is potentially an optimal way of permuting the inputs before applying flex attention?


drisspg commented Oct 28, 2024

Yeah, indeed there is; see this thread for some discussion: #56
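
As a toy illustration of the idea (not necessarily the exact approach discussed there): a parity/checkerboard mask is maximally block-unfriendly in the original token order, but sorting tokens by parity turns it into two dense, skippable-elsewhere blocks:

```python
import torch

S = 1024
labels = torch.arange(S, device="cuda") % 2  # e.g. alternating token classes
perm = torch.argsort(labels, stable=True)    # group same-class tokens together
inv = torch.argsort(perm)                    # to undo the permutation later

def parity_mask(b, h, q_idx, kv_idx):
    # Original order: kept entries form a checkerboard, so every block
    # contains unmasked elements and nothing can be skipped.
    return (q_idx % 2) == (kv_idx % 2)

def permuted_mask(b, h, q_idx, kv_idx):
    # Same predicate in the permuted order: kept entries now form two
    # contiguous dense blocks, and everything else is skippable.
    return labels[perm[q_idx]] == labels[perm[kv_idx]]

# Apply by permuting q/k/v along the sequence dim, running flex_attention
# with permuted_mask, then indexing the output with `inv`.
```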
