Rope2d #75

bhack · 2024-11-11T00:23:54Z

Can you add an example of 2D RoPE, as used in Meta's SAM 2? https://github.com/facebookresearch/sam2/blob/main/sam2/modeling/sam/transformer.py#L289

drisspg · 2024-11-16T04:46:01Z

Just to confirm: do you mean an example where RoPE is fused into FlashAttention, as opposed to how it's done in SAM2, where RoPE is applied to q and k beforehand and the result is then run through Flash?

bhack · 2024-11-16T11:47:57Z

Yes. Does this fit in the Flex API or not?

drisspg · 2024-11-18T22:50:28Z

This currently does not fit within the Flex API. RoPE is typically implemented by mutating Q and K before attention, and we don't provide any way to mutate Q and K before the dot product operation.
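For readers following along, the "rotate Q and K first, then run attention" pattern can be sketched roughly as below. This is a minimal NumPy illustration, not SAM2's code and not a Flex API example: the helper names (`axial_rope_angles`, `apply_rope`, `attention`), the base `theta=10000.0`, and the choice to give half of the rotation pairs to the x coordinate and half to the y coordinate are all assumptions made for the sketch. A plain softmax attention stands in for the Flash or Flex kernel.

```python
import numpy as np

def axial_rope_angles(head_dim, grid_h, grid_w, theta=10000.0):
    """Rotation angles for an axial 2D RoPE (hypothetical helper).

    Half of the head_dim // 2 rotation pairs encode x, the other half encode y.
    Returns an array of shape (grid_h * grid_w, head_dim // 2).
    """
    assert head_dim % 4 == 0, "head_dim must be divisible by 4"
    n_freq = head_dim // 4
    freqs = 1.0 / theta ** (np.arange(n_freq) / n_freq)
    ys, xs = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    xs = xs.reshape(-1, 1)  # x coordinate of each flattened token
    ys = ys.reshape(-1, 1)  # y coordinate of each flattened token
    return np.concatenate([xs * freqs, ys * freqs], axis=-1)

def apply_rope(x, angles):
    """Rotate consecutive channel pairs of x (..., seq, head_dim) by angles."""
    pairs = x.reshape(*x.shape[:-1], -1, 2)
    as_complex = pairs[..., 0] + 1j * pairs[..., 1]
    rotated = as_complex * np.exp(1j * angles)
    out = np.stack([rotated.real, rotated.imag], axis=-1)
    return out.reshape(x.shape)

def attention(q, k, v):
    """Plain softmax attention, standing in for the fused kernel."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Usage: 2 heads, a 3x3 grid of tokens, head_dim 8.
rng = np.random.default_rng(0)
heads, grid_h, grid_w, head_dim = 2, 3, 3, 8
q = rng.standard_normal((heads, grid_h * grid_w, head_dim))
k = rng.standard_normal((heads, grid_h * grid_w, head_dim))
v = rng.standard_normal((heads, grid_h * grid_w, head_dim))

angles = axial_rope_angles(head_dim, grid_h, grid_w)
q_rot = apply_rope(q, angles)  # Q and K are rotated before attention ...
k_rot = apply_rope(k, angles)
out = attention(q_rot, k_rot, v)  # ... and only then passed to the kernel
```

The point of the sketch is that the rotation happens on Q and K themselves, outside the attention call, which is why it does not map onto a hook that only sees the already-computed scores.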

bhack · 2024-11-18T23:00:46Z

Is it on the roadmap?

drisspg · 2024-11-18T23:06:06Z

Not currently. From what I know, fusion ends up not being beneficial in training, but it can be beneficial for memory-bound cases in decoding.

I will leave this open, though. I think we have a few other things that are higher priority, like learnable biases, which I am working on, but I will think about how this can be supported.

bhack · 2024-11-18T23:09:41Z

Do you have an alternative SOTA 2D learnable bias on the roadmap?
