pytorch-labs / attention-gym Public

Issues: pytorch-labs/attention-gym

55 Open, 39 Closed

Issues list

Feature Request: Support for Dynamic Bias Tensor in FlexAttention Without Recompilation

#123 opened Feb 18, 2025 by pengzhangzhi

BlockMask Value Error

#122 opened Feb 17, 2025 by Leo-T-Zang

Use case against efficient SDPA backend

#121 opened Feb 13, 2025 by TParcollet

FlexAttention customizability for softmax

#117 opened Feb 9, 2025 by veritas9872

Sigmoid attention?

#116 opened Feb 9, 2025 by lhallee

Error when using flex attention and F.sdpa together.

#115 opened Feb 8, 2025 by wuyushuwys

Does Flex attention API accept a customized attention mask?

#112 opened Feb 6, 2025 by jiagaoxiang

Non-elementwise score_mod

#111 opened Jan 30, 2025 by AmoghDabholkar

Dynamic mask block sizes during inference

#109 opened Jan 30, 2025 by windsornguyen

Can FlexAttention Optimize Masks for Large Table Constraints?

#106 opened Jan 15, 2025 by RaphaelMouravieff

FlexAttention uses much more GPU memory than FlashAttention-2

#101 opened Jan 9, 2025 by ChenlongDeng

Building a composite mask with attention_mask from tokenizers

#98 opened Jan 2, 2025 by lhallee

Illegal memory access on backward when there are unused block masks (nightly build)

#96 opened Dec 28, 2024 by timt51

FlexAttention slower than eager in HF transformers

#95 opened Dec 27, 2024 by staghado

Doc mask returns negative sparsity

#93 opened Dec 21, 2024 by staghado

question about masking

#92 opened Dec 18, 2024 by esason

Short vs long sequences performance question

#89 opened Dec 12, 2024 by francoishernandez (label: Further information is requested)

[Inquiry] Document Masking and Assigning Different Weights

#88 opened Dec 12, 2024 by yeahjack

flexattn with qwen2

#81 opened Nov 18, 2024 by NonvolatileMemory

Flex attention with dropout

#77 opened Nov 13, 2024 by zbh2047

Flex attention - gaps in profiler

#76 opened Nov 11, 2024 by tugot17

Rope2d

#75 opened Nov 11, 2024 by bhack

How to implement Bidirectional Alibi with padding using flex attention?

#74 opened Nov 7, 2024 by sphmel

Is there any chance to call backward function directly instead of using pytorch autograd mechanism?

#73 opened Nov 7, 2024 by MayDomine

Block Size when Q_LEN and KV_LEN are different

#71 opened Nov 4, 2024 by johng149
