masked-attention: A Detailed Explanation of the Algorithm - Zhang #202
I left a lot of comments, but they have all disappeared.
masked-attention: A Detailed Explanation of the Algorithm - Zhang
Working on LLM inference deployment, vision algorithm development, model compression and deployment, and algorithm SDK development; a lifelong-learning practitioner. The essence of the Transformer causal mask mechanism is to construct a lower-triangular attention-score matrix, so that a causal model attends only to the relation between the current token and the tokens before it, ignoring its relation to later tokens, i.e. only…
https://www.armcvai.cn/2024-11-10/masked-attention.html
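The excerpt above describes how a causal mask makes the attention-score matrix lower-triangular, so each token can only attend to itself and earlier tokens. A minimal PyTorch sketch of that idea follows; it is an illustration only, not code from the linked article, and the `causal_attention` helper, the single-head layout, and the tensor shapes are assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def causal_attention(q, k, v):
    """Scaled dot-product attention with a lower-triangular (causal) mask.

    q, k, v: tensors of shape (seq_len, d_head) for a single head.
    """
    seq_len, d_head = q.shape
    # Raw attention scores, shape (seq_len, seq_len).
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5
    # Lower-triangular mask: position i may only attend to positions <= i.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Future positions get -inf so softmax gives them zero weight.
    scores = scores.masked_fill(~mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# Tiny usage example with random tensors.
q, k, v = (torch.randn(4, 8) for _ in range(3))
out = causal_attention(q, k, v)
print(out.shape)  # torch.Size([4, 8])
```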