Version 0.4.0 was released on Nov 15th, 2024.
Version 0.4.0 adds support for FlashAttention V3 on Hopper GPUs. It has also been tested on low-memory (24GB) GPUs.
What's Changed
- update all_to_all by @Lay2000 in #87
- upgrade version to 0.3.6 by @feifeibear in #89
- prevent dispatch issues in SequenceParallel to improve performance by @uclalch in #92
- sync after all2all for device memory saving by @feifeibear in #93
- version 0.3.7 by @feifeibear in #94
- flash attention 3 by @feifeibear in #95
- FA3: update to the latest FA3 API. by @feifeibear in #96
- add dit benchmark script by @feifeibear in #97
- 1114v2 by @feifeibear in #98
- update readme for FA3 by @feifeibear in #99
- version to 0.4.0 by @feifeibear in #100
- dump version to 0.4.1 by @feifeibear in #101
- dump to 0.4.0 by @feifeibear in #102
Full Changelog: 0.3.5...0.4.0