[How-to] How to get Flash-Attention under Windows 11 CUDA #1469
Comments
(base) C:\Windows\system32>conda activate my10
(my10) C:\Windows\system32>cd C:\Users\TARGET STORE\Desktop\1\flash-attention
(my10) C:\Users\TARGET STORE\Desktop\1\flash-attention>pip install flash_attn-2.7.1.post1+cu124torch2.4.0cxx11abiFALSE-cp310-cp310-win_amd64.whl
(my10) C:\Users\TARGET STORE\Desktop\1\flash-attention>python
Thanks, compiled successfully with torch 2.6.0 cu124. Refer to the installation guide of triton-windows for related compiler issues.
Did you compile successfully with torch 2.6.0 cu126?
You are compiling for ROCm, which I don't have experience with. Still: you need to run the install command in a cmd window in admin mode, else you get "filename too long" errors.
I compiled with torch 2.6.0 cu124. @werruww, you are using some binary that is not what I linked; try the ones I linked to.
Here is a guide on how to get Flash Attention to work under Windows, either by downloading a precompiled file or by compiling it yourself.
It's not hard, but if you are completely new here, the information is not collected in one central place.
I needed this under Windows, and "pip install flash-attn (--no-build-isolation)" does not work: you get half an hour of building until it crashes, either because it does not find torch (which is installed) or for other reasons. There are a couple of threads about this, but they describe old hacks (modifying files) that are no longer needed.
First of all: instead of compiling it yourself (which takes more than 2 hours on my 64 GB, 12-core machine for a full compile), try downloading a precompiled lib from here. I can also confirm they work:
https://github.com/bdashore3/flash-attention/releases
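The wheel file name encodes the CUDA version, the torch version, and the Python version (the cpXXX tag), as you can see in the transcript above, so pick the one that matches your environment. Installing it is just (placeholder path, not a specific release):
pip install [path to downloaded wheel].whl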
HOW TO COMPILE
I can confirm this works on my machine with the latest code as of January 2025.
If you need the latest version (currently 2.7.3) or a Python version not included above, read on.
You should have CUDA toolkit installed and working.
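A quick way to check (my suggestion, not part of the original guide): nvcc ships with the CUDA toolkit, so this should print the toolkit version:
nvcc --version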
You will need a C++ compiler. If you don't have Visual C++ installed already, run this in an administrator console. The following commands install the C++ compiler silently; still, they are a couple of GB in size.
Windows 11 SDK and MSBuild, to compile C++ libraries:
winget install --id=Microsoft.VisualStudio.2022.BuildTools --force --override "--wait --passive --add Microsoft.VisualStudio.Component.VC.Tools.x86.x64 --add Microsoft.VisualStudio.Component.Windows11SDK.22621" -e --silent --accept-package-agreements --accept-source-agreements
Alternatively, use this on Windows 10:
winget install --id=Microsoft.VisualStudio.2022.BuildTools --force --override "--wait --passive --add Microsoft.VisualStudio.Component.VC.Tools.x86.x64 --add Microsoft.VisualStudio.Component.Windows10SDK" -e --silent --accept-package-agreements --accept-source-agreements
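As an optional sanity check (my addition, assuming the Build Tools installed correctly): open the "x64 Native Tools Command Prompt for VS 2022" from the Start menu and run
cl
with no arguments; it should print the Microsoft C/C++ compiler version banner.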
Now create a Python virtual environment (for the Python version of your project! The resulting lib will be tied to that Python version) and activate it (how you do this depends on your setup, so I won't prescribe that one command).
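For reference, one common way is the built-in venv module (a sketch assuming plain python on the PATH; adapt to conda or whatever you use):
python -m venv flashenv
flashenv\Scripts\activate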
Inside the environment, install the required libraries. Create a requirements.txt file with the following content:
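The exact file contents did not survive in this thread; a plausible example, assuming you want torch built against CUDA 12.4 plus the packages the flash-attention build expects (adjust to your setup):
--extra-index-url https://download.pytorch.org/whl/cu124
torch
packaging
ninja
wheel
einops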
Those lines install torch from the PyTorch repo with support for CUDA 12; if you need another version, you can get the corresponding repo URL from the PyTorch site.
Install that file as usual with:
pip install -r requirements.txt
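Optionally (my addition, not from the original guide), verify that the installed torch actually sees your GPU before starting the long build:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"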
Clone the flash-attention repository into the environment and run the following in the newly cloned directory, in administrator mode (else you get
Filename too long
errors even when developer mode is on):
python setup.py install
(this takes about 2 hours, during which all CPU cores were at max usage and the RAM was under heavy load). Then run:
python setup.py bdist_wheel
(this takes about 1 minute). You will get a whl file in
\flash-attention\dist
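Put together, the build step looks roughly like this (a sketch assuming git is available and the virtual environment is active; run the console as administrator):
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention
python setup.py install
python setup.py bdist_wheel
If the build exhausts your RAM, setting the MAX_JOBS environment variable (for example, set MAX_JOBS=4) before running setup.py limits the number of parallel compile jobs.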
The resulting whl file can be installed into your target project's environment with:
pip install [path to wheelfilename].whl
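To confirm the install worked (my suggestion), check that the module imports and reports its version:
python -c "import flash_attn; print(flash_attn.__version__)"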
Hope this helps