installation issue with cuda 12 #494
Comments
@blakemertz Are you sure you are using your OS's gcc? Could you please activate your environment and try
@vaclavhanzl thanks for responding. My OS gcc is v 12 -- I specifically deleted the existing symlink to gcc14 and recreated it to gcc12, checking with gcc -v in both my OS and in my openfold environment. I will double-check again and also try installing gcc=12.4 with mamba and let you know if that fixes the issue.
@blakemertz Please try this environment from my PR #496
@vaclavhanzl thanks for sharing. I noticed you are using your own cuda tools (not included in environment.yml). Are you installing from your Debian repositories or pulling them from the nvidia channel in conda? Update: never mind, I saw that it pulled in cudatoolkit (v 11.8) when I created the environment.
@vaclavhanzl thanks again for all your help. My guess is that the dependencies between gcc, numpy < 2, and pytorch with CUDA 12 were making my original environment break. This was a time-consuming task on your part -- much appreciated. While running the unit tests after setting up the environment, I had 8 failed tests and had to modify two of the python scripts in the test directory as per #467 to reduce the number of failed tests to one:
I suppose one could explicitly point to the params_model_1_ptm.npz file by passing the --jax_param_path flag, but I am not sure of the exact syntax for that. I will consider this closed for now. I hope your pull request gets merged into the pl_upgrades branch, because I am sure there are plenty of users running cuda12 and pytorch2 right now.
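The suspected clash between numpy < 2 and pytorch can be checked from inside the activated environment. The following is only a minimal sketch: the helper names `installed_version` and `is_numpy_pre_2` are made up for illustration, and it only reads package metadata, so it says nothing about conda-only packages such as cudatoolkit.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_numpy_pre_2(version):
    """True if a NumPy version string such as '1.26.4' has a major version below 2."""
    return int(version.split(".")[0]) < 2


if __name__ == "__main__":
    # Report what the active environment actually contains.
    for name in ("numpy", "torch"):
        print(name, installed_version(name) or "not installed")

    numpy_version = installed_version("numpy")
    if numpy_version is not None and not is_numpy_pre_2(numpy_version):
        # This only flags the condition guessed at in this thread; it is not a diagnosis.
        print("NumPy 2.x found; this thread suspects it clashes with the PyTorch build")
```

Running it in both the broken and the working environment and comparing the output is a quick way to see which of the two packages changed.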
@blakemertz Thanks for all the tests. To answer your question (sorry, it was too late at night here when I saw it), as you already noticed, most things come from the
while having this in
Note that I explicitly avoided anything from the Nvidia website (I appreciate their nice efforts, but using just the Debian repos is much simpler). Even my apt-get setup is probably still overkill, installing things which will not be used. All you want on the OS level is to get
Using the environment with #496 applied, I get these versions:
I did many desperate things in the past while trying to install OpenFold (all my other posts here are likely obsoleted by this one). If you are reading this, you likely got your share of this pain, too. I learned that, apart from installing what works, it is even more important to uninstall what you installed earlier while searching for your way. Seriously, if a clean OS install is possible for you, it is a good start. Your previous experiments have likely left you in a minefield of pitfalls which makes debugging OpenFold's own problems extremely hard. You may try some cleanups I did in the past: if your monitor is NOT plugged into your GPU (and you use the GPU just for CUDA), you may do things as drastic as:
etc., until
Equally important is to clean up anything Python related. If you experimented with various ways of making Python virtual environments, you can have nasty landmines waiting in some very obscure places, triggered only for certain versions of Python. Searching for a good Python version in a good environment for OpenFold can easily be spoiled by this. Verify the directories along Python's library import path.
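One way to do that verification is to list every import path entry that lies outside the active environment. This is a rough sketch, not part of OpenFold: the function name `paths_outside_prefix` is hypothetical, and it assumes `sys.prefix` points at the environment you meant to activate.

```python
import os
import sys


def paths_outside_prefix(paths, prefix):
    """Return the entries of `paths` that do not live under `prefix`."""
    prefix = os.path.abspath(prefix)
    outside = []
    for entry in paths:
        if not entry:  # an empty string means the current working directory
            continue
        full = os.path.abspath(entry)
        try:
            inside = os.path.commonpath([full, prefix]) == prefix
        except ValueError:  # e.g. paths on different drives on Windows
            inside = False
        if not inside:
            outside.append(entry)
    return outside


if __name__ == "__main__":
    # Anything printed here is imported from outside the active environment.
    for entry in paths_outside_prefix(sys.path, sys.prefix):
        print("outside environment:", entry)
```

Entries under your home directory (for example a per-user site-packages) are the usual suspects. With a plain venv, the base interpreter's standard library will also be listed, which is expected.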
@blakemertz And for this issue 494 - I guess it should stay open until PR #496 (or something similar) is merged?
PR #496 is now merged so I think this issue could be closed (please @blakemertz - looks like I cannot do that but you could, thanks).
I have tried several permutations to get openfold to install on my local machine, but no joy up to this point. Could use some help, as I need to install openfold as a dependency for a couple of other codes (in particular DiffDock-L). Here is my GPU, driver, and cuda:
The version 12 gcc/g++/gfortran on my OS is 12.4 -- I believe that 12.2 is the highest version supported by cuda 12.1/12.2, but 12.4 is what is included in my Debian testing repos.
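The comparison described above can be scripted so it is easy to repeat inside and outside the environment. This is a sketch under assumptions: the limit of 12.2 is only the belief stated in this post, not a value checked against NVIDIA's documentation, and the names `ASSUMED_MAX_GCC`, `parse_gcc_version` and `exceeds_limit` are hypothetical.

```python
import re
import shutil
import subprocess

# Assumption taken from this post, not verified against NVIDIA's documentation.
ASSUMED_MAX_GCC = (12, 2)


def parse_gcc_version(text):
    """Extract (major, minor) from output such as '12.4.0' or '12'."""
    match = re.search(r"(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def exceeds_limit(version, limit=ASSUMED_MAX_GCC):
    """True if the (major, minor) version is newer than the assumed limit."""
    return version > limit


if __name__ == "__main__":
    gcc = shutil.which("gcc")  # whichever gcc is first on PATH right now
    if gcc is None:
        print("no gcc found on PATH")
    else:
        result = subprocess.run(
            [gcc, "-dumpfullversion", "-dumpversion"],
            capture_output=True,
            text=True,
        )
        try:
            version = parse_gcc_version(result.stdout)
        except ValueError:
            print("could not read a version from", gcc)
        else:
            verdict = "newer than" if exceeds_limit(version) else "within"
            print(gcc, version, verdict, "the assumed limit", ASSUMED_MAX_GCC)
```

The printed path also shows whether the OS compiler or one installed into the environment is being picked up.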
My packages for the openfold environment, pulled from the pl_upgrades branch to be able to utilize pytorch v2 and cuda 12:
During installation of 3rd-party dependencies, I get the following output, indicating that the dependencies did not install (setup.py install is part of this process and failed to run):
This is where I am stuck -- I don't really know what to do with the "Error compiling objects for extension". I have already looked at #403, #462, and #477 and have done my best to implement their suggestions, but obviously do not have a fully working environment.