User ability to decide particular GPU for a model #84
-
Hello, thank you very much for publicly sharing this library! Could you please let me know if it is possible for the user to decide which GPU a model should utilize? I am currently integrating FTorch with an MPI-based solver. I usually have access to a node that has 40 MPI ranks and 2 GPUs. I believe FTorch in its current form only leverages the default GPU to run the model, but I may be wrong. I was wondering if changes could be made to get_device (https://github.com/Cambridge-ICCS/FTorch/blob/main/src/ctorch.cpp#L32), based on https://discuss.pytorch.org/t/how-to-specify-cuda-device-number-in-c/54220/3, to let the user decide which GPU to leverage. Thanks!
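For context, the change suggested in the linked PyTorch forum post amounts to constructing the device with an explicit index rather than relying on the default. The sketch below is plain C++ (so it compiles without libtorch, with the real call shown only in a comment), and the function name `select_cuda_device` is a hypothetical stand-in, not part of FTorch's API:

```cpp
#include <stdexcept>

// Hypothetical sketch: given the number of visible GPUs and a user-requested
// index, return the index that a get_device-style function could use. In
// actual libtorch code this index would be passed when constructing the
// device, roughly: torch::Device(torch::kCUDA, device_index).
int select_cuda_device(int num_visible_gpus, int requested_index) {
  if (num_visible_gpus <= 0) {
    throw std::runtime_error("no CUDA devices visible");
  }
  if (requested_index < 0 || requested_index >= num_visible_gpus) {
    throw std::out_of_range("requested GPU index out of range");
  }
  return requested_index;
}
```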
Replies: 3 comments 2 replies
-
Hi @siddanib, This is something we are still actively working on. Part of this will be set/handled by your job scheduling software (probably Slurm) and the system/architecture: it depends slightly on how things are set up as to which GPU each CPU on the node will offload to by default/specification. The first step would be to establish how the GPUs appear on your system (e.g. by accessing a compute node in an interactive session). If you can do this we can make some progress. Some information about this (assuming a fully saturated node) is here, but we are still working on this. I will also try to look at this once we are back in the new year.
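As a concrete illustration of the default-mapping question: on a node with more MPI ranks than GPUs, a common convention (this is a generic sketch, not FTorch's implementation) is to assign ranks to GPUs round-robin by the node-local rank:

```cpp
// Generic sketch (not FTorch code): round-robin assignment of MPI ranks on a
// node to the GPUs visible on that node. With 40 ranks and 2 GPUs, even local
// ranks land on GPU 0 and odd local ranks on GPU 1. In a real MPI program the
// local rank would come from the launcher (e.g. Slurm's SLURM_LOCALID) or
// from MPI_Comm_split_type with MPI_COMM_TYPE_SHARED.
int gpu_for_rank(int local_rank, int gpus_per_node) {
  return local_rank % gpus_per_node;
}
```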
-
Hi @jatkinson1000, Thank you very much for your detailed response and resource. I did end up experimenting with this idea of the user explicitly setting the GPU device number. I have made some preliminary changes that are currently here (https://github.com/siddanib/FTorch). Please note that the changes are very crude, but some preliminary tests on my end showed positive results. My idea was to change get_device so that the user can pass in the desired device number. Could you please let me know a list of things that you would like me to do before opening a pull request? Furthermore, are you considering utilizing CUDA MPS (https://docs.nvidia.com/deploy/mps/index.html) to handle the case of N MPI ranks trying to utilize a single GPU? Happy New Year!
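The fork's changes aren't quoted here, but the general shape of such a modification is to thread a device index through to the device constructor. The names below (`FakeDevice`, `make_device`) are illustrative stand-ins, not the actual FTorch API:

```cpp
// Illustrative sketch only: a device-construction helper extended with an
// explicit device_index argument. In the real library this would be roughly
//   torch::Device(use_cuda ? torch::kCUDA : torch::kCPU, device_index)
// with the index originating from the Fortran caller.
struct FakeDevice {
  bool is_cuda;
  int index;
};

FakeDevice make_device(bool use_cuda, int device_index) {
  if (!use_cuda) {
    return FakeDevice{false, -1};  // CPU: the index is ignored
  }
  return FakeDevice{true, device_index};
}
```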
-
This request was completed in #96 by @jwallwork23 and is now in production! |