[Build] ONNX Runtime build fails OOM (v1.20.0) #22859
Comments
@snnn for viz
Use "--parallel <n>" to reduce the parallelism.
It is more about how much memory you have for each CPU core than how much memory you have in total.
See linux build uses
Sorry, part of my response was eaten by the formatting. I meant: put a number after "--parallel" to limit the number of concurrent processes. Let's say you have 64 GB of memory and 16 CPUs. By default, make/msbuild will create at most 16 subprocesses. Since we do not know whether 4 GB is enough for one compiler process, we sometimes need to adjust the parallelism manually to avoid OOM.
Sounds like a suggestion to have 8 GB per process, am I right?
See, in my scenario we don't set a limit on parallel jobs and use the default, which is "1": https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/build.py#L171 What would be the reason to set the limit to 2 or 4 if we are failing with OOM using a single process?
Actually, the default is not one. If the optional value is 0 or unspecified, it is interpreted as the number of CPUs. Since you know how many CPUs the machine has, you may start by halving that number. For example, if we think the default value is 16, we try 8 first. If the error persists, we decrease it further. Eventually it will pass, because 64 GB is definitely enough for a single compiler process.
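The halving approach described above can be sketched in Python. This is only an illustration: the helper name parallel_candidates is hypothetical and not part of build.py, and strict halving at every step is an assumption (the comment only says to start at half and "decrease it further").

```python
def parallel_candidates(cpu_count: int) -> list[int]:
    """Return values to try for --parallel, largest first.

    Starts at half the CPU count and keeps halving down to 1.
    Hypothetical helper for illustration; not part of build.py.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")

    candidates = []
    n = max(cpu_count // 2, 1)  # first attempt: half the CPUs, never below 1
    while True:
        candidates.append(n)
        if n == 1:
            break  # a single compiler process is the last resort
        n //= 2
    return candidates


if __name__ == "__main__":
    # For the 16-CPU example in the comment above.
    print(parallel_candidates(16))
```

Each value would be passed as "--parallel <n>" in turn until the build stops running out of memory.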
You may also need to tune the "--nvcc_threads" parameter. To be safe, you can set it to one.
My Windows build environment has 2 CPUs.
Estimated memory usage is nvcc_threads * parallel * 8GB so you will need at least 16 GB memory for |
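The rule of thumb in the comment above (nvcc_threads * parallel * 8 GB) can be turned into a small calculator. This is a rough sketch, assuming the 8 GB figure is a conservative per-process estimate rather than a measured value; the function names are hypothetical and not part of the ONNX Runtime build scripts.

```python
# Rule-of-thumb memory per compiler thread, taken from the comment above.
GB_PER_COMPILER_THREAD = 8


def estimated_memory_gb(parallel: int, nvcc_threads: int = 1) -> int:
    """Estimate peak build memory as nvcc_threads * parallel * 8 GB."""
    if parallel < 1 or nvcc_threads < 1:
        raise ValueError("parallel and nvcc_threads must be at least 1")
    return nvcc_threads * parallel * GB_PER_COMPILER_THREAD


def max_parallel(total_memory_gb: int, nvcc_threads: int = 1) -> int:
    """Largest --parallel value whose estimate fits in total_memory_gb.

    Never returns less than 1, since the build needs at least one process.
    """
    if total_memory_gb < 1 or nvcc_threads < 1:
        raise ValueError("total_memory_gb and nvcc_threads must be at least 1")
    return max(total_memory_gb // (nvcc_threads * GB_PER_COMPILER_THREAD), 1)


if __name__ == "__main__":
    # Two parallel jobs with one nvcc thread each.
    print(estimated_memory_gb(parallel=2, nvcc_threads=1))
    # A 64 GB machine, as in the issue description.
    print(max_parallel(total_memory_gb=64, nvcc_threads=1))
```

Under this estimate, a 64 GB machine would be limited to 8 parallel jobs with "--nvcc_threads 1", and to 4 if nvcc_threads were raised to 2.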
Describe the issue
We are hitting an issue when trying to compile against the rel-1.20.0 branch. The build runs out of memory on both Linux and Windows.
Windows config (64 GB RAM):
Linux (64 GB RAM):
Urgency
ASAP
Target platform
Linux, Windows
Build script
Windows:
Linux:
Error / output
No error message; the container fails because it runs out of memory.
Visual Studio Version
No response
GCC / Compiler Version
No response