[Build] ONNX Runtime build fails OOM (v1.20.0) #22859

Open
mc-nv opened this issue Nov 15, 2024 · 11 comments
Labels
build (build issues; typically submitted using template)

Comments

@mc-nv
Contributor

mc-nv commented Nov 15, 2024

Describe the issue

We hit an issue when compiling against the rel-1.20.0 branch.
The build runs out of memory on both Linux and Windows platforms.

Windows config (64GB RAM):

BUILDTOOLS_VERSION:17.12.35506.116 
CMAKE_VERSION:3.30.5 
CUDA_VERSION:12.6.2 
CUDNN_VERSION:9.5.1.17 
PYTHON_VERSION:3.12.3 
TENSORRT_VERSION:10.6.0.26 
VCPGK_VERSION:2024.03.19

Linux (64GB RAM):

CMAKE_VERSION:3.28.3
CUDA_VERSION:12.6.2 
CUDNN_VERSION:9.5.1.17 
PYTHON_VERSION:3.12.3 
TENSORRT_VERSION:10.6.0.26 

Urgency

ASAP

Target platform

Linux, Windows

Build script

Windows:

onnxruntime/tools/ci_build/build.py `
   --cmake_generator "Visual Studio 17 2022" `
   --config Release `
   --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=75;80;86;90" `
   --skip_submodule_sync `
   --parallel `
   --build_shared_lib `
   --compile_no_warning_as_error `
   --skip_tests `
   --update `
   --build `
   --build_dir /workspace/build `
   --use_cuda `
   --cuda_home ${env:CUDA_PATH} `
   --cudnn_home ${env:CUDA_PATH} `
   --use_tensorrt --tensorrt_home "/tensorrt" ; `

Linux:

./build.sh \
  --config Release \
  --skip_submodule_sync \
  --parallel \
  --build_shared_lib     \
  --compile_no_warning_as_error \
  --build_dir /workspace/build \
  --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES='75;80;86;90'  \
  --update \
  --build \
  --use_cuda \
  --cuda_home "/usr/local/cuda" \
  --cudnn_home "/usr" \
  --use_tensorrt \
  --use_tensorrt_builtin_parser \
  --tensorrt_home "/usr/src/tensorrt" \
  --allow_running_as_root \
  --use_openvino CPU

Error / output

No error message; the container fails because it runs out of memory.

Visual Studio Version

No response

GCC / Compiler Version

No response

mc-nv added the "build" label (build issues; typically submitted using template) Nov 15, 2024
@mc-nv
Contributor Author

mc-nv commented Nov 15, 2024

@snnn for viz

mc-nv changed the title from "[Build] ONNX Runtime build fails OOM" to "[Build] ONNX Runtime build fails OOM (v1.20.0)" Nov 15, 2024
@snnn
Member

snnn commented Nov 15, 2024

Use " --parallel <n>" to reduce the parallelism.

@snnn
Member

snnn commented Nov 15, 2024

It is more about how much memory you have for each CPU core than how much memory you have in total.

@mc-nv
Contributor Author

mc-nv commented Nov 15, 2024

Note that the Linux build already uses --parallel, and it runs on heavy machines where we have never seen an issue building ONNX Runtime.

@snnn
Member

snnn commented Nov 15, 2024

Sorry, part of my response was eaten by the formatting. I meant: put a number after "--parallel" to limit the number of concurrent processes. Let's say you have 64GB of memory and 16 CPUs. By default make/msbuild will create at most 16 subprocesses. Since we do not know whether 4GB is enough for one compiler process, sometimes we need to manually adjust the parallelism to avoid OOM.
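
The arithmetic in that example can be sketched as a small shell helper. This is only an illustration: `max_jobs` is a hypothetical function, not part of build.py, and the per-process memory figure is an assumption the caller has to supply, since the thread notes it is not known in advance.

```shell
# Hypothetical helper (not part of build.py): cap parallel jobs by a memory budget.
# $1: total RAM in GB, $2: assumed peak GB per compiler process, $3: CPU count.
max_jobs() {
  jobs_by_mem=$(( $1 / $2 ))
  # Always allow at least one job, and never more jobs than CPUs.
  if [ "$jobs_by_mem" -lt 1 ]; then jobs_by_mem=1; fi
  if [ "$jobs_by_mem" -gt "$3" ]; then jobs_by_mem=$3; fi
  echo "$jobs_by_mem"
}

max_jobs 64 4 16   # 4GB per process: all 16 CPUs can be used
max_jobs 64 8 16   # 8GB per process: only 8 jobs fit in 64GB
```

The result would be passed as the value after "--parallel".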

@mc-nv
Contributor Author

mc-nv commented Nov 15, 2024

That sounds like a suggestion to allow 8GB per process, am I right?

@mc-nv
Contributor Author

mc-nv commented Nov 16, 2024

> Sorry my response was eaten by a part because of formatting. I meant, put a number there after "--parallel", to limit the number of concurrent processes. Let's say you have 64GB memory and 16 CPUs. By default make/msbuild will create at most 8 subprocesses. Since we do not know if 4GB is enough for one compiler process, sometimes we might need to manually adjust the parallelism to avoid OOM.

In my scenario we don't set a limit on parallel jobs and use the default, which is "1": https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/build.py#L171

What would be the reason to set the limit to 2 or 4 if we are already failing with OOM using a single process?

@snnn
Member

snnn commented Nov 16, 2024

Actually the default is not one. If the optional value is 0 or unspecified, it is interpreted as the number of CPUs. Since you know how many CPUs the machine has, you may start by halving that number. For example, if we think the default value is 16, we try 8 first. If the error still exists, we decrease it further. Eventually it will pass, because 64GB is definitely enough for a single compiler process.
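
The halving approach described above could be automated roughly as follows. This is a sketch under assumptions: `build_with_halving` is a hypothetical wrapper, and it only helps if the build command fails with a non-zero exit code; if the whole container is killed on OOM, the loop never gets a chance to retry.

```shell
# Hypothetical wrapper: retry a build command with a halved --parallel value
# until it succeeds. $1: first value to try; remaining args: the build command
# without --parallel. Prints the value that worked on stdout.
build_with_halving() {
  halving_n=$1
  shift
  while [ "$halving_n" -ge 1 ]; do
    echo "trying --parallel $halving_n" >&2
    if "$@" --parallel "$halving_n"; then
      echo "$halving_n"
      return 0
    fi
    # Integer division: 1 / 2 is 0, which ends the loop after the last attempt.
    halving_n=$(( halving_n / 2 ))
  done
  return 1
}

# Example call (not executed here), starting at half of 16 CPUs:
# build_with_halving 8 ./build.sh --config Release --update --build
```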

@snnn
Member

snnn commented Nov 16, 2024

You may also need to tune the "--nvcc_threads" parameter. To be safe, you can set it to one.
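
As an illustration of combining both settings, the helper below prints (rather than runs) a shortened version of the Linux command from this issue with explicit limits. `limited_build_cmd` is a hypothetical name, the value 4 for "--parallel" is only an assumed starting point, and most flags from the original command are omitted for brevity.

```shell
# Print (not run) a shortened Linux build command with explicit limits.
# $1: value for --parallel, $2: value for --nvcc_threads (chosen by the caller).
limited_build_cmd() {
  printf '%s' "./build.sh --config Release --skip_submodule_sync"
  # printf reuses the format for every argument, so each flag gets a leading space.
  printf ' %s' "--parallel $1" "--nvcc_threads $2" \
    "--build_shared_lib" "--update" "--build" "--use_cuda"
  printf '\n'
}

limited_build_cmd 4 1
```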

@mc-nv
Contributor Author

mc-nv commented Nov 16, 2024

My Windows build environment has 2 CPUs.

@tianleiwu
Contributor

tianleiwu commented Nov 16, 2024

The estimated memory usage is nvcc_threads * parallel * 8GB, so you will need at least 16GB of memory for --parallel 2 --nvcc_threads 1. Otherwise, try --parallel 1 --nvcc_threads 1. If you do not set them, nvcc_threads=parallel=vCPU=2, so you will need 32GB.
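
That estimate is easy to check with a one-line shell function. `estimate_mem_gb` is a hypothetical name, and the 8GB factor is the rule of thumb from the comment above, not a measured value.

```shell
# Rule of thumb from this thread: nvcc_threads * parallel * 8GB.
# $1: --parallel value, $2: --nvcc_threads value. Prints the estimate in GB.
estimate_mem_gb() {
  echo $(( $1 * $2 * 8 ))
}

estimate_mem_gb 2 1   # 16
estimate_mem_gb 1 1   # 8
estimate_mem_gb 2 2   # 32 (both default to the vCPU count, here 2)
```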
