Set make max load when building libtorch (pytorch#89237)
The NCCL build still sometimes runs out of memory (OOM) when using `$(MAKE)`:

```
virtual memory exhausted: Cannot allocate memory
Makefile:73: recipe for target '/var/lib/jenkins/cpp-build/caffe2/build/nccl/obj/collectives/device/devlink.o' failed
make[5]: *** [/var/lib/jenkins/cpp-build/caffe2/build/nccl/obj/collectives/device/devlink.o] Error 1
make[5]: Leaving directory '/var/lib/jenkins/workspace/third_party/nccl/nccl/src/collectives/device'
```

* https://github.com/pytorch/pytorch/actions/runs/3476485191/jobs/5811758058
* https://github.com/pytorch/pytorch/actions/runs/3422228421/jobs/5702153639

So this change tries setting the same load limit here as when building with ninja.

Pull Request resolved: pytorch#89237
Approved by: https://github.com/malfet
huydhn authored and pytorchmergebot committed Nov 18, 2022
1 parent 7ec8a4d commit ee2ce3f
Showing 1 changed file with 15 additions and 14 deletions.
29 changes: 15 additions & 14 deletions cmake/External/nccl.cmake
@@ -15,23 +15,24 @@ if(NOT __NCCL_INCLUDED)
  # this second replacement is needed when there are multiple archs
  string(REPLACE ";-gencode" " -gencode" NVCC_GENCODE "${NVCC_GENCODE}")

- if("${CMAKE_GENERATOR}" MATCHES "Make")
-   # Recursive make with jobserver for parallelism
-   set(MAKE_COMMAND "$(MAKE)")
+ if(DEFINED ENV{MAX_JOBS})
+   set(MAX_JOBS "$ENV{MAX_JOBS}")
  else()
-   if(DEFINED ENV{MAX_JOBS})
-     set(MAX_JOBS "$ENV{MAX_JOBS}")
-   else()
-     include(ProcessorCount)
-     ProcessorCount(NUM_HARDWARE_THREADS)
-     # Assume 2 hardware threads per cpu core
-     math(EXPR MAX_JOBS "${NUM_HARDWARE_THREADS} / 2")
-     # ProcessorCount might return 0, set to a positive number
-     if(MAX_JOBS LESS 2)
-       set(MAX_JOBS 2)
-     endif()
+   include(ProcessorCount)
+   ProcessorCount(NUM_HARDWARE_THREADS)
+   # Assume 2 hardware threads per cpu core
+   math(EXPR MAX_JOBS "${NUM_HARDWARE_THREADS} / 2")
+   # ProcessorCount might return 0, set to a positive number
+   if(MAX_JOBS LESS 2)
+     set(MAX_JOBS 2)
    endif()
+ endif()

+ if("${CMAKE_GENERATOR}" MATCHES "Make")
+   # Recursive make with jobserver for parallelism, and also put a load limit
+   # here to avoid flaky OOM, https://www.gnu.org/software/make/manual/html_node/Parallel.html
+   set(MAKE_COMMAND "$(MAKE)" "-l${MAX_JOBS}")
+ else()
    # Parallel build with CPU load limit to avoid oversubscription
    set(MAKE_COMMAND "make" "-j${MAX_JOBS}" "-l${MAX_JOBS}")
  endif()
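
To check which command the logic after this change would produce on a given machine, the following is a minimal standalone sketch that can be run in CMake script mode. It is not part of the commit. The file name `max_jobs.cmake` and the variable `GENERATOR_NAME` are hypothetical: `GENERATOR_NAME` stands in for `CMAKE_GENERATOR`, which is assumed not to be set when running with `cmake -P`.

```cmake
# Standalone sketch of the post-change logic (not part of the commit).
# Example invocation (file name is hypothetical):
#   cmake -DGENERATOR_NAME="Unix Makefiles" -P max_jobs.cmake
cmake_minimum_required(VERSION 3.5)

# GENERATOR_NAME is a hypothetical stand-in for CMAKE_GENERATOR.
if(NOT DEFINED GENERATOR_NAME)
  set(GENERATOR_NAME "Ninja")
endif()

# MAX_JOBS is now computed for every generator, not only for non-Make ones.
if(DEFINED ENV{MAX_JOBS})
  set(MAX_JOBS "$ENV{MAX_JOBS}")
else()
  include(ProcessorCount)
  ProcessorCount(NUM_HARDWARE_THREADS)
  # Assume 2 hardware threads per CPU core
  math(EXPR MAX_JOBS "${NUM_HARDWARE_THREADS} / 2")
  # ProcessorCount might return 0, so fall back to a positive number
  if(MAX_JOBS LESS 2)
    set(MAX_JOBS 2)
  endif()
endif()

if("${GENERATOR_NAME}" MATCHES "Make")
  # Recursive make: parallelism comes from the parent jobserver,
  # and -l adds a load-average limit on top of it
  set(MAKE_COMMAND "$(MAKE)" "-l${MAX_JOBS}")
else()
  # Standalone make: explicit job count plus the same load limit
  set(MAKE_COMMAND "make" "-j${MAX_JOBS}" "-l${MAX_JOBS}")
endif()

# Join the list with spaces purely for display
string(REPLACE ";" " " MAKE_COMMAND_STR "${MAKE_COMMAND}")
message(STATUS "MAX_JOBS=${MAX_JOBS}")
message(STATUS "MAKE_COMMAND=${MAKE_COMMAND_STR}")
```

With `MAX_JOBS=4` exported in the environment and a Make generator name, this should report a command of the form `$(MAKE) -l4`. With any other generator name it should report `make -j4 -l4`. Per the GNU make manual linked in the diff, `-l` stops make from starting additional jobs while the load average is at or above the given value and at least one job is already running.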