I am doing a routine build and test (for MR #638), with my tmad allTees script. This includes a make -j of all processes.
Up until now, this had always succeeded on all machines and with all compiler combinations I had used. (The only situation where the build failed is gg_ttgggg in PR #601, but this is an experimental case that clearly needs fixing by splitting up a function into smaller functions.)
Anyway, today I am getting the following errors:
...
ccache /usr/local/cuda-12.0/bin/nvcc -O3 -lineinfo -I. -I../../src -I../../../../../tools -I../../../../../test/googletest/googletest/include -I../../../../../test/googletest/googletest/include -I/usr/local/cuda-12.0/include/ -DUSE_NVTX -gencode arch=compute_70,code=compute_70 -gencode arch=compute_70,code=sm_70 -use_fast_math -std=c++17 -ccbin /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-ad950/x86_64-centos8/bin/g++ -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_FLOAT -Xcompiler -fPIC -c -x cu testmisc.cc -o build.none_m_inl0_hrd0/testmisc_cu.o
ccache /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-ad950/x86_64-centos8/bin/g++ -O3 -std=c++17 -I. -I../../src -I../../../../../tools -I../../../../../test/googletest/googletest/include -I../../../../../test/googletest/googletest/include -DUSE_NVTX -Wall -Wshadow -Wextra -ffast-math -fopenmp -march=x86-64 -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_FLOAT -I/usr/local/cuda-12.0/include/ -fPIC -c runTest.cc -o build.none_m_inl0_hrd0/runTest.o
ccache /usr/local/cuda-12.0/bin/nvcc -O3 -lineinfo -I. -I../../src -I../../../../../tools -I/usr/local/cuda-12.0/include/ -DUSE_NVTX -gencode arch=compute_70,code=compute_70 -gencode arch=compute_70,code=sm_70 -use_fast_math -std=c++17 -ccbin /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-ad950/x86_64-centos8/bin/g++ -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_FLOAT -Xcompiler -fPIC -c -x cu fsampler.cc -o build.512y_m_inl0_hrd0/fsampler_cu.o
ccache /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-ad950/x86_64-centos8/bin/g++ -O3 -std=c++17 -I. -I../../src -I../../../../../tools -DUSE_NVTX -Wall -Wshadow -Wextra -ffast-math -fopenmp -march=skylake-avx512 -mprefer-vector-width=256 -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_FLOAT -I/usr/local/cuda-12.0/include/ -fPIC -c fsampler.cc -o build.512y_m_inl0_hrd0/fsampler.o
ccache /usr/local/cuda-12.0/bin/nvcc -O3 -lineinfo -I. -I../../src -I../../../../../tools -I../../../../../test/googletest/googletest/include -I../../../../../test/googletest/googletest/include -I/usr/local/cuda-12.0/include/ -DUSE_NVTX -gencode arch=compute_70,code=compute_70 -gencode arch=compute_70,code=sm_70 -use_fast_math -std=c++17 -ccbin /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-ad950/x86_64-centos8/bin/g++ -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_FLOAT -Xcompiler -fPIC -c -x cu runTest.cc -o build.none_m_inl0_hrd0/runTest_cu.o
ccache /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-ad950/x86_64-centos8/bin/gfortran -I. -c fcheck_sa.f -o build.none_m_inl0_hrd0/fcheck_sa.o
ccache /usr/local/cuda-12.0/bin/nvcc -O3 -lineinfo -I. -I../../src -I../../../../../tools -I/usr/local/cuda-12.0/include/ -DUSE_NVTX -gencode arch=compute_70,code=compute_70 -gencode arch=compute_70,code=sm_70 -use_fast_math -std=c++17 -ccbin /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-ad950/x86_64-centos8/bin/g++ -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_FLOAT -Xcompiler -fPIC -c -x cu fsampler.cc -o build.none_m_inl0_hrd0/fsampler_cu.o
ccache /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-ad950/x86_64-centos8/bin/g++ -O3 -std=c++17 -I. -I../../src -I../../../../../tools -DUSE_NVTX -Wall -Wshadow -Wextra -ffast-math -fopenmp -march=x86-64 -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_FLOAT -I/usr/local/cuda-12.0/include/ -fPIC -c fsampler.cc -o build.none_m_inl0_hrd0/fsampler.o
nvcc error : 'cudafe++' died due to signal 9 (Kill signal)
nvcc error : 'cudafe++' died due to signal 9 (Kill signal)
nvcc error : 'cudafe++' died due to signal 9 (Kill signal)
nvcc error : 'cudafe++' died due to signal 9 (Kill signal)
nvcc error : 'cudafe++' died due to signal 9 (Kill signal)
nvcc error : 'cudafe++' died due to signal 9 (Kill signal)
nvcc error : 'cudafe++' died due to signal 9 (Kill signal)
nvcc error : 'cudafe++' died due to signal 9 (Kill signal)
nvcc error : 'cudafe++' died due to signal 9 (Kill signal)
make[2]: *** [cudacpp.mk:422: build.512y_m_inl0_hrd0/gRandomNumberKernels.o] Error 9
...
This is on itscrd80 with CUDA 12.0 and gcc 11.2.
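For reference, signal 9 is SIGKILL, so cudafe++ was killed from outside rather than crashing on its own. A minimal sketch for tallying these failures in a saved build log follows. The function name `count_signal_deaths` and the sample log are hypothetical, and the regular expression only assumes the message format shown in the log above.

```python
import re
import signal
from collections import Counter

# Matches lines such as:
#   nvcc error : 'cudafe++' died due to signal 9 (Kill signal)
PATTERN = re.compile(r"nvcc error\s*:\s*'([^']+)' died due to signal (\d+)")


def count_signal_deaths(log_text):
    """Return a Counter keyed by (tool, signal name) for nvcc sub-tool deaths."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            tool = match.group(1)
            signum = int(match.group(2))
            # signal.Signals maps the number to its symbolic name (9 -> SIGKILL on Linux)
            counts[(tool, signal.Signals(signum).name)] += 1
    return counts


# Hypothetical sample: nine identical error lines, as in the log above
sample_log = "\n".join(
    ["nvcc error   : 'cudafe++' died due to signal 9 (Kill signal)"] * 9
)
deaths = count_signal_deaths(sample_log)
```

Running this on the full `make` output would show whether every failure is the same tool and the same signal, which helps to separate an external kill from a genuine compiler crash.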
The only things that I can think of as being different from usual are:
- Of course this is a new PR, so the code changed slightly, but it is mainly the Fortran that changed, not the CUDA.
- I am using itscrd80 because I have a driver issue on itscrd90 (and in the past I was usually on itscrd70).
- I am only running tmad allTees rather than going via tput allTees first, but the builds should be exactly the same.
Maybe itscrd80 is configured differently?
Anyway, note that there are 9 nvcc errors, so I guess this was a build with parallelism 9? I will try to limit it (to 5 as I have 5 AVX builds in parallel, otherwise it serializes too much...)
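A minimal sketch of how the cap could be computed before invoking make. The helpers `capped_jobs` and `make_command` are hypothetical and not part of the tmad scripts; the cap of 5 is the value mentioned above, and `-jN` is the standard make option for limiting parallel jobs.

```python
import os


def capped_jobs(cap, cpu_count=None):
    """Return a job count for make: the CPU count, limited to `cap`, never below 1."""
    if cpu_count is None:
        # os.cpu_count() may return None, so fall back to a single job
        cpu_count = os.cpu_count() or 1
    return max(1, min(cap, cpu_count))


def make_command(cap, cpu_count=None):
    """Build the argument list for a make invocation with bounded parallelism."""
    return ["make", f"-j{capped_jobs(cap, cpu_count)}"]


# Example: cap at 5 jobs on a (hypothetical) 32-core node
command = make_command(5, cpu_count=32)
```

The resulting list can be passed to `subprocess.run` from a driver script. An unbounded `make -j` would instead start as many jobs as the dependency graph allows.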