Update toolchains on tioga, lassen, ruby and poodle #1712

Merged
merged 24 commits into from
Sep 13, 2024
6954818
Update rocm and cce versions for both corona and tioga, updates of la…
adrienbernede Aug 6, 2024
c7d4d4c
From RSC: Fix: add missing compilers and corresponding external packages
adrienbernede Aug 6, 2024
0b5e471
From RSC: Deactivate rocm 5.7 job on tioga
adrienbernede Aug 6, 2024
c1170a1
From RSC: Fix: need to point at compiler wrapper with cuda 11.8 defin…
adrienbernede Aug 6, 2024
b6e3645
From RSC: Fix: use wrapper with cuda 11.8 consistently + change in na…
adrienbernede Aug 7, 2024
919ea3b
Do not allow [email protected] jobs to fail on ruby and poodle
adrienbernede Aug 7, 2024
bb35a59
Merge branch 'develop' into woptim/rsc-update
adrienbernede Aug 7, 2024
b1386d9
From RSC: Add cuda to xl spec relying on LC wrapper with cuda
adrienbernede Aug 7, 2024
949414a
From RSC: Fix
adrienbernede Aug 7, 2024
b6357a5
From RSC: Clean drop of rocm 5.7.0 in favor on 5.7.1 on corona
adrienbernede Aug 7, 2024
b3b7821
Merge branch 'develop' into woptim/rsc-update
adrienbernede Aug 9, 2024
0260d5d
From RSC: Update cray-mpich and add rocm 6.2.0: only apply cray-mpich…
adrienbernede Aug 9, 2024
81cc328
Update rocm in tioga CI
adrienbernede Aug 9, 2024
d9fa65e
From RSC: Enforce coherency between rocm software stack and compiler …
adrienbernede Aug 9, 2024
9865c3d
From RSC: Fix typo: rocm compiler is rocmcc
adrienbernede Aug 15, 2024
f698da0
Merge branch 'develop' into woptim/rsc-update
rhornung67 Aug 19, 2024
9472f24
Merge branch 'develop' into woptim/rsc-update
rhornung67 Aug 29, 2024
9161b2b
Allow failure for intel jobs on ruby and poodle and cce 18 jobs on ti…
adrienbernede Sep 2, 2024
6b30e1c
Merge branch 'develop' into woptim/rsc-update
rhornung67 Sep 3, 2024
0c62ada
From RSC: Remove XL jobs from shared CI jobs
adrienbernede Sep 4, 2024
2a3ce5c
Remove XL jobs defined locally too
adrienbernede Sep 4, 2024
ad48eb6
Point at main branch in RSC
adrienbernede Sep 5, 2024
ac40ebd
Do not enforce blt@develop anymore
adrienbernede Sep 5, 2024
392072d
Merge branch 'develop' into woptim/rsc-update
adrienbernede Sep 6, 2024
10 changes: 5 additions & 5 deletions .gitlab/custom-jobs-and-variables.yml
@@ -21,7 +21,7 @@ variables:
# Project specific variants for ruby
PROJECT_RUBY_VARIANTS: "~shared +openmp +vectorization +tests"
# Project specific deps for ruby
- PROJECT_RUBY_DEPS: "^blt@develop "
+ PROJECT_RUBY_DEPS:

# Poodle
# Arguments for top level allocation
@@ -31,7 +31,7 @@ variables:
# Project specific variants for poodle
PROJECT_POODLE_VARIANTS: "~shared +openmp +vectorization +tests"
# Project specific deps for poodle
- PROJECT_POODLE_DEPS: "^blt@develop "
+ PROJECT_POODLE_DEPS:

# Corona
# Arguments for top level allocation
@@ -41,7 +41,7 @@ variables:
# Project specific variants for corona
PROJECT_CORONA_VARIANTS: "~shared ~openmp +vectorization +tests"
# Project specific deps for corona
- PROJECT_CORONA_DEPS: "^blt@develop "
+ PROJECT_CORONA_DEPS:

# Tioga
# Arguments for top level allocation
@@ -51,7 +51,7 @@ variables:
# Project specific variants for corona
PROJECT_TIOGA_VARIANTS: "~shared +openmp +vectorization +tests"
# Project specific deps for corona
- PROJECT_TIOGA_DEPS: "^blt@develop "
+ PROJECT_TIOGA_DEPS:

# Lassen and Butte use a different job scheduler (spectrum lsf) that does not
# allow pre-allocation the same way slurm does.
@@ -60,7 +60,7 @@ variables:
# Project specific variants for lassen
PROJECT_LASSEN_VARIANTS: "~shared +openmp +vectorization +tests cuda_arch=70"
# Project specific deps for lassen
- PROJECT_LASSEN_DEPS: "^blt@develop "
+ PROJECT_LASSEN_DEPS:

# Configuration shared by build and test jobs specific to this project.
# Not all configuration can be shared. Here projects can fine tune the
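For illustration, here is a minimal sketch of how a job spec composed from these variables reads once `PROJECT_RUBY_DEPS` is left empty. The job name `example_gcc_job` is hypothetical, and the sketch assumes the empty variable expands to an empty string when the job runs.

```yaml
variables:
  PROJECT_RUBY_VARIANTS: "~shared +openmp +vectorization +tests"
  # Left empty: no project-wide dependency constraint is appended to specs.
  PROJECT_RUBY_DEPS:

# Hypothetical job name, for illustration only.
example_gcc_job:
  variables:
    # Under the assumption above, only the variants and the compiler remain
    # in the spec; the trailing ${PROJECT_RUBY_DEPS} contributes nothing.
    SPEC: "${PROJECT_RUBY_VARIANTS} %gcc@=10.3.1 ${PROJECT_RUBY_DEPS}"
  extends: .job_on_ruby
```

Keeping `${PROJECT_RUBY_DEPS}` in the spec, even while empty, would let a dependency constraint be reintroduced later in one place.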
6 changes: 3 additions & 3 deletions .gitlab/jobs/corona.yml
@@ -26,14 +26,14 @@
# ${PROJECT_<MACHINE>_DEPS} in the extra jobs. There is no reason not to fully
# describe the spec here.

- rocmcc_5_7_0_hip_desul_atomics:
+ rocmcc_5_7_1_hip_desul_atomics:
variables:
- SPEC: " ~shared +rocm ~openmp +tests +desul amdgpu_target=gfx906 %rocmcc@=5.7.0 ^hip@5.7.0 ^blt@develop"
+ SPEC: " ~shared +rocm ~openmp +tests +desul amdgpu_target=gfx906 %rocmcc@=5.7.1 ^hip@5.7.1"
extends: .job_on_corona

clang_19_0_0_sycl_gcc_10_3_1_rocmcc_5_7_1_hip:
variables:
- SPEC: " ~shared +sycl ~openmp +tests %clang@=19.0.0 cxxflags==\"-w -fsycl -fsycl-unnamed-lambda -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend --offload-arch=gfx906\" ^blt@develop"
+ SPEC: " ~shared +sycl ~openmp +tests %clang@=19.0.0 cxxflags==\"-w -fsycl -fsycl-unnamed-lambda -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend --offload-arch=gfx906\""
MODULE_LIST: "rocm/5.7.1"
extends: .job_on_corona

27 changes: 10 additions & 17 deletions .gitlab/jobs/lassen.yml
@@ -17,15 +17,7 @@
# project. We keep ${PROJECT_<MACHINE>_VARIANTS} and ${PROJECT_<MACHINE>_DEPS}
# So that the comparison with the original job is easier.

- # Overriding shared spec: Longer allocation + extra flags
- # Warning: allowed to fail because of a bug in Spack > 0.20.3
- xl_2022_08_19_gcc_8_3_1_cuda_11_2_0:
- variables:
- SPEC: "${PROJECT_LASSEN_VARIANTS} +cuda cxxflags==\"-qthreaded -std=c++14 -O3 -qstrict -qxlcompatmacros -qlanglvl=extended0x -qalias=noansi -qhot -qpic -qsmp=omp -qsuppress=1500-029 -qsuppress=1500-036\" %xl@=16.1.1.12.gcc.8.3.1 ^cuda@11.2.0+allow-unsupported-compilers ${PROJECT_LASSEN_DEPS} ^blt@develop"
- MODULE_LIST: "cuda/11.2.0"
- LASSEN_JOB_ALLOC: "1 -W 60 -q pci"
- extends: .job_on_lassen
- allow_failure: true
+ # No overridden jobs so far.

############
# Extra jobs
@@ -36,14 +28,14 @@ xl_2022_08_19_gcc_8_3_1_cuda_11_2_0:

gcc_8_3_1_omptask:
variables:
- SPEC: " ~shared +openmp +omptask +tests %gcc@=8.3.1 ^blt@develop"
+ SPEC: " ~shared +openmp +omptask +tests %gcc@=8.3.1 ${PROJECT_LASSEN_DEPS}"
extends: .job_on_lassen

- gcc_8_3_1_cuda_11_5_0_ats_disabled:
+ gcc_8_3_1_cuda_11_7_0_ats_disabled:
extends: .job_on_lassen
variables:
- SPEC: " ~shared +openmp +tests +cuda %gcc@=8.3.1 cuda_arch=70 ^cuda@11.5.0+allow-unsupported-compilers ^blt@develop"
- MODULE_LIST: "cuda/11.5.0"
+ SPEC: " ~shared +openmp +tests +cuda %gcc@=8.3.1 cuda_arch=70 ^cuda@11.7.0+allow-unsupported-compilers ${PROJECT_LASSEN_DEPS}"
+ MODULE_LIST: "cuda/11.7.0"
LASSEN_JOB_ALLOC: "1 --atsdisable -W 30 -q pci"

##########
@@ -52,7 +44,7 @@ gcc_8_3_1_cuda_11_5_0_ats_disabled:

clang_13_0_1_libcpp:
variables:
- SPEC: " ~shared +openmp +tests %clang@=13.0.1 cflags==\"-DGTEST_HAS_CXXABI_H_=0\" cxxflags==\"-stdlib=libc++ -DGTEST_HAS_CXXABI_H_=0\" ^blt@develop"
+ SPEC: " ~shared +openmp +tests %clang@=13.0.1 cflags==\"-DGTEST_HAS_CXXABI_H_=0\" cxxflags==\"-stdlib=libc++ -DGTEST_HAS_CXXABI_H_=0\""
extends: .job_on_lassen

#clang_14_0_5_asan:
@@ -62,16 +54,17 @@ clang_13_0_1_libcpp:
# LSAN_OPTIONS: "suppressions=${CI_PROJECT_DIR}/suppressions.asan"
# extends: .job_on_lassen

- gcc_8_3_1_cuda_10_1_243_desul_atomics:
+ gcc_8_3_1_cuda_11_7_desul_atomics:
variables:
- SPEC: " ~shared +openmp +tests +cuda +desul %gcc@=8.3.1 cuda_arch=70 ^cuda@10.1.243+allow-unsupported-compilers ^blt@develop"
+ SPEC: " ~shared +openmp +tests +cuda +desul %gcc@=8.3.1 cuda_arch=70 ^cuda@11.7.0+allow-unsupported-compilers"
+ MODULE_LIST: "cuda/11.7.0"
extends: .job_on_lassen

# Warning: Allowed to fail temporarily
# Deactivated due to issues with OpenMP Target and various tests and compilers.
clang_16_0_6_ibm_omptarget:
variables:
- SPEC: " ~shared +openmp +omptarget +tests %clang@=16.0.6.ibm.gcc.8.3.1 ^blt@develop"
+ SPEC: " ~shared +openmp +omptarget +tests %clang@=16.0.6.ibm.gcc.8.3.1"
ON_LASSEN: "OFF"
extends: .job_on_lassen
allow_failure: true
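The CUDA jobs in this file name the CUDA version in more than one place. A minimal sketch of the pattern, using a hypothetical job name; the assumption, suggested by the jobs above but not stated in the diff, is that the module in `MODULE_LIST` is meant to match the `^cuda@` constraint in `SPEC`:

```yaml
# Hypothetical job, for illustration only.
example_gcc_cuda_job:
  variables:
    # CUDA version requested through the Spack spec ...
    SPEC: " ~shared +openmp +tests +cuda %gcc@=8.3.1 cuda_arch=70 ^cuda@11.7.0+allow-unsupported-compilers"
    # ... and the module assumed to provide that same version.
    MODULE_LIST: "cuda/11.7.0"
  extends: .job_on_lassen
```

When bumping a CUDA version, the job name, the spec, and the module entry would then all need to change together.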
29 changes: 11 additions & 18 deletions .gitlab/jobs/poodle.yml
@@ -17,6 +17,14 @@
# project. We keep ${PROJECT_<MACHINE>_VARIANTS} and ${PROJECT_<MACHINE>_DEPS}
# when possible so that the comparison with the original job is easier.

+ # Known issue currently under investigation
+ # https://github.com/LLNL/RAJA/pull/1712#issuecomment-2292006843
+ intel_2023_2_1:
+ variables:
+ SPEC: "${PROJECT_POODLE_VARIANTS} %intel@=2023.2.1 ${PROJECT_POODLE_DEPS}"
+ extends: .job_on_poodle
+ allow_failure: true

# Identical to shared job, but use OpenMP tasks and no vectorization
clang_14_0_6:
variables:
@@ -29,21 +37,6 @@ gcc_10_3_1:
SPEC: " ~shared +openmp +omptask +tests %gcc@=10.3.1 ${PROJECT_POODLE_DEPS}"
extends: .job_on_poodle

- # Identical to shared job, but use OpenMP tasks and no vectorization
- # Deactivated (too long on poodle)
- intel_19_1_2_gcc_10_3_1:
- variables:
- ON_POODLE: "OFF"
- SPEC: " ~shared +openmp +omptask +tests %intel@=19.1.2.gcc.10.3.1 ${PROJECT_POODLE_DEPS}"
- extends: .job_on_poodle
-
- # Allowed to fail
- intel_2022_1_0:
- variables:
- SPEC: "${PROJECT_POODLE_VARIANTS} %intel@=2022.1.0 ${PROJECT_POODLE_DEPS}"
- allow_failure: true
- extends: .job_on_poodle

############
# Extra jobs
############
@@ -53,16 +46,16 @@ intel_2022_1_0:

clang_14_0_6_openmp_off:
variables:
- SPEC: " ~shared ~openmp +tests %clang@=14.0.6 ^blt@develop"
+ SPEC: " ~shared ~openmp +tests %clang@=14.0.6"
extends: .job_on_poodle

gcc_10_3_1_openmp_default:
variables:
- SPEC: " ~shared +tests %gcc@=10.3.1 ^blt@develop"
+ SPEC: " ~shared +tests %gcc@=10.3.1"
extends: .job_on_poodle

# OTHERS
clang_14_0_6_gcc_10_3_1_desul_atomics:
variables:
- SPEC: " ~shared +openmp +tests +desul %clang@=14.0.6.gcc.10.3.1 ^blt@develop"
+ SPEC: " ~shared +openmp +tests +desul %clang@=14.0.6.gcc.10.3.1"
extends: .job_on_poodle
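The `intel_2023_2_1` override in this file relies on GitLab's `allow_failure` keyword. A minimal sketch of the same pattern, with a hypothetical job name and a placeholder where the tracking link would go; with `allow_failure: true`, a failing job is reported as a warning and does not fail the pipeline:

```yaml
# Known issue: <link to the tracking comment goes here>
# Hypothetical job name, for illustration only.
example_known_failing_job:
  variables:
    SPEC: "${PROJECT_POODLE_VARIANTS} %intel@=2023.2.1 ${PROJECT_POODLE_DEPS}"
  extends: .job_on_poodle
  # The job still runs and its result stays visible, but it cannot block a merge.
  allow_failure: true
```

Recording the reason next to the job, as the diff does with an issue-comment URL, makes it easier to know when the flag can be removed.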
19 changes: 7 additions & 12 deletions .gitlab/jobs/ruby.yml
@@ -29,18 +29,13 @@ gcc_10_3_1:
SPEC: " ~shared +openmp +omptask +tests %gcc@=10.3.1 ${PROJECT_RUBY_DEPS}"
extends: .job_on_ruby

- # Identical to shared job, but use OpenMP tasks and no vectorization
- intel_19_1_2_gcc_10_3_1:
+ # Known issue currently under investigation
+ # https://github.com/LLNL/RAJA/pull/1712#issuecomment-2292006843
+ intel_2023_2_1:
variables:
- SPEC: " ~shared +openmp +omptask +tests %intel@=19.1.2.gcc.10.3.1 ${PROJECT_RUBY_DEPS}"
+ SPEC: "${PROJECT_RUBY_VARIANTS} %intel@=2023.2.1 ${PROJECT_RUBY_DEPS}"
extends: .job_on_ruby

- # Allowed to fail
- intel_2022_1_0:
- variables:
- SPEC: "${PROJECT_RUBY_VARIANTS} %intel@=2022.1.0 ${PROJECT_RUBY_DEPS}"
allow_failure: true
- extends: .job_on_ruby

############
# Extra jobs
@@ -51,16 +46,16 @@ intel_2022_1_0:

clang_14_0_6_openmp_off:
variables:
- SPEC: " ~shared ~openmp +tests %clang@=14.0.6 ^blt@develop"
+ SPEC: " ~shared ~openmp +tests %clang@=14.0.6"
extends: .job_on_ruby

gcc_10_3_1_openmp_default:
variables:
- SPEC: " ~shared +tests %gcc@=10.3.1 ^blt@develop"
+ SPEC: " ~shared +tests %gcc@=10.3.1"
extends: .job_on_ruby

# OTHERS
clang_14_0_6_gcc_10_3_1_desul_atomics:
variables:
- SPEC: " ~shared +openmp +tests +desul %clang@=14.0.6.gcc.10.3.1 ^blt@develop"
+ SPEC: " ~shared +openmp +tests +desul %clang@=14.0.6.gcc.10.3.1"
extends: .job_on_ruby
21 changes: 16 additions & 5 deletions .gitlab/jobs/tioga.yml
@@ -17,7 +17,13 @@
# project. We keep ${PROJECT_<MACHINE>_VARIANTS} and ${PROJECT_<MACHINE>_DEPS}
# So that the comparison with the original job is easier.

- # No overridden jobs so far.
+ # Compiler error preventing a test to succeed.
+ # https://github.com/LLNL/RAJA/pull/1712#issuecomment-2316335119
+ cce_18_0_0:
+ variables:
+ SPEC: "${PROJECT_TIOGA_VARIANTS} %cce@=18.0.0 ${PROJECT_TIOGA_DEPS}"
+ extends: .job_on_tioga
+ allow_failure: true

############
# Extra jobs
@@ -26,12 +32,17 @@
# ${PROJECT_<MACHINE>_DEPS} in the extra jobs. There is no reason not to fully
# describe the spec here.

- rocmcc_6_1_1_hip_desul_atomics:
+ cce_17_0_1:
+ variables:
+ SPEC: "${PROJECT_TIOGA_VARIANTS} %cce@=17.0.1 ${PROJECT_TIOGA_DEPS}"
+ extends: .job_on_tioga
+
+ rocmcc_6_2_0_hip_desul_atomics:
variables:
- SPEC: "~shared +rocm ~openmp +desul +tests amdgpu_target=gfx90a %rocmcc@=6.1.1 ^hip@6.1.1 ^blt@develop"
+ SPEC: "~shared +rocm ~openmp +desul +tests amdgpu_target=gfx90a %rocmcc@=6.2.0 ^hip@6.2.0"
extends: .job_on_tioga

- rocmcc_6_1_1_hip_openmp:
+ rocmcc_6_2_0_hip_openmp:
variables:
- SPEC: "~shared +rocm +openmp +omptask +tests amdgpu_target=gfx90a %rocmcc@=6.1.1 ^hip@6.1.1 ^blt@develop"
+ SPEC: "~shared +rocm +openmp +omptask +tests amdgpu_target=gfx90a %rocmcc@=6.2.0 ^hip@6.2.0"
extends: .job_on_tioga
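The ROCm jobs in this file request the compiler and HIP at the same version. A minimal sketch of that pairing, with a hypothetical job name; that the two versions are meant to move together is an assumption drawn from the commit message about coherency between the ROCm software stack and the compiler:

```yaml
# Hypothetical job, for illustration only.
example_rocmcc_hip_job:
  variables:
    # Compiler (%rocmcc) and HIP (^hip) pinned to the same ROCm release.
    SPEC: "~shared +rocm ~openmp +tests amdgpu_target=gfx90a %rocmcc@=6.2.0 ^hip@6.2.0"
  extends: .job_on_tioga
```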
2 changes: 1 addition & 1 deletion tpl/camp