Increase CUDA PR build jobs to 96 #11382

Closed

Conversation

sebrowne
Contributor

Each CUDA PR build now has exclusive use of a 48 core GPU machine with plenty of memory, so turn the parallelism up for all of them.

User Support Ticket(s) or Story Referenced: TRILFRAME-522

I don't really like doing this in a file that says "do it for every CUDA build". I'd rather encode it into the autotester configs, but for now this seems to be where it's done.
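As a rough sketch of the alternative mentioned above (encoding the job count per build rather than for every CUDA build), a lookup keyed on the PR test name might look like the following. The table, the function name, and the default value of 20 are hypothetical and are not taken from the Trilinos autotester configs; only the test names and the 96-job figure come from this PR.

```python
# Hypothetical per-build parallelism table. The structure and the default
# are illustrative assumptions, not the real autotester configuration.
DEFAULT_BUILD_JOBS = 20  # assumed fallback for builds without an entry

BUILD_JOBS_BY_TEST = {
    # This build has exclusive use of a 48 core GPU machine, so it gets
    # the 96 jobs named in the PR title.
    "Trilinos_PR_cuda-11.4.2-uvm-off": 96,
}


def build_jobs_for(test_name: str) -> int:
    """Return the parallel build job count for a PR test name."""
    return BUILD_JOBS_BY_TEST.get(test_name, DEFAULT_BUILD_JOBS)


if __name__ == "__main__":
    for name in ("Trilinos_PR_cuda-11.4.2-uvm-off", "Trilinos_PR_gcc-8.3.0"):
        print(name, build_jobs_for(name))
```

Keeping the value next to the build it applies to means a second CUDA build on a smaller machine would not silently inherit the 96-job setting.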

@trilinos/framework

Motivation

We want faster CUDA PR turnaround; the CUDA and Intel PR builds both take a very long time.

@sebrowne added the labels "system: gpu" and "AT: AUTOMERGE" (causes the PR autotester to automatically merge the PR branch once approvals are completed) on Dec 14, 2022
@sebrowne requested review from e10harvey and a team on December 14, 2022 00:05
No _cudas exist....

User Support Ticket(s) or Story Referenced: TRILFRAME-522
e10harvey previously approved these changes Dec 14, 2022
@trilinos-autotester
Contributor

Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request.

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING

Build Information

Test Name: Trilinos_PR_gcc-8.3.0

  • Build Num: 1560
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-gnu-8.3.0-openmpi-1.10.1-openmp_release-debug_static_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS system: gpu;AT: AUTOMERGE
PULLREQUESTNUM 11382
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_BRANCH sebrown/increase_cuda_build_jobs
TRILINOS_SOURCE_REPO https://github.com/sebrowne/Trilinos
TRILINOS_SOURCE_SHA bace5dd
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 1a99d88

Build Information

Test Name: Trilinos_PR_gcc-8.3.0-serial

  • Build Num: 87
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-gnu-8.3.0-serial_release-debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_no-mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS system: gpu;AT: AUTOMERGE
PULLREQUESTNUM 11382
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_BRANCH sebrown/increase_cuda_build_jobs
TRILINOS_SOURCE_REPO https://github.com/sebrowne/Trilinos
TRILINOS_SOURCE_SHA bace5dd
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 1a99d88

Build Information

Test Name: Trilinos_PR_gcc-8.3.0-debug

  • Build Num: 86
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-gnu-8.3.0-openmpi-1.10.1-serial_debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS system: gpu;AT: AUTOMERGE
PULLREQUESTNUM 11382
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_BRANCH sebrown/increase_cuda_build_jobs
TRILINOS_SOURCE_REPO https://github.com/sebrowne/Trilinos
TRILINOS_SOURCE_SHA bace5dd
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 1a99d88

Build Information

Test Name: Trilinos_PR_clang-11.0.1

  • Build Num: 86
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-clang-11.0.1-openmpi-1.10.1-serial_release-debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS system: gpu;AT: AUTOMERGE
PULLREQUESTNUM 11382
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_BRANCH sebrown/increase_cuda_build_jobs
TRILINOS_SOURCE_REPO https://github.com/sebrowne/Trilinos
TRILINOS_SOURCE_SHA bace5dd
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 1a99d88

Build Information

Test Name: Trilinos_PR_python3

  • Build Num: 1341
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-gnu-7.2.0-anaconda3-serial_debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_no-mpi_no-pt_no-rdc_no-uvm_deprecated-on_pr-framework
PR_LABELS system: gpu;AT: AUTOMERGE
PULLREQUESTNUM 11382
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL ascic
TRILINOS_SOURCE_BRANCH sebrown/increase_cuda_build_jobs
TRILINOS_SOURCE_REPO https://github.com/sebrowne/Trilinos
TRILINOS_SOURCE_SHA bace5dd
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 1a99d88

Build Information

Test Name: Trilinos_PR_cuda-11.4.2-uvm-off

  • Build Num: 1087
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-cuda-11.4.2-sems-gnu-10.1.0-sems-openmpi-4.0.5_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS system: gpu;AT: AUTOMERGE
PULLREQUESTNUM 11382
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL GPU
TRILINOS_SOURCE_BRANCH sebrown/increase_cuda_build_jobs
TRILINOS_SOURCE_REPO https://github.com/sebrowne/Trilinos
TRILINOS_SOURCE_SHA bace5dd
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 1a99d88

Using Repos:

Repo: TRILINOS (sebrowne/Trilinos)
  • Branch: sebrown/increase_cuda_build_jobs
  • SHA: bace5dd
  • Mode: TEST_REPO

Pull Request Author: sebrowne

@sebrowne added the label "AT: WIP" (causes the PR autotester to not test the PR; remove to allow testing to occur) and removed the label "AT: AUTOMERGE" (causes the PR autotester to automatically merge the PR branch once approvals are completed) on Dec 14, 2022
@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Error: Jenkins Jobs - A user has committed a change to the PR before testing completed. The original testing SHA = bace5dd does not match the current commit SHA = e3cd66c. The Jenkins jobs will be shut down; testing of this PR must occur again.
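
The check behind this message can be sketched roughly as follows. This is not the autotester's actual code: the function names are hypothetical, and treating an abbreviated SHA as matching any longer SHA it prefixes is an assumption made for illustration.

```python
def sha_matches(tested_sha: str, current_sha: str) -> bool:
    """Return True when both SHAs appear to name the same commit.

    Assumption: an abbreviated SHA matches a longer one that starts with it.
    """
    a = tested_sha.strip().lower()
    b = current_sha.strip().lower()
    if not a or not b:
        return False
    return a.startswith(b) or b.startswith(a)


def ensure_pr_unchanged(tested_sha: str, current_sha: str) -> None:
    """Raise if the PR branch moved while its tests were still running."""
    if not sha_matches(tested_sha, current_sha):
        raise RuntimeError(
            f"The original testing SHA = {tested_sha} does not match "
            f"the current commit SHA = {current_sha}; testing must occur again."
        )


if __name__ == "__main__":
    try:
        # The two SHAs reported in the autotester comment above.
        ensure_pr_unchanged("bace5dd", "e3cd66c")
    except RuntimeError as err:
        print(err)
```

With the SHAs from this PR the check fails, which is why the running Jenkins jobs were shut down and reported as ERROR.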

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED

Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.

Pull Request Auto Testing has FAILED

Build Information

Test Name: Trilinos_PR_gcc-8.3.0

  • Build Num: 1560
  • Status: ERROR

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-gnu-8.3.0-openmpi-1.10.1-openmp_release-debug_static_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS system: gpu;AT: AUTOMERGE
PULLREQUESTNUM 11382
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_BRANCH sebrown/increase_cuda_build_jobs
TRILINOS_SOURCE_REPO https://github.com/sebrowne/Trilinos
TRILINOS_SOURCE_SHA bace5dd
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 1a99d88

Build Information

Test Name: Trilinos_PR_gcc-8.3.0-serial

  • Build Num: 87
  • Status: ERROR

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-gnu-8.3.0-serial_release-debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_no-mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS system: gpu;AT: AUTOMERGE
PULLREQUESTNUM 11382
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_BRANCH sebrown/increase_cuda_build_jobs
TRILINOS_SOURCE_REPO https://github.com/sebrowne/Trilinos
TRILINOS_SOURCE_SHA bace5dd
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 1a99d88

Build Information

Test Name: Trilinos_PR_gcc-8.3.0-debug

  • Build Num: 86
  • Status: ERROR

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-gnu-8.3.0-openmpi-1.10.1-serial_debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS system: gpu;AT: AUTOMERGE
PULLREQUESTNUM 11382
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_BRANCH sebrown/increase_cuda_build_jobs
TRILINOS_SOURCE_REPO https://github.com/sebrowne/Trilinos
TRILINOS_SOURCE_SHA bace5dd
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 1a99d88

Build Information

Test Name: Trilinos_PR_clang-11.0.1

  • Build Num: 86
  • Status: ERROR

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-clang-11.0.1-openmpi-1.10.1-serial_release-debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS system: gpu;AT: AUTOMERGE
PULLREQUESTNUM 11382
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_BRANCH sebrown/increase_cuda_build_jobs
TRILINOS_SOURCE_REPO https://github.com/sebrowne/Trilinos
TRILINOS_SOURCE_SHA bace5dd
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 1a99d88

Build Information

Test Name: Trilinos_PR_python3

  • Build Num: 1341
  • Status: ERROR

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-gnu-7.2.0-anaconda3-serial_debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_no-mpi_no-pt_no-rdc_no-uvm_deprecated-on_pr-framework
PR_LABELS system: gpu;AT: AUTOMERGE
PULLREQUESTNUM 11382
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL ascic
TRILINOS_SOURCE_BRANCH sebrown/increase_cuda_build_jobs
TRILINOS_SOURCE_REPO https://github.com/sebrowne/Trilinos
TRILINOS_SOURCE_SHA bace5dd
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 1a99d88

Build Information

Test Name: Trilinos_PR_cuda-11.4.2-uvm-off

  • Build Num: 1087
  • Status: ERROR

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-cuda-11.4.2-sems-gnu-10.1.0-sems-openmpi-4.0.5_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS system: gpu;AT: AUTOMERGE
PULLREQUESTNUM 11382
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL GPU
TRILINOS_SOURCE_BRANCH sebrown/increase_cuda_build_jobs
TRILINOS_SOURCE_REPO https://github.com/sebrowne/Trilinos
TRILINOS_SOURCE_SHA bace5dd
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 1a99d88


CDash Test Results for PR# 11382.


Wiki: How to Reproduce PR Testing Builds and Errors.

LaunchDriver.py is untested in the current PR framework because, at that
point, it runs from the target branch instead of the source branch (or
the merged branch).  Eliminate as much as possible from this script;
most settings don't need to be made until later anyway.

User Support Ticket(s) or Story Referenced: TRILFRAME-528
User Support Ticket(s) or Story Referenced: N/A
Calculate and use the build cores inside of PullRequestLinuxDriver.sh,
so that the PR testing will make use of this setting and test it.  In
response, also stop using TRILINOS_MAX_CORES to make it more clear that
nothing above this script can change the build core count.

User Support Ticket(s) or Story Referenced: TRILFRAME-528
User Support Ticket(s) or Story Referenced: N/A
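
To illustrate the idea in the commit message above, here is a minimal sketch of computing the build job count from the machine's core count. It is written in Python for illustration only; PullRequestLinuxDriver.sh is a shell script and its real formula is not shown in this PR. The function name and the factor of two are assumptions, the latter inferred from 96 jobs on a 48 core machine.

```python
import os
from typing import Optional


def compute_build_jobs(cores: Optional[int] = None, jobs_per_core: int = 2) -> int:
    """Return a parallel build job count derived from the core count.

    Deliberately does not read TRILINOS_MAX_CORES or any other environment
    variable, so nothing above the caller can change the result.
    """
    if cores is None:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        cores = os.cpu_count() or 1
    return max(1, cores * jobs_per_core)


if __name__ == "__main__":
    print(compute_build_jobs(48))  # the 48 core GPU machine described in this PR
    print(compute_build_jobs())    # whatever machine this runs on
```

Because the value is computed in the script that PR testing actually exercises, a change to the formula is tested by the same PR that introduces it.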
@sebrowne force-pushed the sebrown/increase_cuda_build_jobs branch from e3cd66c to 4b15f48 on December 15, 2022 15:22
@sebrowne closed this on Dec 15, 2022
@sebrowne
Contributor Author

Closing in lieu of #11391

Labels: AT: WIP (causes the PR autotester to not test the PR; remove to allow testing to occur), system: gpu

3 participants