Tests aborted when mpich is used #182

Open
sagitter opened this issue Jan 15, 2025 · 0 comments

Hi all.

superlu_dist-9.1.0 tests are failing with MPICH-4.2.2:

+ /usr/bin/ctest --test-dir redhat-linux-build --output-on-failure --force-new-ctest-process -j2 --test-dir build/mpich
Internal ctest changing into directory: /builddir/build/BUILD/superlu_dist-9.1.0-build/superlu_dist-9.1.0/build/mpich
Test project /builddir/build/BUILD/superlu_dist-9.1.0-build/superlu_dist-9.1.0/build/mpich
      Start  1: pdtest_1x1_1_2_8_20_SP
      Start  2: pdtest_1x1_3_2_8_20_SP
 1/11 Test  #1: pdtest_1x1_1_2_8_20_SP ...........***Failed    0.08 sec
Abort(339324936) on node 0: Fatal error in internal_Group_incl: Invalid group, error stack:
internal_Group_incl(45487): MPI_Group_incl(group=0x8c000004, n=1, ranks=0x2aa3d1b22e0, newgroup=0x3ffcbf78250) failed
internal_Group_incl(45433): Invalid group
      Start  3: pdtest_1x2_1_2_8_20_SP
 2/11 Test  #2: pdtest_1x1_3_2_8_20_SP ...........***Failed    0.08 sec
Abort(205107208) on node 0: Fatal error in internal_Group_incl: Invalid group, error stack:
internal_Group_incl(45487): MPI_Group_incl(group=0x8c000004, n=1, ranks=0x2aa3d3572e0, newgroup=0x3ffc2778280) failed
internal_Group_incl(45433): Invalid group
      Start  4: pdtest_1x2_3_2_8_20_SP
 3/11 Test  #3: pdtest_1x2_1_2_8_20_SP ...........   Passed    0.21 sec
      Start  5: pdtest_2x1_1_2_8_20_SP
 4/11 Test  #4: pdtest_1x2_3_2_8_20_SP ...........***Failed    0.21 sec
Abort(3780616) on node 0: Fatal error in internal_Group_incl: Invalid group, error stack:
internal_Group_incl(45487): MPI_Group_incl(group=0x8c000004, n=2, ranks=0x2aa372bf8b0, newgroup=0x3ffcde77f00) failed
internal_Group_incl(45433): Invalid group
Abort(741978120) on node 1: Fatal error in internal_Group_incl: Invalid group, error stack:
internal_Group_incl(45487): MPI_Group_incl(group=0x8c000004, n=2, ranks=0x2aa321c0cc0, newgroup=0x3ffda1f7b50) failed
internal_Group_incl(45433): Invalid group
      Start  6: pdtest_2x1_3_2_8_20_SP
 5/11 Test  #5: pdtest_2x1_1_2_8_20_SP ...........***Failed    0.21 sec
Abort(809086984) on node 0: Fatal error in internal_Group_incl: Invalid group, error stack:
internal_Group_incl(45487): MPI_Group_incl(group=0x8c000004, n=2, ranks=0x2aa3a7798b0, newgroup=0x3ffe3777ce0) failed
internal_Group_incl(45433): Invalid group
Abort(741978120) on node 1: Fatal error in internal_Group_incl: Invalid group, error stack:
internal_Group_incl(45487): MPI_Group_incl(group=0x8c000004, n=2, ranks=0x2aa2e876cc0, newgroup=0x3fffe5f8090) failed
internal_Group_incl(45433): Invalid group
      Start  7: pdtest_2x2_1_2_8_20_SP
 6/11 Test  #6: pdtest_2x1_3_2_8_20_SP ...........   Passed    0.29 sec
      Start  8: pdtest_2x2_3_2_8_20_SP
 7/11 Test  #7: pdtest_2x2_1_2_8_20_SP ...........***Failed    1.14 sec
Abort(674869256) on node 0: Fatal error in internal_Group_incl: Invalid group, error stack:
internal_Group_incl(45487): MPI_Group_incl(group=0x8c000004, n=4, ranks=0x2aa193b3870, newgroup=0x3ffddcf78e0) failed
internal_Group_incl(45433): Invalid group
Abort(540651528) on node 1: Fatal error in internal_Group_incl: Invalid group, error stack:
internal_Group_incl(45487): MPI_Group_incl(group=0x8c000004, n=4, ranks=0x2aa2c9febe0, newgroup=0x3fffde77f10) failed
internal_Group_incl(45433): Invalid group
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 16897 RUNNING AT 2766e02049ba48c2b8717ba4568488e3
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
      Start  9: pddrive1
 8/11 Test  #8: pdtest_2x2_3_2_8_20_SP ...........***Failed    1.13 sec
Abort(943304712) on node 0: Fatal error in internal_Group_incl: Invalid group, error stack:
internal_Group_incl(45487): MPI_Group_incl(group=0x8c000004, n=4, ranks=0x2aa13e20870, newgroup=0x3ffdb077ff0) failed
internal_Group_incl(45433): Invalid group
Abort(3780616) on node 1: Fatal error in internal_Group_incl: Invalid group, error stack:
internal_Group_incl(45487): MPI_Group_incl(group=0x8c000004, n=4, ranks=0x2aa04973be0, newgroup=0x3ffe06f8320) failed
internal_Group_incl(45433): Invalid group
      Start 10: pddrive2
 9/11 Test  #9: pddrive1 .........................***Failed    1.13 sec
Abort(272216072) on node 0: Fatal error in internal_Group_incl: Invalid group, error stack:
internal_Group_incl(45487): MPI_Group_incl(group=0x8c000004, n=4, ranks=0x2aa38c982c0, newgroup=0x3ffe0df81d0) failed
internal_Group_incl(45433): Invalid group
Abort(339324936) on node 1: Fatal error in internal_Group_incl: Invalid group, error stack:
internal_Group_incl(45487): MPI_Group_incl(group=0x8c000004, n=4, ranks=0x2aa18130630, newgroup=0x3ffeeef8680) failed
internal_Group_incl(45433): Invalid group
      Start 11: pddrive3
10/11 Test #10: pddrive2 .........................***Failed    1.05 sec
Abort(607760392) on node 0: Fatal error in internal_Group_incl: Invalid group, error stack:
internal_Group_incl(45487): MPI_Group_incl(group=0x8c000004, n=4, ranks=0x2aa03e1e2c0, newgroup=0x3ffe3977f58) failed
internal_Group_incl(45433): Invalid group
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 16932 RUNNING AT 2766e02049ba48c2b8717ba4568488e3
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
11/11 Test #11: pddrive3 .........................***Failed    0.50 sec
Abort(876195848) on node 0: Fatal error in internal_Group_incl: Invalid group, error stack:
internal_Group_incl(45487): MPI_Group_incl(group=0x8c000004, n=4, ranks=0x2aa2f8ee2c0, newgroup=0x3ffea377e68) failed
internal_Group_incl(45433): Invalid group
Abort(473542664) on node 1: Fatal error in internal_Group_incl: Invalid group, error stack:
internal_Group_incl(45487): MPI_Group_incl(group=0x8c000004, n=4, ranks=0x2aa3b647630, newgroup=0x3ffff977b78) failed
internal_Group_incl(45433): Invalid group
18% tests passed, 9 tests failed out of 11
Total Test time (real) =   3.27 sec
The following tests FAILED:
	  1 - pdtest_1x1_1_2_8_20_SP (Failed)
	  2 - pdtest_1x1_3_2_8_20_SP (Failed)
	  4 - pdtest_1x2_3_2_8_20_SP (Failed)
	  5 - pdtest_2x1_1_2_8_20_SP (Failed)
	  7 - pdtest_2x2_1_2_8_20_SP (Failed)
	  8 - pdtest_2x2_3_2_8_20_SP (Failed)
	  9 - pddrive1 (Failed)
	 10 - pddrive2 (Failed)
	 11 - pddrive3 (Failed)
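
For context on the failing call: the abort comes from MPI_Group_incl, presumably reached from the process-grid setup inside the test drivers. The sketch below is a minimal stand-alone reproduction of that call pattern, not SuperLU_DIST's actual code; MPICH raises the same "Invalid group" fatal error whenever the group handle passed to MPI_Group_incl is not a live group (for example, one that was already freed or never obtained from MPI_Comm_group).

```c
/* Minimal sketch (not SuperLU_DIST code): the call pattern that MPICH
 * aborts on.  "Invalid group" means the first argument to MPI_Group_incl
 * is not a valid group handle, e.g. it was already freed or never set. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nprocs, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Group of the parent communicator. */
    MPI_Group world_group;
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);

    /* Take the first n ranks, as a process-grid setup would (here n is
     * simply all ranks; in the pdtest runs it would be nprow * npcol). */
    int n = nprocs;
    int *ranks = malloc(n * sizeof *ranks);
    for (int i = 0; i < n; ++i)
        ranks[i] = i;

    /* The aborts in the log happen inside this call; passing a freed or
     * uninitialized group here reproduces "Invalid group" under MPICH. */
    MPI_Group grid_group;
    MPI_Group_incl(world_group, n, ranks, &grid_group);

    MPI_Comm grid_comm;
    MPI_Comm_create(MPI_COMM_WORLD, grid_group, &grid_comm);

    if (rank == 0)
        printf("grid communicator created over %d ranks\n", n);

    if (grid_comm != MPI_COMM_NULL)
        MPI_Comm_free(&grid_comm);
    MPI_Group_free(&grid_group);
    MPI_Group_free(&world_group);
    free(ranks);
    MPI_Finalize();
    return 0;
}
```

Built with mpicc and run under mpiexec -n 4, this pattern works on a healthy MPICH installation, so it may help narrow down whether the bad handle originates in the test drivers or in the build/link setup.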

It's compiled with this configuration:

+ /usr/bin/cmake -S . -B redhat-linux-build -DCMAKE_C_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_CXX_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_Fortran_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_INSTALL_DO_STRIP:BOOL=OFF -DCMAKE_INSTALL_PREFIX:PATH=/usr -DINCLUDE_INSTALL_DIR:PATH=/usr/include -DLIB_INSTALL_DIR:PATH=/usr/lib64 -DSYSCONF_INSTALL_DIR:PATH=/etc -DSHARE_INSTALL_PREFIX:PATH=/usr/share -DLIB_SUFFIX=64 -DBUILD_SHARED_LIBS:BOOL=ON -B build/mpich -DCMAKE_BUILD_TYPE:STRING=Release -DBUILD_STATIC_LIBS:BOOL=FALSE -DMPIEXEC_EXECUTABLE:FILEPATH=/usr/lib64/mpich/bin/mpiexec -DTPL_ENABLE_COLAMD=ON -DTPL_COLAMD_INCLUDE_DIRS:PATH=/usr/include/suitesparse -DTPL_COLAMD_LIBRARIES:STRING=/usr/lib64/libcolamd.so '-DMPI_C_LINK_FLAGS:STRING=-L/usr/lib64/mpich/lib -lptscotch -lptscotcherr -lptscotcherrexit -L/usr/lib64 -lmetis -lscotch -lcolamd' -DTPL_BLAS_LIBRARIES:BOOL=ON -DTPL_BLAS_LIBRARIES:FILEPATH=/usr/lib64/libflexiblas.so -DTPL_ENABLE_LAPACKLIB:BOOL=OFF -DTPL_LAPACK_LIBRARIES:BOOL=OFF '-DMPI_C_HEADER_DIR:PATH=/usr/include/mpich-s390x -I/usr/include/metis.h' '-DMPI_CXX_LINK_FLAGS:STRING=-L/usr/lib64/mpich/lib -lptscotch -lptscotcherr -lptscotcherrexit -L/usr/lib64 -lmetis -lscotch' -DTPL_ENABLE_PARMETISLIB:BOOL=ON -DTPL_PARMETIS_INCLUDE_DIRS:PATH=/usr/include/mpich-s390x/scotch '-DTPL_PARMETIS_LIBRARIES:STRING=/usr/lib64/mpich/lib/libptscotchparmetis.so;/usr/lib64/libmetis.so' -DXSDK_INDEX_SIZE=32 -Denable_python:BOOL=ON -Denable_openmp:BOOL=ON -Denable_single:BOOL=OFF -Denable_double:BOOL=ON -Denable_complex16:BOOL=OFF -Denable_examples:BOOL=ON -Denable_tests:BOOL=ON -DBUILD_TESTING:BOOL=ON -DCMAKE_INSTALL_PREFIX:PATH=/usr -DCMAKE_INSTALL_BINDIR:PATH=/usr/lib64/mpich/bin -DCMAKE_INSTALL_INCLUDEDIR:PATH=/usr/include/mpich-s390x/superlu_dist -DCMAKE_INSTALL_LIBDIR:PATH=/usr/lib64/mpich/lib -DTPL_ENABLE_INTERNAL_BLASLIB:BOOL=OFF -DCMAKE_SKIP_INSTALL_RPATH:BOOL=ON
-- The C compiler identification is GNU 15.0.1
-- The CXX compiler identification is GNU 15.0.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/lib64/mpich/bin/mpicc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/lib64/mpich/bin/mpic++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The Fortran compiler identification is GNU 15.0.1
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/lib64/mpich/bin/mpifort - skipped
-- SuperLU_DIST will be built as a shared library.
-- Found MPI_C: /usr/lib64/mpich/bin/mpicc (found version "4.1")
-- Found MPI_CXX: /usr/lib64/mpich/bin/mpic++ (found version "4.1")
-- Found MPI_Fortran: /usr/lib64/mpich/bin/mpifort (found version "4.1")
-- Found MPI: TRUE (found version "4.1")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- OpenMP_EXE_LINKER_FLAGS=''
-- CMAKE_EXE_LINKER_FLAGS='-Wl,-z,relro    -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes  -L/usr/lib64/mpich/lib -lptscotch -lptscotcherr -lptscotcherrexit -Wl,-rpath,/usr/SRC -L/usr/lib64/mpich/lib -lptscotch -lptscotcherr -lptscotcherrexit -L/usr/lib64 -lmetis -lscotch -lcolamd  '
-- Using TPL_BLAS_LIBRARIES='/usr/lib64/libflexiblas.so'
-- Will not link with LAPACK.
-- Will not link with MAGMA.
-- Enabled support for PARMETIS.
-- Enabled support for COLAMD.
-- Will not link with CombBLAS.
-- EXTRA_LIB_EXPORT='-lgcc_s -lgcc -lc -lgcc_s -lgcc -lmpi -lptscotcherrexit -lptscotcherr -lptscotch '
-- EXTRA_FLIB_EXPORT=' -lptscotch -lptscotcherr -lptscotcherrexit -lmpifort -lmpi -lgfortran -lm -lgcc_s -lgcc -lm -lc -lgcc_s -lgcc'
-- superlu_dist_fortran will be built as a dynamic library.
-- Detecting Fortran/C Interface
-- Detecting Fortran/C Interface - Found GLOBAL and MODULE mangling
-- Verifying Fortran/CXX Compiler Compatibility
-- Verifying Fortran/CXX Compiler Compatibility - Success
-- Configuring done (6.9s)
-- Generating done (0.1s)

Compiled with the same configuration against openmpi-5.0.6, all tests pass successfully.
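
Since the same build passes under OpenMPI, one quick check (not part of the original report, just a suggestion) is to confirm which MPI runtime the test executables actually pick up at run time; MPI_Get_library_version prints the library identification string:

```c
/* Hypothetical diagnostic, not from the report: print which MPI
 * library the executable is really linked and running against. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    int len = 0;
    MPI_Get_library_version(version, &len);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("MPI library: %s\n", version);

    MPI_Finalize();
    return 0;
}
```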
