I have encountered a problem when using SuperLU_dist (7.2.0 and 8.2.1) in combination with Trilinos.
Trilinos currently does not support SuperLU_dist 9.0.0 or master, so I cannot test those versions.
All my problems occur only with the classic Intel compiler suite, which I have to use for this project:
$ mpiicpc --version
icpc (ICC) 2021.9.0 20230302
The problems are related to the compilation flags of SuperLU_dist. I have tried many ways of compiling my software stack. The current situation is that I can compile METIS, ParMETIS and Trilinos in Release mode with -O3, but as soon as I compile SuperLU_dist with -O2 or -O3 (the default), my problems start.
But let me start from scratch.
I can compile SuperLU_dist with default settings in Release mode (-O3) just fine, and I can run all the tests in the TEST folder:
$ ./pdtest.sh
**-- nrhs = 1, process grid = 1 X 1, fill 2, relax 4, max-super 10
**-- nrhs = 1, process grid = 1 X 1, fill 2, relax 4, max-super 20
**-- nrhs = 1, process grid = 1 X 1, fill 2, relax 8, max-super 10
**-- nrhs = 1, process grid = 1 X 1, fill 2, relax 8, max-super 20
**-- nrhs = 1, process grid = 1 X 1, fill 6, relax 4, max-super 10
**-- nrhs = 1, process grid = 1 X 1, fill 6, relax 4, max-super 20
**-- nrhs = 1, process grid = 1 X 1, fill 6, relax 8, max-super 10
**-- nrhs = 1, process grid = 1 X 1, fill 6, relax 8, max-super 20
**-- nrhs = 1, process grid = 1 X 3, fill 2, relax 4, max-super 10
**-- nrhs = 1, process grid = 1 X 3, fill 2, relax 4, max-super 20
**-- nrhs = 1, process grid = 1 X 3, fill 2, relax 4, max-super 10
**-- nrhs = 1, process grid = 1 X 3, fill 2, relax 4, max-super 20
**-- nrhs = 1, process grid = 1 X 3, fill 2, relax 8, max-super 10
**-- nrhs = 1, process grid = 1 X 3, fill 2, relax 8, max-super 20
**-- nrhs = 1, process grid = 1 X 3, fill 6, relax 4, max-super 10
**-- nrhs = 1, process grid = 1 X 3, fill 6, relax 4, max-super 20
**-- nrhs = 1, process grid = 1 X 3, fill 6, relax 8, max-super 10
**-- nrhs = 1, process grid = 1 X 3, fill 6, relax 8, max-super 20
**-- nrhs = 1, process grid = 2 X 1, fill 2, relax 4, max-super 10
**-- nrhs = 1, process grid = 2 X 1, fill 2, relax 4, max-super 20
**-- nrhs = 1, process grid = 2 X 1, fill 2, relax 8, max-super 10
**-- nrhs = 1, process grid = 2 X 1, fill 2, relax 8, max-super 20
**-- nrhs = 1, process grid = 2 X 1, fill 6, relax 4, max-super 10
**-- nrhs = 1, process grid = 2 X 1, fill 6, relax 4, max-super 20
**-- nrhs = 1, process grid = 2 X 1, fill 6, relax 8, max-super 10
**-- nrhs = 1, process grid = 2 X 1, fill 6, relax 8, max-super 20
**-- nrhs = 1, process grid = 2 X 3, fill 2, relax 4, max-super 10
**-- nrhs = 1, process grid = 2 X 3, fill 2, relax 4, max-super 20
**-- nrhs = 1, process grid = 2 X 3, fill 2, relax 8, max-super 10
**-- nrhs = 1, process grid = 2 X 3, fill 2, relax 8, max-super 20
**-- nrhs = 1, process grid = 2 X 3, fill 6, relax 4, max-super 10
**-- nrhs = 1, process grid = 2 X 3, fill 6, relax 4, max-super 20
**-- nrhs = 1, process grid = 2 X 3, fill 6, relax 8, max-super 10
**-- nrhs = 1, process grid = 2 X 3, fill 6, relax 8, max-super 20
**-- nrhs = 3, process grid = 1 X 1, fill 2, relax 4, max-super 10
**-- nrhs = 3, process grid = 1 X 1, fill 2, relax 4, max-super 20
**-- nrhs = 3, process grid = 1 X 1, fill 2, relax 8, max-super 10
**-- nrhs = 3, process grid = 1 X 1, fill 2, relax 8, max-super 20
**-- nrhs = 3, process grid = 1 X 1, fill 6, relax 4, max-super 10
**-- nrhs = 3, process grid = 1 X 1, fill 6, relax 4, max-super 20
**-- nrhs = 3, process grid = 1 X 1, fill 6, relax 8, max-super 10
**-- nrhs = 3, process grid = 1 X 1, fill 6, relax 8, max-super 20
**-- nrhs = 3, process grid = 1 X 3, fill 2, relax 4, max-super 10
**-- nrhs = 3, process grid = 1 X 3, fill 2, relax 4, max-super 20
**-- nrhs = 3, process grid = 1 X 3, fill 2, relax 8, max-super 10
**-- nrhs = 3, process grid = 1 X 3, fill 2, relax 8, max-super 20
**-- nrhs = 3, process grid = 1 X 3, fill 6, relax 4, max-super 10
**-- nrhs = 3, process grid = 1 X 3, fill 6, relax 4, max-super 20
**-- nrhs = 3, process grid = 1 X 3, fill 6, relax 8, max-super 10
**-- nrhs = 3, process grid = 1 X 3, fill 6, relax 8, max-super 20
**-- nrhs = 3, process grid = 2 X 1, fill 2, relax 4, max-super 10
**-- nrhs = 3, process grid = 2 X 1, fill 2, relax 4, max-super 20
**-- nrhs = 3, process grid = 2 X 1, fill 2, relax 8, max-super 10
**-- nrhs = 3, process grid = 2 X 1, fill 2, relax 8, max-super 20
**-- nrhs = 3, process grid = 2 X 1, fill 6, relax 4, max-super 10
**-- nrhs = 3, process grid = 2 X 1, fill 6, relax 4, max-super 20
**-- nrhs = 3, process grid = 2 X 1, fill 6, relax 8, max-super 10
**-- nrhs = 3, process grid = 2 X 1, fill 6, relax 8, max-super 20
**-- nrhs = 3, process grid = 2 X 3, fill 2, relax 4, max-super 10
**-- nrhs = 3, process grid = 2 X 3, fill 2, relax 4, max-super 20
**-- nrhs = 3, process grid = 2 X 3, fill 2, relax 8, max-super 10
**-- nrhs = 3, process grid = 2 X 3, fill 2, relax 8, max-super 20
**-- nrhs = 3, process grid = 2 X 3, fill 6, relax 4, max-super 10
**-- nrhs = 3, process grid = 2 X 3, fill 6, relax 4, max-super 20
**-- nrhs = 3, process grid = 2 X 3, fill 6, relax 8, max-super 10
**-- nrhs = 3, process grid = 2 X 3, fill 6, relax 8, max-super 20
Later I want to use SuperLU_dist in combination with Trilinos. Here is a minimal working example that depends on Trilinos: trilinos_superlu_dist.zip
To use it, adjust CMakeLists.txt and set the path to a Trilinos install directory.
Now the following problems occur:
1.) If I use the code with SuperLU_dist compiled with -O3, it crashes even in serial:
$ mpirun -n 1 ./trilinos_superlu_dist
Setup Smoother (MueLu::Amesos2Smoother{type = Superludist})
PARMETIS ERROR adjncy is NULL.
corrupted size vs. prev_size
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 1355148 RUNNING AT knecht
= KILLED BY SIGNAL: 6 (Aborted)
===================================================================================
2.) If I compile SuperLU_dist with only -O2, the situation is slightly better. It runs perfectly fine with up to 8 processes:
$ mpirun -n 8 ./trilinos_superlu_dist
[...]
End Result: TEST PASSED
But if I increase the number of processes, it crashes:
$ mpirun -n 10 ./trilinos_superlu_dist
Total number of processes: 10
Setup Smoother (MueLu::Amesos2Smoother{type = Superludist})
PARMETIS ERROR adjncy is NULL.
PARMETIS ERROR adjncy is NULL.
PARMETIS ERROR adjncy is NULL.
PARMETIS ERROR adjncy is NULL.
PARMETIS ERROR adjncy is NULL.
PARMETIS ERROR adjncy is NULL.
malloc(): corrupted top size
malloc(): corrupted top size
malloc(): corrupted top size
[knecht:1386940:0:1386940] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x2c000000)
[knecht:1386941:0:1386941] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x5fffffffe)
malloc(): corrupted top size
Fatal glibc error: malloc.c:2599 (sysmalloc): assertion failed: (old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 1386939 RUNNING AT knecht
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 1386940 RUNNING AT knecht
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 2 PID 1386941 RUNNING AT knecht
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 3 PID 1386942 RUNNING AT knecht
= KILLED BY SIGNAL: 6 (Aborted)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 4 PID 1386943 RUNNING AT knecht
= KILLED BY SIGNAL: 6 (Aborted)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 5 PID 1386944 RUNNING AT knecht
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 6 PID 1386945 RUNNING AT knecht
= KILLED BY SIGNAL: 6 (Aborted)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 7 PID 1386946 RUNNING AT knecht
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 8 PID 1386947 RUNNING AT knecht
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 9 PID 1386948 RUNNING AT knecht
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
This happens even if I make the problem size much larger.
3.) If I compile SuperLU_dist with only -O1, everything works as expected, even for many MPI processes, e.g. 80:
$ mpirun -n 80 ./trilinos_superlu_dist
[...]
End Result: TEST PASSED
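To narrow down the process count at which the -O2 build starts to fail, the runs above could be scripted. The following is only a sketch: the function name `sweep_process_counts` and the `RUNNER` variable are made up for illustration, and it assumes a successful run prints the `End Result: TEST PASSED` line shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of SuperLU_dist or Trilinos):
# runs the given executable once per process count and reports
# whether the "End Result: TEST PASSED" line appeared in the output.
# RUNNER is an assumed override for the MPI launcher (default: mpirun).
sweep_process_counts() {
  exe="$1"; shift
  for n in "$@"; do
    # Capture stdout and stderr so crash messages do not clutter the summary.
    out=$(${RUNNER:-mpirun} -n "$n" "$exe" 2>&1)
    case "$out" in
      *'End Result: TEST PASSED'*) echo "n=$n PASSED" ;;
      *)                           echo "n=$n FAILED" ;;
    esac
  done
}
```

It could be called, for example, as `sweep_process_counts ./trilinos_superlu_dist 1 2 4 8 9 10 16` for each of the -O1, -O2 and -O3 builds.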
As mentioned above, the problem occurs only with the Intel compiler. With GCC 14.1.1 and OpenMPI 5.0.5 everything works as expected (even with SuperLU_dist compiled with -O3).
At first I opened a bug report at Trilinos, but since the problem is related only to the SuperLU_dist compilation flags, I closed it and opened a report here.
As I already wrote there: I recently had a similar issue with the Intel compiler and have a minimal working example for it: intel_O3_bug.zip
Compiling this with
icc -O3 intel_O3_bug.c -o intel_O3_bug
and running the executable:
./intel_O3_bug
gives the "wrong" result. Changing to -O2 fixes the issue. With GCC (even with -Ofast) one always gets the correct result. So the Intel compiler (icc (ICC) 2021.9.0 20230302) is doing some nasty things here, but I don't know whether the problem I report here is related to this.
I don't know whether there is anything to do on the SuperLU_dist side, or whether only the Intel developers are to blame.
Maybe someone can also test with the Intel compiler and try to reproduce the issue, so that different default optimization flags could be used for (specific versions of) the Intel compiler?
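For anyone trying to reproduce this, a configure line with a lowered optimization level might look like the sketch below. `CMAKE_BUILD_TYPE`, `CMAKE_C_FLAGS_RELEASE` and `CMAKE_CXX_FLAGS_RELEASE` are standard CMake variables. The helper name `superlu_configure_cmd`, the source and build directory names, and the use of `mpiicc` as the C wrapper are assumptions for illustration. The ParMETIS and BLAS options of a real configure line are left out and would have to be added.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a CMake configure line for
# SuperLU_dist that keeps Release mode but overrides the optimization level.
# $1: optimization flag (default -O1), $2: placeholder source directory.
superlu_configure_cmd() {
  opt="${1:--O1}"
  src="${2:-superlu_dist}"
  printf 'cmake -S %s -B %s/build%s' "$src" "$src" "$opt"
  printf ' -DCMAKE_BUILD_TYPE=Release'
  printf ' -DCMAKE_C_COMPILER=mpiicc -DCMAKE_CXX_COMPILER=mpiicpc'
  # The *_FLAGS_RELEASE variables replace CMake's default Release flags.
  printf ' "-DCMAKE_C_FLAGS_RELEASE=%s -DNDEBUG"' "$opt"
  printf ' "-DCMAKE_CXX_FLAGS_RELEASE=%s -DNDEBUG"\n' "$opt"
}

# Print the line for the -O1 build that worked in the report above.
superlu_configure_cmd -O1
```

Calling the helper with `-O2` or `-O3` gives the matching lines for the failing builds, so the three variants differ only in that one flag.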
Many greetings
mathse