Crash with Intel classic compiler when not compiling with -O1 #168

Open
mathsen opened this issue Aug 7, 2024 · 0 comments

mathsen commented Aug 7, 2024

Hello,

I have encountered a problem when using SuperLU_dist (7.2.0 and 8.2.1) in combination with Trilinos.
Trilinos currently does not support SuperLU_dist 9.0.0 or master, which is why I can't test those versions.
All my problems occur only with the classic Intel compiler suite, which I have to use for this project:

$ mpiicpc --version
icpc (ICC) 2021.9.0 20230302

The problems are related to the compilation flags of SuperLU_dist. I have tried many ways of compiling my software stack. The current situation is that I can compile metis, parmetis and trilinos in release mode with -O3, but as soon as I compile SuperLU_dist with -O2 or -O3 (default), my problems start.
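
For reference, this is roughly how the optimization level of a CMake Release build can be lowered. It is only a sketch: `CMAKE_BUILD_TYPE`, `CMAKE_C_FLAGS_RELEASE` and `CMAKE_CXX_FLAGS_RELEASE` are standard CMake variables and `mpiicc`/`mpiicpc` are the Intel MPI wrappers, but the function name `configure_superlu` and the `OPT_LEVEL` and `DRY_RUN` variables are made up for this example, and all third-party library options (ParMETIS paths etc.) are left out.

```shell
#!/bin/sh
# Sketch: configure a Release build with a reduced optimization level.
# OPT_LEVEL and DRY_RUN are hypothetical knobs, not SuperLU_dist options.
OPT_LEVEL=${OPT_LEVEL:--O1}
DRY_RUN=${DRY_RUN:-1}

configure_superlu() {
    # Replace CMake's default Release flags for C and C++ with $OPT_LEVEL.
    set -- cmake .. \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_C_COMPILER=mpiicc \
        -DCMAKE_CXX_COMPILER=mpiicpc \
        "-DCMAKE_C_FLAGS_RELEASE=$OPT_LEVEL -DNDEBUG" \
        "-DCMAKE_CXX_FLAGS_RELEASE=$OPT_LEVEL -DNDEBUG"
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"   # only print the command line
    else
        "$@"                 # run cmake from inside the build directory
    fi
}
```

Calling `configure_superlu` with the defaults only prints the command; setting `DRY_RUN=0` and `OPT_LEVEL=-O2` before the call would run the configure step with -O2 instead.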

But let me start from scratch.
I can compile SuperLU_dist with default settings in Release mode (-O3) just fine, and I can run all the tests in the TEST folder:

$ ./pdtest.sh
**-- nrhs = 1, process grid = 1 X 1, fill 2, relax 4, max-super 10
**-- nrhs = 1, process grid = 1 X 1, fill 2, relax 4, max-super 20
**-- nrhs = 1, process grid = 1 X 1, fill 2, relax 8, max-super 10
**-- nrhs = 1, process grid = 1 X 1, fill 2, relax 8, max-super 20
**-- nrhs = 1, process grid = 1 X 1, fill 6, relax 4, max-super 10
**-- nrhs = 1, process grid = 1 X 1, fill 6, relax 4, max-super 20
**-- nrhs = 1, process grid = 1 X 1, fill 6, relax 8, max-super 10
**-- nrhs = 1, process grid = 1 X 1, fill 6, relax 8, max-super 20
**-- nrhs = 1, process grid = 1 X 3, fill 2, relax 4, max-super 10
**-- nrhs = 1, process grid = 1 X 3, fill 2, relax 4, max-super 20
**-- nrhs = 1, process grid = 1 X 3, fill 2, relax 4, max-super 10
**-- nrhs = 1, process grid = 1 X 3, fill 2, relax 4, max-super 20
**-- nrhs = 1, process grid = 1 X 3, fill 2, relax 8, max-super 10
**-- nrhs = 1, process grid = 1 X 3, fill 2, relax 8, max-super 20
**-- nrhs = 1, process grid = 1 X 3, fill 6, relax 4, max-super 10
**-- nrhs = 1, process grid = 1 X 3, fill 6, relax 4, max-super 20
**-- nrhs = 1, process grid = 1 X 3, fill 6, relax 8, max-super 10
**-- nrhs = 1, process grid = 1 X 3, fill 6, relax 8, max-super 20
**-- nrhs = 1, process grid = 2 X 1, fill 2, relax 4, max-super 10
**-- nrhs = 1, process grid = 2 X 1, fill 2, relax 4, max-super 20
**-- nrhs = 1, process grid = 2 X 1, fill 2, relax 8, max-super 10
**-- nrhs = 1, process grid = 2 X 1, fill 2, relax 8, max-super 20
**-- nrhs = 1, process grid = 2 X 1, fill 6, relax 4, max-super 10
**-- nrhs = 1, process grid = 2 X 1, fill 6, relax 4, max-super 20
**-- nrhs = 1, process grid = 2 X 1, fill 6, relax 8, max-super 10
**-- nrhs = 1, process grid = 2 X 1, fill 6, relax 8, max-super 20
**-- nrhs = 1, process grid = 2 X 3, fill 2, relax 4, max-super 10
**-- nrhs = 1, process grid = 2 X 3, fill 2, relax 4, max-super 20
**-- nrhs = 1, process grid = 2 X 3, fill 2, relax 8, max-super 10
**-- nrhs = 1, process grid = 2 X 3, fill 2, relax 8, max-super 20
**-- nrhs = 1, process grid = 2 X 3, fill 6, relax 4, max-super 10
**-- nrhs = 1, process grid = 2 X 3, fill 6, relax 4, max-super 20
**-- nrhs = 1, process grid = 2 X 3, fill 6, relax 8, max-super 10
**-- nrhs = 1, process grid = 2 X 3, fill 6, relax 8, max-super 20
**-- nrhs = 3, process grid = 1 X 1, fill 2, relax 4, max-super 10
**-- nrhs = 3, process grid = 1 X 1, fill 2, relax 4, max-super 20
**-- nrhs = 3, process grid = 1 X 1, fill 2, relax 8, max-super 10
**-- nrhs = 3, process grid = 1 X 1, fill 2, relax 8, max-super 20
**-- nrhs = 3, process grid = 1 X 1, fill 6, relax 4, max-super 10
**-- nrhs = 3, process grid = 1 X 1, fill 6, relax 4, max-super 20
**-- nrhs = 3, process grid = 1 X 1, fill 6, relax 8, max-super 10
**-- nrhs = 3, process grid = 1 X 1, fill 6, relax 8, max-super 20
**-- nrhs = 3, process grid = 1 X 3, fill 2, relax 4, max-super 10
**-- nrhs = 3, process grid = 1 X 3, fill 2, relax 4, max-super 20
**-- nrhs = 3, process grid = 1 X 3, fill 2, relax 8, max-super 10
**-- nrhs = 3, process grid = 1 X 3, fill 2, relax 8, max-super 20
**-- nrhs = 3, process grid = 1 X 3, fill 6, relax 4, max-super 10
**-- nrhs = 3, process grid = 1 X 3, fill 6, relax 4, max-super 20
**-- nrhs = 3, process grid = 1 X 3, fill 6, relax 8, max-super 10
**-- nrhs = 3, process grid = 1 X 3, fill 6, relax 8, max-super 20
**-- nrhs = 3, process grid = 2 X 1, fill 2, relax 4, max-super 10
**-- nrhs = 3, process grid = 2 X 1, fill 2, relax 4, max-super 20
**-- nrhs = 3, process grid = 2 X 1, fill 2, relax 8, max-super 10
**-- nrhs = 3, process grid = 2 X 1, fill 2, relax 8, max-super 20
**-- nrhs = 3, process grid = 2 X 1, fill 6, relax 4, max-super 10
**-- nrhs = 3, process grid = 2 X 1, fill 6, relax 4, max-super 20
**-- nrhs = 3, process grid = 2 X 1, fill 6, relax 8, max-super 10
**-- nrhs = 3, process grid = 2 X 1, fill 6, relax 8, max-super 20
**-- nrhs = 3, process grid = 2 X 3, fill 2, relax 4, max-super 10
**-- nrhs = 3, process grid = 2 X 3, fill 2, relax 4, max-super 20
**-- nrhs = 3, process grid = 2 X 3, fill 2, relax 8, max-super 10
**-- nrhs = 3, process grid = 2 X 3, fill 2, relax 8, max-super 20
**-- nrhs = 3, process grid = 2 X 3, fill 6, relax 4, max-super 10
**-- nrhs = 3, process grid = 2 X 3, fill 6, relax 4, max-super 20
**-- nrhs = 3, process grid = 2 X 3, fill 6, relax 8, max-super 10
**-- nrhs = 3, process grid = 2 X 3, fill 6, relax 8, max-super 20

Later I want to use SuperLU_dist in combination with Trilinos. Here is a minimal working example that depends on Trilinos:

trilinos_superlu_dist.zip

To use it, you need to adjust CMakeLists.txt and set the path to a Trilinos install directory.
Now the following problems occur:

  1. If I use the code with SuperLU_dist compiled with -O3, it crashes even in serial:
$ mpirun -n 1 ./trilinos_superlu_dist

Setup Smoother (MueLu::Amesos2Smoother{type = Superludist})
PARMETIS ERROR adjncy is NULL.
corrupted size vs. prev_size

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 1355148 RUNNING AT knecht
=   KILLED BY SIGNAL: 6 (Aborted)
===================================================================================
  2. If I compile SuperLU_dist with only -O2, the situation is slightly better. It runs perfectly fine with up to 8 processes:
$ mpirun -n 8 ./trilinos_superlu_dist

[...]
End Result: TEST PASSED

But if I increase the number of processes, it crashes:

$ mpirun -n 10 ./trilinos_superlu_dist
Total number of processes: 10
Setup Smoother (MueLu::Amesos2Smoother{type = Superludist})
PARMETIS ERROR adjncy is NULL.
PARMETIS ERROR adjncy is NULL.
PARMETIS ERROR adjncy is NULL.
PARMETIS ERROR adjncy is NULL.
PARMETIS ERROR adjncy is NULL.
PARMETIS ERROR adjncy is NULL.
malloc(): corrupted top size
malloc(): corrupted top size
malloc(): corrupted top size
[knecht:1386940:0:1386940] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x2c000000)
[knecht:1386941:0:1386941] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x5fffffffe)
malloc(): corrupted top size
Fatal glibc error: malloc.c:2599 (sysmalloc): assertion failed: (old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 1386939 RUNNING AT knecht
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 1386940 RUNNING AT knecht
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 2 PID 1386941 RUNNING AT knecht
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 3 PID 1386942 RUNNING AT knecht
=   KILLED BY SIGNAL: 6 (Aborted)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 4 PID 1386943 RUNNING AT knecht
=   KILLED BY SIGNAL: 6 (Aborted)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 5 PID 1386944 RUNNING AT knecht
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 6 PID 1386945 RUNNING AT knecht
=   KILLED BY SIGNAL: 6 (Aborted)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 7 PID 1386946 RUNNING AT knecht
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 8 PID 1386947 RUNNING AT knecht
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 9 PID 1386948 RUNNING AT knecht
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

This is the case even if I make the problem size much larger.

  3. If I compile SuperLU_dist with only -O1, everything works as expected, even for many MPI processes, e.g. 80:

$ mpirun -n 80 ./trilinos_superlu_dist

[...]
End Result: TEST PASSED
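
To pin down where exactly the -O2 build starts failing between 8 and 10 processes, a small sweep over the process count could be used. This is a sketch, assuming the `mpirun -n` invocation and the binary name from the runs above; the function name `first_failing_count` and the `RUNNER`/`BINARY` variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command with increasing process counts and
# print the first count whose run exits with a non-zero status.
RUNNER=${RUNNER:-"mpirun -n"}
BINARY=${BINARY:-"./trilinos_superlu_dist"}

first_failing_count() {
    # $1 = smallest process count to try, $2 = largest one
    n=$1
    while [ "$n" -le "$2" ]; do
        # $RUNNER is deliberately unquoted so "mpirun -n" splits into words.
        if ! $RUNNER "$n" $BINARY >/dev/null 2>&1; then
            echo "$n"
            return 0
        fi
        n=$((n + 1))
    done
    echo "none"   # every count in the range passed
}
```

For example, `first_failing_count 1 16` tries 1 to 16 processes in order. The sweep relies on `mpirun` returning a non-zero exit status when a rank is killed.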

As I have already written, the problem only occurs with the Intel compiler. With GCC 14.1.1 and OpenMPI 5.0.5 everything works as expected (even with SuperLU_dist in -O3 mode).
At first I opened a bug report at Trilinos, but since the problem is only related to the SuperLU_dist compilation flags, I closed it and opened a report here.

As I have already written there: I recently had a similar issue with the Intel compiler and have a minimal working example for it:

intel_O3_bug.zip

Compiling this with
icc -O3 intel_O3_bug.c -o intel_O3_bug

and running the executable
./intel_O3_bug
gives the "wrong" result. Changing to "-O2" fixes the issue. With GCC (even with -Ofast) one always gets the correct result. So the Intel compiler (icc (ICC) 2021.9.0 20230302) is doing some nasty things here. But I don't know whether the problem I report here is related to this.

I don't know if there is anything to do on the SuperLU_dist side, or if only the Intel developers are to blame for this.
Could someone also test with the Intel compiler and try to reproduce the issue, so that other default optimization flags could perhaps be used for the Intel compiler (in specific versions)?
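
To illustrate what I mean by compiler-dependent defaults, here is a sketch of such a policy. It assumes that the classic Intel compilers print "(ICC)" in their `--version` banner, as in the output shown above; the function name `pick_opt_level` and the choice of -O1 versus -O3 are hypothetical and only reflect what worked in my tests.

```shell
#!/bin/sh
# Hypothetical policy: choose an optimization level from the first line
# of a compiler's --version output.
pick_opt_level() {
    banner=$1
    case $banner in
        *"(ICC)"*) echo "-O1" ;;   # level that worked with icc in this report
        *)         echo "-O3" ;;   # keep the Release default otherwise
    esac
}
```

It could be called as `pick_opt_level "$(mpiicpc --version | head -n 1)"`. A real fix would probably need to check the version number as well, since I only tested 2021.9.0.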

Many greetings
mathse
