MPI_Cart_sub segfault #13081
FWIW, I was not able to reproduce the issue, but I only tried more recent versions of Open MPI. Out of curiosity, can you try
Yeah, the solution is probably just to update Open MPI. It would be good to know which version fixes the issue, though.
shell$ mpirun -n 2 --mca btl ^smcuda ./segfault
[titanxp:13818] *** Process received signal ***
[titanxp:13818] Signal: Segmentation fault (11)
[titanxp:13818] Signal code: Address not mapped (1)
[titanxp:13818] Failing at address: 0x8
[titanxp:13818] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fdf4b819520]
[titanxp:13818] [ 1] /usr/local/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_match+0x995)[0x7fdf308121d5]
[titanxp:13818] [ 2] /usr/local/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x8f)[0x7fdf36c0437f]
[titanxp:13818] [ 3] /usr/local/lib/openmpi/mca_btl_vader.so(+0x46a7)[0x7fdf36c046a7]
[titanxp:13818] [ 4] /usr/local/lib/libopen-pal.so.40(opal_progress+0x2c)[0x7fdf4b036c2c]
[titanxp:13818] [ 5] /usr/local/lib/libmpi.so.40(ompi_request_default_wait+0x4d)[0x7fdf4ba4b95d]
[titanxp:13818] [ 6] /usr/local/lib/libmpi.so.40(ompi_coll_base_sendrecv_actual+0xc1)[0x7fdf4baa85d1]
[titanxp:13818] [ 7] /usr/local/lib/libmpi.so.40(ompi_coll_base_allgather_intra_two_procs+0x89)[0x7fdf4baa77c9]
[titanxp:13818] [ 8] /usr/local/lib/libmpi.so.40(ompi_comm_split+0xc5)[0x7fdf4ba2eba5]
[titanxp:13818] [ 9] /usr/local/lib/libmpi.so.40(mca_topo_base_cart_sub+0xe4)[0x7fdf4bad0054]
[titanxp:13818] [10] /usr/local/lib/libmpi.so.40(PMPI_Cart_sub+0xca)[0x7fdf4ba6805a]
[titanxp:13818] [11] ./segfault(+0x1455)[0x55f545ab0455]
[titanxp:13818] [12] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fdf4b800d90]
[titanxp:13818] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fdf4b800e40]
[titanxp:13818] [14] ./segfault(+0x11a5)[0x55f545ab01a5]
[titanxp:13818] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node titanxp exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
I can't reproduce this with any of the official stable releases (4.1 or 5.0). The main branch also works.
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
mpirun --version reports 4.0.0
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
I run Linux Mint 21.3 v6.0.4 and installed Open MPI with the operating system's package manager (apt install libopenmpi-dev).
If you are building/installing from a git clone, please copy-n-paste the output from
git submodule status.
Please describe the system on which you are running
mpic++ --version reports g++ 11.4.0
Details of the problem
This program segfaults