[8d6bf97c3bf4:3039 :0:3095] Caught signal 7 (Bus error: nonexistent physical address)
==== backtrace (tid: 3095) ====
0 0x0000000000043090 killpg() ???:0
1 0x000000000018bb41 __nss_database_lookup() ???:0
2 0x000000000007587d ncclGroupEnd() ???:0
3 0x000000000007b0ef ncclGroupEnd() ???:0
4 0x0000000000059e97 ncclGetUniqueId() ???:0
5 0x00000000000489b1 ???() /usr/lib/x86_64-linux-gnu/libnccl.so.2:0
6 0x000000000004a655 ???() /usr/lib/x86_64-linux-gnu/libnccl.so.2:0
7 0x0000000000063dcc ncclRedOpDestroy() ???:0
8 0x0000000000008609 start_thread() ???:0
9 0x000000000011f133 clone() ???:0
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
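This happens because the shared memory that docker run allocates by default is too small: only 64M. You can add --shm-size="6g" to the docker run command to allocate more shared memory.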
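To confirm this diagnosis from inside the container before launching training, a quick check of the shared-memory mount can help. The following is only a minimal sketch: it assumes shared memory is mounted at /dev/shm (the usual location in Linux containers), and the helper names `shm_total_bytes` and `shm_is_sufficient` as well as the 1 GiB threshold in the usage comment are hypothetical, not something NCCL or PyTorch prescribes.

```python
import shutil


def shm_total_bytes(path="/dev/shm"):
    """Return the total size, in bytes, of the filesystem that holds `path`.

    Assumption: shared memory is mounted at /dev/shm inside the container.
    """
    return shutil.disk_usage(path).total


def shm_is_sufficient(required_bytes, path="/dev/shm"):
    """Return True if the filesystem at `path` is at least `required_bytes` large."""
    return shm_total_bytes(path) >= required_bytes


# Example usage (the 1 GiB threshold is an illustrative assumption):
#   if not shm_is_sufficient(1 * 1024**3):
#       print("Shared memory looks small:", shm_total_bytes(), "bytes")
```

If the reported size is around 64 MiB, the container is most likely running with the Docker default, and restarting it with a larger --shm-size value as suggested above is the thing to try.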