prov/cxip: low device performance. #10822

Open
amirshehataornl opened this issue Feb 23, 2025 · 0 comments
amirshehataornl commented Feb 23, 2025

Describe the bug
Unexpectedly low performance with osu_alltoall and ROCm GPU buffers on a Frontier-like system.

I'm seeing the performance below with the following environment variables set:

export FI_CXI_RDZV_THRESHOLD=16384
export FI_CXI_RDZV_EAGER_SIZE=2048
export FI_CXI_OFLOW_BUF_SIZE=12582912
export FI_CXI_OFLOW_BUF_COUNT=3
export FI_CXI_DEFAULT_CQ_SIZE=131072
export FI_CXI_REQ_BUF_MAX_CACHED=0
export FI_CXI_REQ_BUF_MIN_POSTED=6
export FI_CXI_REQ_BUF_SIZE=12582912
export FI_CXI_RX_MATCH_MODE=software
export FI_MR_CACHE_MAX_SIZE=-1
export FI_MR_CACHE_MAX_COUNT=524288

Results with ROCm device buffers (message size in bytes, latency as reported by osu_alltoall):

1                      69.69
2                      69.73
4                     686.55
8                     684.56
16                    685.98
32                    687.88
64                    690.14
128                   696.51
256                   700.48
512                    60.52
1024                   61.01
2048                   62.00
4096                   62.81
8192                 5429.99
16384                 280.84
32768                 100.29
65536                 152.72
131072                258.75
262144                509.10
524288                996.54
1048576              2052.14

By comparison, I'm seeing the following with system (host) buffers:

1                      40.33
2                      40.22
4                      38.57
8                      38.85
16                     39.46
32                     43.24
64                     43.58
128                    47.72
256                    45.92
512                    41.34
1024                   41.57
2048                   43.83
4096                   46.32
8192                   48.40
16384                  67.62
32768                  93.77
65536                 150.25
131072                267.46
262144                583.49
524288               1145.67
1048576              2522.32
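
To make the gap easier to see, the two tables above can be compared size by size. The following is a minimal sketch, not part of the original runs: it copies the reported values into two lists and computes the device-to-host ratio per message size. The function names `slowdown` and `flag_outliers` and the 2.0 threshold are hypothetical choices made for illustration only.

```python
# Values copied from the two tables in this report (one entry per message size).
SIZES = [2 ** i for i in range(21)]  # 1 byte .. 1048576 bytes

DEVICE = [69.69, 69.73, 686.55, 684.56, 685.98, 687.88, 690.14, 696.51,
          700.48, 60.52, 61.01, 62.00, 62.81, 5429.99, 280.84, 100.29,
          152.72, 258.75, 509.10, 996.54, 2052.14]

HOST = [40.33, 40.22, 38.57, 38.85, 39.46, 43.24, 43.58, 47.72,
        45.92, 41.34, 41.57, 43.83, 46.32, 48.40, 67.62, 93.77,
        150.25, 267.46, 583.49, 1145.67, 2522.32]


def slowdown(sizes, device, host):
    """Return {size: device_value / host_value} for each message size."""
    return {s: d / h for s, d, h in zip(sizes, device, host)}


def flag_outliers(ratios, threshold=2.0):
    """Return the sizes whose ratio exceeds the threshold.

    The default of 2.0 is an arbitrary cutoff assumed for this sketch.
    """
    return [s for s, r in ratios.items() if r > threshold]


if __name__ == "__main__":
    ratios = slowdown(SIZES, DEVICE, HOST)
    for size, ratio in ratios.items():
        marker = "  <-- outlier" if ratio > 2.0 else ""
        print(f"{size:>8} {ratio:8.2f}x{marker}")
```

With this cutoff the flagged sizes are 4 through 256 bytes, 8192 bytes, and 16384 bytes, with 8192 bytes showing the largest ratio (roughly 112x). From 32768 bytes upward the device numbers are close to, or better than, the host numbers.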

This is using the main branches of Open MPI and libfabric. Is there an explanation for the lower-than-expected performance numbers? @iziemba

amirshehataornl changed the title from "prov/cxip: cq peer support is broken" to "prov/cxip: low device performance." on Feb 24, 2025