Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue in OSHMEM idle contexts reuse #13060

Open
zhongchen530 opened this issue Jan 28, 2025 · 0 comments · May be fixed by #13061
Open

Issue in OSHMEM idle contexts reuse #13060

zhongchen530 opened this issue Jan 28, 2025 · 0 comments · May be fixed by #13061

Comments

@zhongchen530
Copy link

zhongchen530 commented Jan 28, 2025

In file ompi/oshmem/mca/spml/ucx/spml_ucx.c
around line 1140,

/* Check if we have an idle context to reuse */
    SHMEM_MUTEX_LOCK(mca_spml_ucx.internal_mutex);
    for (i = 0; i < idle_array->ctxs_count; i++) {
        if (idle_array->ctxs[i]->options & options) {
            ucx_ctx = idle_array->ctxs[i];
            _ctx_remove(idle_array, ucx_ctx, i);
            break;
        }
    }

We have the above code that attempts to reuse idle contexts from an array of idle contexts. However, the condition (idle_array->ctxs[i]->options & options at line 1140 suggests that an idle context with option 0 will never get reused.

If you modify the OSHMEM code to print out ctxs_count which is the length of the array idle_array, and run the following simple program,

#include <shmem.h>
int main() {
    shmem_init();
    for (int i = 0; i < 10; i++) {
        shmem_ctx_t ctx;
        shmem_ctx_create(0, &ctx);
        shmem_ctx_destroy(ctx);
    }
    shmem_finalize();
}

you will observe that ctxs_count grows from 0 to 9, indicating the idle contexts are not getting reused and the idle array explodes. I believe this is not the expected behavior and the fix would be as simple as changing the condition from idle_array->ctxs[i]->options & options to idle_array->ctxs[i]->options == options.

Moreover, the current code can lead to correctness issue as well because it can potentially assign a more restrictive context as a context with less restrictive option. Consider the following program.

#include <shmem.h>
int main() {
    shmem_init();
    shmem_ctx_t ctx1, ctx2;
    shmem_ctx_create(SHMEM_CTX_NOSTORE | SHMEM_CTX_PRIVATE, &ctx1);
    shmem_ctx_destroy(ctx1);
    shmem_ctx_create(SHMEM_CTX_NOSTORE, &ctx2);
    shmem_ctx_destroy(ctx2);
    shmem_finalize();
}

With the current code, a context configured with both SHMEM_CTX_NOSTORE and SHMEM_CTX_PRIVATE will be assigned as a context configured with only SHMEM_CTX_NOSTORE. This will lead to correctness issue as ctx2 may be shared. In fact, the current code crushes when the above program runs. Changing & to == fixes this issue as well.

It is clearly stated in section 9.4.1 of the OpenSHMEM 1.4 specification that multiple options can be combined with a bitwise OR operation.

@zhongchen530 zhongchen530 linked a pull request Jan 28, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant