
MPI_Win_create/MPI_Win_free leaks about 728 bytes per call pair when acoll component is active #13070

Open
hppritcha opened this issue Jan 29, 2025 · 10 comments
@hppritcha (Member)

A customer is looking into a memory leak issue using MPI RMA with both Open MPI and MPICH. I've been looking into the problem on the Open MPI side, and one thing valgrind finds is that the acoll component leaks about 728 bytes per MPI_Win_create/MPI_Win_free call pair:

==56433== 7,280 bytes in 10 blocks are definitely lost in loss record 334 of 347
==56433==    at 0x48388E4: malloc (vg_replace_malloc.c:446)
==56433==    by 0x4A53B1D: opal_obj_new (opal_object.h:495)
==56433==    by 0x4A53A22: opal_obj_new_debug (opal_object.h:256)
==56433==    by 0x4A53BCC: mca_coll_acoll_comm_query (coll_acoll_module.c:63)
==56433==    by 0x49D1219: query_3_0_0 (coll_base_comm_select.c:588)
==56433==    by 0x49D11DD: query (coll_base_comm_select.c:571)
==56433==    by 0x49D10D0: check_one_component (coll_base_comm_select.c:536)
==56433==    by 0x49D0C85: check_components (coll_base_comm_select.c:456)
==56433==    by 0x49CFB04: mca_coll_base_comm_select (coll_base_comm_select.c:233)
==56433==    by 0x48CD150: ompi_comm_activate_complete (comm_cid.c:923)
==56433==    by 0x48CE02C: ompi_comm_activate_nb_complete (comm_cid.c:1119)
==56433==    by 0x48D1903: ompi_comm_request_progress (comm_request.c:154)
==56433== 

The acoll module destructor does get invoked, but it is fairly complex and something in it is probably not being freed.

I was using the share/openmpi/openmpi-valgrind.supp suppression file.
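
The invocation was along these lines (the install prefix and binary name here are placeholders, not the exact command I ran):

mpirun -np 2 valgrind --leak-check=full \
    --suppressions=<prefix>/share/openmpi/openmpi-valgrind.supp ./win_leak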

@hppritcha (Member, Author)

Here's the test case:

#include <iostream>
#include <mpi.h>
#include <stdio.h>
#include <vector>
 
int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);
 
    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
 
    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
 
    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);
    // Junk data for the mpi window
    std::vector<double> data(world_size);
    size_t N_create_free = 100;
    //size_t N_create_free = 20000;
    for (size_t i = 0; i < N_create_free; i++) {
        if (i % 100 == 0 && world_rank == 0) {
            std::cout << "completed... " << i << std::endl;
        }
#if 1
        MPI_Win win = MPI_WIN_NULL;
        MPI_Win_create(data.data(), data.size() * sizeof(double), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);
        auto errorcode = MPI_Win_fence((MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED), win);
        if (errorcode != MPI_SUCCESS)
            std::cout << "ERROR: MPI ERROR WAS DETECTED" << std::endl;
        MPI_Win_free(&win);
#endif
    }
    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);
 
    // Finalize the MPI environment.
    MPI_Finalize();
}

Note there are all kinds of other memory leaks boiling up out of pmix, other parts of ompi, etc., but this one grows linearly with the number of MPI_Win_create/MPI_Win_free calls, whereas the others do not.
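
For reference, the reproducer just needs the usual wrapper compiler and a couple of ranks, e.g. (file and binary names are arbitrary):

mpicxx -g -O0 -o win_leak win_leak.cpp
mpirun -np 2 ./win_leak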

@ggouaillardet (Contributor)

This plugs one leak:

diff --git a/ompi/mca/coll/acoll/coll_acoll_module.c b/ompi/mca/coll/acoll/coll_acoll_module.c
index bbab603413..3924e755dc 100644
--- a/ompi/mca/coll/acoll/coll_acoll_module.c
+++ b/ompi/mca/coll/acoll/coll_acoll_module.c
@@ -60,11 +60,6 @@ mca_coll_base_module_t *mca_coll_acoll_comm_query(struct ompi_communicator_t *co
 {
     mca_coll_acoll_module_t *acoll_module;
 
-    acoll_module = OBJ_NEW(mca_coll_acoll_module_t);
-    if (NULL == acoll_module) {
-        return NULL;
-    }
-
     if (OMPI_COMM_IS_INTER(comm)) {
         *priority = 0;
         return NULL;
@@ -74,6 +69,11 @@ mca_coll_base_module_t *mca_coll_acoll_comm_query(struct ompi_communicator_t *co
         return NULL;
     }
 
+    acoll_module = OBJ_NEW(mca_coll_acoll_module_t);
+    if (NULL == acoll_module) {
+        return NULL;
+    }
+
     *priority = mca_coll_acoll_priority;
 
     /* Set topology params */
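
An equivalent alternative would have been to keep the allocation where it was and release the half-constructed module before each early return; moving the OBJ_NEW down is just the simpler variant. A rough sketch of that alternative (not what the patch above does):

    if (OMPI_COMM_IS_INTER(comm)) {
        *priority = 0;
        OBJ_RELEASE(acoll_module);  /* drop the module allocated by OBJ_NEW above */
        return NULL;
    }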

@edgargabriel (Member)

@mshanthagit @amd-nithyavs Could you please have a look?

@hppritcha (Member, Author)

You should take @ggouaillardet's patch in any case, although I don't think that's the problem - the comm handling in the osc/rdma component (where the example's MPI_COMM_WORLD is dup'd, split, etc.) doesn't create intercomms as far as I know - but maybe I'm wrong!

@mshanthagit (Contributor)

@hppritcha are you enabling acoll during your testing? By default it is disabled, right?

@hppritcha (Member, Author)

No, I am not explicitly enabling it. However, even then it builds as part of ompi, gets queried, and creates the module struct, which is then partially destructed when a communicator that queried it gets destructed. That partial destruction is what causes the memory leak. mca_coll_acoll_module_destruct looks quite complex, with many things being freed up, but the behavior I'm seeing with valgrind indicates something isn't being freed. I tried building without xpmem support (so removing a block of code from this function), but the memory leak is still present.

You are correct in the sense that it is not being used for collective operations, but the mere fact of querying the module is sufficient to cause the memory leak.

If I explicitly disable the use of acoll with

export OMPI_MCA_coll=^acoll

then the memory leak does indeed vanish.
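
The same thing can be passed on the command line, e.g. (binary name is just the test case from above):

mpirun --mca coll ^acoll -np 2 ./win_leak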

@ggouaillardet (Contributor)

@hppritcha I ran into this when running with two MPI tasks: a communicator under the hood had a single task, IIRC. Anyway, let me know how it goes; I will have more time this weekend if needed.

@mshanthagit (Contributor)

@hppritcha @ggouaillardet thanks. Will look into this issue.

@mshanthagit (Contributor)

@ggouaillardet thanks for catching the leak! @hppritcha I don't see the leak with your test case after @ggouaillardet's fix.

The patch snippet posted above "hides" another condition in its context lines (pasted below), which is why the fix looked like it was only about the "intercomm" case!


    if (OMPI_COMM_IS_INTRA(comm) && ompi_comm_size(comm) < 2) {
        *priority = 0;
        return NULL;
    }
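
So with the reordering, the top of mca_coll_acoll_comm_query() ends up roughly like this (a sketch pieced together from the diff and the snippet above, not the committed code):

    mca_coll_acoll_module_t *acoll_module;

    if (OMPI_COMM_IS_INTER(comm)) {
        *priority = 0;
        return NULL;
    }

    if (OMPI_COMM_IS_INTRA(comm) && ompi_comm_size(comm) < 2) {
        *priority = 0;
        return NULL;
    }

    /* allocate the module only after all early returns, so a NULL return cannot leak it */
    acoll_module = OBJ_NEW(mca_coll_acoll_module_t);
    if (NULL == acoll_module) {
        return NULL;
    }

    *priority = mca_coll_acoll_priority;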

@hppritcha (Member, Author)

Yikes, that would do it.

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Jan 31, 2025
Have mca_coll_acoll_comm_query() not leak when invoked on
an inter-communicator or a single-process communicator.

Refs. open-mpi#13070

Signed-off-by: Gilles Gouaillardet <[email protected]>