MPI_Win_create/MPI_Win_free leaks about 728 bytes per call pair when acoll component is active #13070
Here's the test case:
Note there are all kinds of other memory leaks boiling up out of PMIx, other parts of OMPI, etc., but this one grows linearly with the number of MPI_Win_create/MPI_Win_free calls, whereas the others do not.
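The attached test case itself is not reproduced here; a minimal reproducer along these lines (a sketch only, not the original test case; the iteration count, window size, and use of MPI_COMM_WORLD are assumptions) looks like:

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Repeatedly create and free an RMA window; with the acoll leak,
     * resident memory grows linearly with the iteration count. */
    const int iterations = 10000;       /* assumed value */
    const MPI_Aint win_size = 4096;     /* assumed value */
    void *base = malloc(win_size);

    for (int i = 0; i < iterations; i++) {
        MPI_Win win;
        MPI_Win_create(base, win_size, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
        MPI_Win_free(&win);
    }

    free(base);
    MPI_Finalize();
    return 0;
}
```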
This plugs one leak:

```diff
diff --git a/ompi/mca/coll/acoll/coll_acoll_module.c b/ompi/mca/coll/acoll/coll_acoll_module.c
index bbab603413..3924e755dc 100644
--- a/ompi/mca/coll/acoll/coll_acoll_module.c
+++ b/ompi/mca/coll/acoll/coll_acoll_module.c
@@ -60,11 +60,6 @@ mca_coll_base_module_t *mca_coll_acoll_comm_query(struct ompi_communicator_t *co
 {
     mca_coll_acoll_module_t *acoll_module;
 
-    acoll_module = OBJ_NEW(mca_coll_acoll_module_t);
-    if (NULL == acoll_module) {
-        return NULL;
-    }
-
     if (OMPI_COMM_IS_INTER(comm)) {
         *priority = 0;
         return NULL;
@@ -74,6 +69,11 @@ mca_coll_base_module_t *mca_coll_acoll_comm_query(struct ompi_communicator_t *co
         return NULL;
     }
 
+    acoll_module = OBJ_NEW(mca_coll_acoll_module_t);
+    if (NULL == acoll_module) {
+        return NULL;
+    }
+
     *priority = mca_coll_acoll_priority;
 
     /* Set topology params */
```
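(For context: the fix simply defers the OBJ_NEW allocation until after every early return in mca_coll_acoll_comm_query(), so a query that declines the communicator no longer allocates a module object it never hands back to the framework.)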
@mshanthagit @amd-nithyavs Could you please have a look?
You should take @ggouaillardet's patch in any case, although I don't think that's the actual problem: the comm handling in the osc/rdma component (where the MCW in the example is dup'd, split, etc.) doesn't create intercomms, I think, but maybe I'm wrong!
@hppritcha are you enabling acoll during your testing? By default it is disabled, right?
No, I am not explicitly enabling it. However, even then it builds as part of OMPI, gets queried, and creates the module struct, which then gets partially destructed when a communicator that queried it gets destructed. You are correct in the sense that it is not being used for collective operations, but the mere fact of querying the module is sufficient to cause the memory leak. If I explicitly disable the use of acoll, then the memory leak does indeed vanish.
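For illustration, a typical way to exclude a coll component uses the standard MCA ^ exclusion syntax (the binary name ./win_leak here is just a placeholder, not the reporter's actual test program):

```sh
# Exclude the acoll component from coll framework selection
mpirun -np 2 --mca coll ^acoll ./win_leak
```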
@hppritcha I ran into this when running with two MPI tasks: a communicator under the hood had a single task, IIRC. Anyway, let me know how it goes; I will have more time this weekend if needed.
@hppritcha @ggouaillardet thanks. Will look into this issue.
@ggouaillardet thanks for catching the leak! @hppritcha I don't see the leak with your test case after @ggouaillardet's fix. The patch snippet posted above "hides" another condition (pasted below), hence it seemed related to "intercomm"!
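The pasted condition is not reproduced in this copy of the thread; going by the follow-up commit message (inter-communicator or single-process communicator), the second early return is presumably along the lines of the sketch below, with the exact test being an assumption:

```c
    /* Sketch only: decline trivially small communicators as well, not just
     * inter-communicators; the exact condition in coll_acoll_module.c may differ. */
    if (ompi_comm_size(comm) < 2) {
        *priority = 0;
        return NULL;
    }
```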
Yikes, that would do it.
Have mca_coll_acoll_comm_query() not leak when invoked on an inter-communicator or a single-process communicator.

Refs. open-mpi#13070

Signed-off-by: Gilles Gouaillardet <[email protected]>
A customer is looking into a memory leak issue using MPI RMA in both Open MPI and MPICH. I've been looking into the problem on the Open MPI side, and one thing that valgrind finds is that the acoll component is leaking about 728 bytes per MPI_Win_create/MPI_Win_free set of calls:
The acoll module destructor is getting invoked, but it looks pretty complex and there's probably something not being freed.
I was using the share/openmpi/openmpi-valgrind.supp suppression file.
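For reference, this is the rough shape of such a run; the process count, the install prefix variable $OMPI_PREFIX, and the reproducer binary ./win_leak are placeholders, not taken from the report:

```sh
# Run the reproducer under valgrind with Open MPI's bundled suppression file
mpirun -np 2 valgrind --leak-check=full \
    --suppressions=$OMPI_PREFIX/share/openmpi/openmpi-valgrind.supp \
    ./win_leak
```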