gc: improve mallocarrays locality #56801

Merged: vtjnash merged 1 commit into master from jn/mallocarrays-list on Dec 13, 2024

Conversation

@vtjnash (Member) commented Dec 11, 2024

small_arraylist_t has much better memory locality and space utilization than a linked list with individually malloc'd elements
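For a rough picture of the difference, here is an illustrative sketch with made-up names (a simplification, not the actual definitions in the Julia sources):

```c
#include <stdint.h>

/* Illustrative sketch only; names and sizes are approximations, not the
 * actual Julia definitions. */

/* Old scheme: one individually malloc'd node per tracked allocation,
 * chained into a linked list. Nodes end up scattered across the heap, so
 * walking the list during sweeping has poor cache locality, and every
 * node pays malloc overhead for two pointers. */
typedef struct _malloc_node_t {
    void *a;                      /* the tracked buffer */
    struct _malloc_node_t *next;  /* next node, allocated separately */
} malloc_node_t;

/* New scheme: a growable array of pointers. All entries live in one
 * contiguous block (with a small inline buffer for the common case), so
 * sweeping is a linear scan over consecutive memory. */
typedef struct {
    uint32_t len;       /* number of tracked buffers */
    uint32_t max;       /* capacity of items */
    void **items;       /* contiguous storage, grown as needed */
    void *_space[6];    /* small inline buffer avoids a malloc when only
                           a few items are tracked */
} small_ptr_list_t;
```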

@vtjnash vtjnash added the GC Garbage collector label Dec 11, 2024
@vtjnash vtjnash requested a review from gbaraldi December 11, 2024 14:19
Review comment on src/gc-stock.c (outdated, resolved)
@vtjnash vtjnash force-pushed the jn/mallocarrays-list branch from 8afa0e6 to 725adba Compare December 11, 2024 20:17
@gbaraldi (Member):

This seems to fix #56759 for me. So we probably want to backport this

@gbaraldi gbaraldi added the backport 1.11 Change should be backported to release-1.11 label Dec 11, 2024
@d-netto (Member) commented Dec 11, 2024

> This seems to fix #56759 for me. So we probably want to backport this

IIRC we're still using this data structure in 1.10.

If this is indeed a source of a memory leak, then we should consider backporting it to 1.10 as well.

@vtjnash vtjnash force-pushed the jn/mallocarrays-list branch 3 times, most recently from ed028da to b02629a Compare December 12, 2024 21:30
small_arraylist_t has much better memory locality and space utilization
than a linked list with individually malloc'd elements. However, it seemed
that small_arraylist_t needed extra alignment, both to hold all the elements
the user might create and to avoid gcc bugs.
@vtjnash vtjnash force-pushed the jn/mallocarrays-list branch from b02629a to 6e2c0cb Compare December 13, 2024 00:52
@MilesCranmer (Member):

Can confirm this fixes #56759. Nice work!

@vtjnash vtjnash merged commit 03a0247 into master Dec 13, 2024
5 of 7 checks passed
@vtjnash vtjnash deleted the jn/mallocarrays-list branch December 13, 2024 02:54
@vchuravy (Member):

Do we know why it fixes the memory leak? Were we simply not freeing the elements of the linked list?

@oscardssmith (Member):

yes

@vtjnash (Member, Author) commented Dec 14, 2024

No idea

@MilesCranmer (Member):

Could this also be backported to 1.10?

@oscardssmith oscardssmith added the backport 1.10 Change should be backported to the 1.10 release label Dec 14, 2024
@alior101:

Thanks for this commit! I've been banging my head for the last few days over an ever-increasing MaxRSS in something whose memory use should have stayed fixed.
Can you please share more details on why it solves the memory leak? I looked at the commit changes but it's not very clear.
Also, when will this be backported to 1.11.3?
Thanks!

@MilesCranmer (Member) commented Dec 19, 2024

Ok, I read through things a bit more carefully. My understanding is that the old code (and current 1.10 and 1.11) managed allocated blocks using this mallocmemory_t type, which forms a manually maintained linked list. Nodes in this list are not properly being freed, only reused, resulting in a leak. (The buffers tracked by the nodes are freed, but not the nodes themselves.) The reason it was only noticed on 1.11 is that this type started being used for more things (such as all arrays). My library SymbolicRegression.jl does tons of small array allocations, which I suppose helped surface this issue.

I think @vtjnash switched to small_arraylist_t (and deleted mallocmemory_t) because small_arraylist_t is more widely used and thus has more robust memory management in place, which removes the issue entirely.

I think the actual underlying issue is here, in sweep_malloced_memory:

```c
*pma = nxt;
int isaligned = (uintptr_t)ma->a & 1;
jl_gc_free_memory(a, isaligned);
ma->next = ptls2->gc_tls_common.heap.mafreelist;
ptls2->gc_tls_common.heap.mafreelist = ma;
```

So, jl_gc_free_memory frees the buffer inside the mallocmemory_t (the ->a), but not the mallocmemory_t node itself. So I think the issue is that the mallocmemory_t nodes just accumulate without being tracked?

Switching to small_arraylist_t for tracking allocations and deleting mallocmemory_t (which was rarely used before 1.11; perhaps that's why nobody noticed the bug) is another way to get around this issue. I guess it might be better overall (even if the above is the underlying issue) because it means the GC is simpler and uses fewer data structures, so it is less likely to have this class of bug in the future.
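Roughly, my mental model of the arraylist-based approach (an illustrative sketch with made-up names, not the actual code in gc-stock.c):

```c
#include <stdlib.h>

/* Sketch: track malloc'd buffers in a growable pointer array and sweep it
 * by compaction. Names and details are hypothetical. */
typedef struct {
    void **items;
    size_t len, max;
} ptr_list_t;

static void track_buffer(ptr_list_t *l, void *buf) {
    if (l->len == l->max) {
        size_t newmax = l->max ? 2 * l->max : 8;
        void **p = (void **)realloc(l->items, newmax * sizeof(void *));
        if (!p)
            abort();                /* keep the sketch simple */
        l->items = p;
        l->max = newmax;
    }
    l->items[l->len++] = buf;       /* no per-entry bookkeeping node to malloc */
}

static void sweep_buffers(ptr_list_t *l, int (*is_live)(void *)) {
    size_t kept = 0;
    for (size_t i = 0; i < l->len; i++) {
        void *buf = l->items[i];
        if (is_live(buf))
            l->items[kept++] = buf; /* survivor: compact in place */
        else
            free(buf);              /* dead: free the buffer itself */
    }
    l->len = kept;                  /* nothing else to recycle or leak */
}
```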

I think an alternate fix would be to just do:

```diff
  *pma = nxt;
  int isaligned = (uintptr_t)ma->a & 1;
  jl_gc_free_memory(a, isaligned);
- ma->next = ptls2->gc_tls_common.heap.mafreelist;
- ptls2->gc_tls_common.heap.mafreelist = ma;
+ free(ma);
```

without further changes or needing to replace this mallocmemory_t type. But as I said, I think it's better to have fewer data structures and aim for simplicity like this PR does, so it's probably better this way!

(Hoping someone corrects me if I am wrong)

@MilesCranmer (Member) commented Dec 19, 2024

Cool. Just confirmed that adding that single change to sweep_malloced_memory is enough to fix the leak. That might be an easier and more minimal backport for 1.10.8/1.11.3? @KristofferC

With @gbaraldi's example, on 1.11.2:

```julia
julia> using StatsBase

julia> function Simulate()
           Simulations = Int(1e7)
           Size = 1000
           result = Array{Float64}(undef, Simulations, 1)
           Threads.@threads for i = 1:Simulations
               x = randn(Size)
               s = sort(x)
               result[i, 1] = s[1]
           end
           println(median(result))
       end
Simulate (generic function with 1 method)

julia> for i in 1:1000
           println(i)
           Simulate()
           GC.gc(true)
           # Print live_bytes
           println("live_bytes in MB: ", Base.gc_live_bytes() / 1024^2)
           sleep(10) # sleep for 10 seconds
       end
1
-3.1975019113594954
live_bytes in MB: 2182.0746631622314
2
-3.197639840642432
live_bytes in MB: 4273.654802322388
3
-3.1976844965361293
live_bytes in MB: 6365.400560379028
4
-3.1975014405024504
live_bytes in MB: 8457.508516311646
5
-3.1978872129155835
live_bytes in MB: 10549.042078018188
```

But if I just add that single free(ma), I get:

```
1
-3.1975903921733435
live_bytes in MB: 92.49524688720703
2
-3.1973693558307046
live_bytes in MB: 92.49922943115234
3
-3.1975461857988234
live_bytes in MB: 92.49922943115234
4
-3.1974292157057222
live_bytes in MB: 92.49922943115234
5
-3.197542860188766
live_bytes in MB: 92.49922943115234
```

@alior101:

Thanks for the explanation!

@KristofferC (Member):

On 1.10 I get:

```
1
-3.1976480507706193
live_bytes in MB: 129.57639598846436
2
-3.197508883859199
live_bytes in MB: 129.50681591033936
3
```

so I think it is fine there?

@KristofferC KristofferC removed the backport 1.10 Change should be backported to the 1.10 release label Jan 2, 2025
@MilesCranmer (Member):

mallocmemory_t isn't used for arrays on 1.10, which is what that test script does allocations for, but I think the leak should still be there.

@KristofferC (Member):

Trying to run the example from #56741 with 2adfacc applied, I get:

```
1
-3.197463115148186
live_bytes in MB: 438.03278064727783
2
-3.1975149403064047
live_bytes in MB: 850.664532661438
```

so it doesn't seem to be fixed with that commit?

stevengj pushed a commit that referenced this pull request Jan 2, 2025
@MilesCranmer (Member):

Weird, it worked for me, unless I ran the wrong binary. Maybe run it a bit longer to see if it saturates? I think I had set --heap-size-hint pretty low to force it to garbage collect earlier.

@MilesCranmer (Member):

Very weird. When I compile that specific commit I also see the behavior indicating the leak wasn't fixed. Maybe I just ran the wrong binary or something equally dumb.

I'm going to see if applying that patch to v1.11.2 fixes it, because I think that's what I originally tried. Anyway, it looks like it was already backported in 8001392, so maybe just go for that.

On 1.10 it doesn't look like this data structure was actually used (@d-netto?), so perhaps we can leave it off the backport list.

@MilesCranmer (Member):

Yep the patch doesn't fix v1.11.2 either 😵‍💫. Cosmic ray must have hit the PC. Sorry about that. Mystery is back on. Going to go grab some booze...

@vtjnash (Member, Author) commented Jan 2, 2025

jl_gc_track_malloced_array was present in v1.10, and just renamed in v1.11

@MilesCranmer (Member):

Thanks. I guess it would be helpful to understand the root cause in order to fix a potential leak on 1.10, unless it's OK to apply a similar replacement patch to the LTS as well? (I wasn't sure whether the data structure change could introduce other issues we haven't thought about yet, which might be too large a change for an LTS version.)

My understanding is that the lines above are a recycling mechanism for the linked list of mallocmemory_t nodes, and these nodes are either accumulating faster than they can be recycled, or perhaps failing to free their referenced allocation somewhere?
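For reference, a rough sketch of the recycling mechanism I mean (illustrative names and layout, not the actual 1.10/1.11 GC code):

```c
#include <stdlib.h>

/* Hypothetical sketch of freelist-based node recycling. */
typedef struct _node_t {
    void *a;                        /* the tracked buffer */
    struct _node_t *next;
} node_t;

static node_t *mafreelist = NULL;   /* spare nodes kept around for reuse */

/* Track a newly malloc'd buffer: pop a spare node off the freelist if one
 * exists, otherwise malloc a fresh node, then push it onto the live list. */
static void track(void *buf, node_t **live) {
    node_t *ma = mafreelist;
    if (ma)
        mafreelist = ma->next;
    else
        ma = (node_t *)malloc(sizeof(node_t));
    ma->a = buf;
    ma->next = *live;
    *live = ma;
}

/* Sweep one dead entry: unlink the node, free the tracked buffer, and put
 * the node back on the freelist instead of free()ing it, so nodes are only
 * ever reused, never released back to the system. */
static void sweep_one(node_t **pma) {
    node_t *ma = *pma;
    *pma = ma->next;
    free(ma->a);
    ma->next = mafreelist;
    mafreelist = ma;
}
```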

Maybe the first step is creating an equivalent to Simulate that actually hits this type of allocation in 1.10? Maybe there isn’t even a leak there

@alior101 commented Jan 3, 2025

I'm on nightly and can see a flat GC live_bytes but increasing RSS and MaxRSS until OOM a few hours later (the app is allocating ~50 MB every few seconds inside a function)... Running with a heap-size-hint of 600M in a container with a memory limit of 2GB, but that doesn't seem to help...

@d-netto (Member) commented Jan 3, 2025

> I'm on nightly and can see a flat GC live_bytes but increasing RSS and MaxRSS until OOM a few hours later (the app is allocating ~50 MB every few seconds inside a function)... Running with a heap-size-hint of 600M in a container with a memory limit of 2GB, but that doesn't seem to help...

Do you have an open-source reproducer? If so, could you please open an issue?

@alior101 commented Jan 3, 2025

Unfortunately not. But I'll try to come up with a minimal working example and post it here. BTW, reverting to 1.10.7 fixed the issue completely.

@alior101 commented Jan 5, 2025

Ok, the OOM in the container was due to the Linux kernel being lazy about clearing physical memory used by cached log files. When I stopped writing logs, the RSS stopped growing. It seems this growing RSS is not directly related to a leak in the GC but to some caching and OOM-killer race.
