-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a function jl_gc_notify_thread_yield
before a thread yields for GC
#57297
base: master
Are you sure you want to change the base?
Conversation
@@ -108,6 +108,9 @@ JL_DLLEXPORT const char* jl_gc_active_impl(void); | |||
// each GC should implement it but it will most likely not be used by other code in the runtime. | |||
// It still needs to be annotated with JL_DLLEXPORT since it is called from Rust by MMTk. | |||
JL_DLLEXPORT void jl_gc_sweep_stack_pools_and_mtarraylist_buffers(jl_ptls_t ptls) JL_NOTSAFEPOINT; | |||
// Notifies the GC that the given thread is about to yield for a GC. ctx is the ucontext for the thread | |||
// if it is already fetched by the caller, otherwise it is NULL. | |||
JL_DLLEXPORT void jl_gc_notify_thread_yield(jl_ptls_t ptls, void* ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the term yield
here, because it conflicts with the usage of the same term in the task subsystem and has a different meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any suggestion that won't cause confusion in the context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jl_gc_thread_will_join_collection
or sth?
I don't know if I have mentioned this elsewhere, but I would recommend against this as a long-term strategy. We make no semantic guarantees that conservative scanning will be able to find all pointers. |
What is the concern here? We had two concerns before we chose to use conservative scanning:
|
Yes, this one. I agree that it is rare currently, I'm just warning that LLVM does not guarantee this, and you only need one to start crashing. I old the GAP.jl people the same thing when they started stack scanning julia. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not sufficient in the presence of threads. I started a corrected implementation that takes into account threads at #56477, but the cost seemed too absurdly high, especially given that we already know LLVM will miscompile code in the presence of a conservative scanner, which will necessarily lead to unsound GC results. (without an explicit leak of the object to the shadow stack, it might not bother to assign NULLs to the slots or lengths to the arrays or types to the objects)
I agree. When we started prototyping for a moving GC, Now we should assess how many targets Julia supports may be affected by the optimization, and whether the LLVM maintainers would encourage this optimization for more platforms. I don't have an answer to this. Insights are welcome. At this point, conservative stack scanning is a key part of our prototype to support moving GCs in Julia. Conservative stack scanning helps us identify references on the stack, and we only need to worry about references in the native heap, code, or globals. If we cannot use conservative stack scanning, we would have to identify references on the stack through other approaches. |
The GAP people generally get away with this since they implement their object creation using C functions, so Julia cannot optimize them. That isn't true for any Julia objects however. |
Are you thinking about threads running in gc-unsafe mode in particular? Or is there a problem with the gf-safe threads also that I'm not seeing? |
Yes, specifically with gc-safe. Those are regions of code that cannot be stopped (would deadlock if you tried) but must be scanned. |
Can you explain a bit more on this? I am not familiar with gc-safe/unsafe threads in Julia. But it sounds like you are talking about some threads that are allowed to run when a GC is happening. Generally in those cases, those threads are allowed to run, as GC does not care about those threads. Why do those threads need to be scanned? |
We permit threads to transition into regions where they run concurrently with the GC as long as they do not modify gc-tracked memory while they are in those regions. However, their stacks may still have references to gc-tracked objects that need to be scanned. That said, could you explain what you're trying to use conservative stack scanning for? You're not gonna be able to move the objects you find anyway, so I don't see how it helps you. |
The plan is to pin everything directly referenced by the stack (as well as anything that gets |
Ok, but in that case why not extend the ordinary GC stack and just pin everything there? |
The GC stack does not capture all the references. If an object is rooted (and pushed to the GC stack), any object reachable from the object does not need to be rooted and may not appear on the GC stack. |
Ok, sure, but we could just extend the macros to also specify the objects to be pinned - we already have the static analysis to check liveness across a safepoint for you, so that seems like a relatively easy thing to do in comparison to doing a correct conservative stack scan. |
This actually might be a problem. We currently do conservative scanning for two places: 1. native stacks, and 2. registers. We go through native stacks by iterating all the We would need a way to let those running threads to spill all the registers to the stack before they transition into the GC-safe regions. We may use the same way to spill registers for the stopped threads as well (and may not need this hook in that case). |
This is insufficient - the threads are still allowed to read GC memory, just not write to it. |
It is possible. But it would be a lot of work to mark all the objects to be pinned through the code base. I had attempted something here for only two source files. All those PIN macros are the objects that are not in the GC stack, but used in the code. We could extend the |
Presumably they could be unpinned together with the GC_POP.
Seems easier than a correct stack scan in the presence of the complications discussed above. |
With my limited knowledge on the GCChecker, I doubt it is easier to get a correct static analyzer than getting correct conservative scanning. I had attempts with GCChecker, but I found there were many assumptions wired in for the Julia's rooting semantics. And maintaining a pinning state along with the rooting state was not as easy as I thought. Besides, conservative scanning is a one-time effort. Once it is working, it works no matter how the runtime code changes. And explicitly pinning objects is a continuing effort that every runtime developer needs to be aware of. With all that said, I am still open to the idea of explict pinning with macros. Just in our attempts, it did not work out, and conservative scanning worked. |
I'm fine to proceed with conservative stack scanning if you can work out the correctness issues identified. That said, I don't think that it's an acceptable long-term solution, so we'll have to do something else before we can consider enabling mmtk by default. At this point, I think it's a question of your prioritization. |
Keno mentioned that we allow threads to run concurrently to GC, and that they are allowed to read Julia memory.
That would be incredibly expensive and would stop us from using GC_safe regions for latency critical operations.
That's a false dichotomy. We already have a precise GC and runtime developers and Julia language developers have been able to use it correctly (while assisted by things like the static analyzer). Conservative scanning has issues, and we don't know that it is possible to be correct given codegen. |
On further reflection, we do have this ability already (to support profiling of async thread operation; natively on most kernels, emulated on Linux), so while that is a high-cost operation, only a conservative GC scan must pay it, not the whole runtime. We could revive my PR therefore using that GetThreadContext implementation we already have there |
The other thing to add is that the collectors we support so far do judicious copying, meaning that they don't try to copy everything at every GC, but use heuristics to decide what/when to copy. From a correctness perspective, we could turn off copying for GC runs in which not all threads have stopped but some are operating in a gc-safe region. It might not be ideal if you never hit a point in which all threads stop for GC, but we could have that to guarantee correctness until we think about a better solution. |
This PR adds a function
jl_gc_notify_thread_yield
to the GC interface. This will be used by MMTk to save the thread context (registers) for conservative scanning.