Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a function jl_gc_notify_thread_yield before a thread yields for GC #57297

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

qinsoon
Copy link
Contributor

@qinsoon qinsoon commented Feb 7, 2025

This PR adds a function jl_gc_notify_thread_yield to the GC interface. This will be used by MMTk to save the thread context (registers) for conservative scanning.

@qinsoon qinsoon added the GC: MMTK MMTK GC integration label Feb 7, 2025
@@ -108,6 +108,9 @@ JL_DLLEXPORT const char* jl_gc_active_impl(void);
// each GC should implement it but it will most likely not be used by other code in the runtime.
// It still needs to be annotated with JL_DLLEXPORT since it is called from Rust by MMTk.
JL_DLLEXPORT void jl_gc_sweep_stack_pools_and_mtarraylist_buffers(jl_ptls_t ptls) JL_NOTSAFEPOINT;
// Notifies the GC that the given thread is about to yield for a GC. ctx is the ucontext for the thread
// if it is already fetched by the caller, otherwise it is NULL.
JL_DLLEXPORT void jl_gc_notify_thread_yield(jl_ptls_t ptls, void* ctx);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the term yield here, because it conflicts with the usage of the same term in the task subsystem and has a different meaning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any suggestion that won't cause confusion in the context?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jl_gc_thread_will_join_collection or sth?

@Keno
Copy link
Member

Keno commented Feb 7, 2025

This will be used by MMTk to save the thread context (registers) for conservative scanning.

I don't know if I have mentioned this elsewhere, but I would recommend against this as a long-term strategy. We make no semantic guarantees that conservative scanning will be able to find all pointers.

@qinsoon
Copy link
Contributor Author

qinsoon commented Feb 7, 2025

This will be used by MMTk to save the thread context (registers) for conservative scanning.

I don't know if I have mentioned this elsewhere, but I would recommend against this as a long-term strategy. We make no semantic guarantees that conservative scanning will be able to find all pointers.

What is the concern here?

We had two concerns before we chose to use conservative scanning:

  1. LLVM may partition pointers (you do not see the pointer on the stack or registers at all). But it seems that optimization is rare, and only used for a small number of platforms (and those are not common platforms). So we decided to ignore this concern.
  2. There may be internal/derived pointers on the stack. We handled the case. We can find the base pointer from an internal pointer.

@Keno
Copy link
Member

Keno commented Feb 7, 2025

  • LLVM may partition pointers (you do not see the pointer on the stack or registers at all). But it seems that optimization is rare, and only used for a small number of platforms (and those are not common platforms). So we decided to ignore this concern.

Yes, this one. I agree that it is rare currently, I'm just warning that LLVM does not guarantee this, and you only need one to start crashing. I old the GAP.jl people the same thing when they started stack scanning julia.

Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not sufficient in the presence of threads. I started a corrected implementation that takes into account threads at #56477, but the cost seemed too absurdly high, especially given that we already know LLVM will miscompile code in the presence of a conservative scanner, which will necessarily lead to unsound GC results. (without an explicit leak of the object to the shadow stack, it might not bother to assign NULLs to the slots or lengths to the arrays or types to the objects)

@qinsoon
Copy link
Contributor Author

qinsoon commented Feb 7, 2025

  • LLVM may partition pointers (you do not see the pointer on the stack or registers at all). But it seems that optimization is rare, and only used for a small number of platforms (and those are not common platforms). So we decided to ignore this concern.

Yes, this one. I agree that it is rare currently, I'm just warning that LLVM does not guarantee this, and you only need one to start crashing. I old the GAP.jl people the same thing when they started stack scanning julia.

I agree.

When we started prototyping for a moving GC, x86_64 is the only target we cared about. We found these two discussions and knew from them that this optimization was not enabled on x86_64 and it wouldn't be a show stopper for us to use conservative stack scanning:

Now we should assess how many targets Julia supports may be affected by the optimization, and whether the LLVM maintainers would encourage this optimization for more platforms. I don't have an answer to this. Insights are welcome.

At this point, conservative stack scanning is a key part of our prototype to support moving GCs in Julia. Conservative stack scanning helps us identify references on the stack, and we only need to worry about references in the native heap, code, or globals. If we cannot use conservative stack scanning, we would have to identify references on the stack through other approaches.

@vtjnash
Copy link
Member

vtjnash commented Feb 7, 2025

The GAP people generally get away with this since they implement their object creation using C functions, so Julia cannot optimize them. That isn't true for any Julia objects however.

@Keno
Copy link
Member

Keno commented Feb 7, 2025

This is not sufficient in the presence of threads

Are you thinking about threads running in gc-unsafe mode in particular? Or is there a problem with the gf-safe threads also that I'm not seeing?

@vtjnash
Copy link
Member

vtjnash commented Feb 7, 2025

Yes, specifically with gc-safe. Those are regions of code that cannot be stopped (would deadlock if you tried) but must be scanned.

@qinsoon
Copy link
Contributor Author

qinsoon commented Feb 7, 2025

Yes, specifically with gc-safe. Those are regions of code that cannot be stopped (would deadlock if you tried) but must be scanned.

Can you explain a bit more on this? I am not familiar with gc-safe/unsafe threads in Julia. But it sounds like you are talking about some threads that are allowed to run when a GC is happening. Generally in those cases, those threads are allowed to run, as GC does not care about those threads. Why do those threads need to be scanned?

@qinsoon qinsoon mentioned this pull request Feb 7, 2025
5 tasks
@Keno
Copy link
Member

Keno commented Feb 7, 2025

We permit threads to transition into regions where they run concurrently with the GC as long as they do not modify gc-tracked memory while they are in those regions. However, their stacks may still have references to gc-tracked objects that need to be scanned. That said, could you explain what you're trying to use conservative stack scanning for? You're not gonna be able to move the objects you find anyway, so I don't see how it helps you.

@oscardssmith
Copy link
Member

The plan is to pin everything directly referenced by the stack (as well as anything that gets GC.@preserved or pointered, but to allow moving other objects.

@Keno
Copy link
Member

Keno commented Feb 7, 2025

Ok, but in that case why not extend the ordinary GC stack and just pin everything there?

@qinsoon
Copy link
Contributor Author

qinsoon commented Feb 7, 2025

Ok, but in that case why not extend the ordinary GC stack and just pin everything there?

The GC stack does not capture all the references. If an object is rooted (and pushed to the GC stack), any object reachable from the object does not need to be rooted and may not appear on the GC stack.

@Keno
Copy link
Member

Keno commented Feb 7, 2025

Ok, sure, but we could just extend the macros to also specify the objects to be pinned - we already have the static analysis to check liveness across a safepoint for you, so that seems like a relatively easy thing to do in comparison to doing a correct conservative stack scan.

@qinsoon
Copy link
Contributor Author

qinsoon commented Feb 7, 2025

We permit threads to transition into regions where they run concurrently with the GC as long as they do not modify gc-tracked memory while they are in those regions. However, their stacks may still have references to gc-tracked objects that need to be scanned.

This actually might be a problem.

We currently do conservative scanning for two places: 1. native stacks, and 2. registers. We go through native stacks by iterating all the _jl_tls_states_t structs which should include all the threads (running or stopped). We go through registers using the hooks added by this PR to capture registers in the threads that are stopped. So we are not scanning registers for running threads. It is possible that a running thread may still have references in the regsiters, even though they are not allowed to use heap references.

We would need a way to let those running threads to spill all the registers to the stack before they transition into the GC-safe regions. We may use the same way to spill registers for the stopped threads as well (and may not need this hook in that case).

@Keno
Copy link
Member

Keno commented Feb 7, 2025

We would need a way to let those running threads to spill all the registers to the stack before they transition into the GC-safe regions. We may use the same way to spill registers for the stopped threads as well (and may not need this hook in that case).

This is insufficient - the threads are still allowed to read GC memory, just not write to it.

@qinsoon
Copy link
Contributor Author

qinsoon commented Feb 7, 2025

Ok, sure, but we could just extend the macros to also specify the objects to be pinned - we already have the static analysis to check liveness across a safepoint for you, so that seems like a relatively easy thing to do in comparison to doing a correct conservative stack scan.

It is possible. But it would be a lot of work to mark all the objects to be pinned through the code base. I had attempted something here for only two source files. All those PIN macros are the objects that are not in the GC stack, but used in the code.

We could extend the JL_GC_PUSH macro to add a list of objects to be pinned, like JL_GC_PUSH(x, [x.a, x.b, y]), and do not bother about unpin. This would require us to extend the static analyzer (if possible) to get it work.

@Keno
Copy link
Member

Keno commented Feb 7, 2025

and do not bother about unpin.

Presumably they could be unpinned together with the GC_POP.

This would require us to extend the static analyzer (if possible) to get it work.

Seems easier than a correct stack scan in the presence of the complications discussed above.

@qinsoon
Copy link
Contributor Author

qinsoon commented Feb 7, 2025

This would require us to extend the static analyzer (if possible) to get it work.

Seems easier than a correct stack scan in the presence of the complications discussed above.

With my limited knowledge on the GCChecker, I doubt it is easier to get a correct static analyzer than getting correct conservative scanning. I had attempts with GCChecker, but I found there were many assumptions wired in for the Julia's rooting semantics. And maintaining a pinning state along with the rooting state was not as easy as I thought. Besides, conservative scanning is a one-time effort. Once it is working, it works no matter how the runtime code changes. And explicitly pinning objects is a continuing effort that every runtime developer needs to be aware of.

With all that said, I am still open to the idea of explict pinning with macros. Just in our attempts, it did not work out, and conservative scanning worked.

@Keno
Copy link
Member

Keno commented Feb 7, 2025

I'm fine to proceed with conservative stack scanning if you can work out the correctness issues identified. That said, I don't think that it's an acceptable long-term solution, so we'll have to do something else before we can consider enabling mmtk by default. At this point, I think it's a question of your prioritization.

@vchuravy
Copy link
Member

vchuravy commented Feb 7, 2025

Keno mentioned that we allow threads to run concurrently to GC, and that they are allowed to read Julia memory.
#49933 will make that a more common occurrence and make it happen from user-code as well.

We would need a way to let those running threads to spill all the registers to the stack before they transition into the GC-safe regions. We may use the same way to spill registers for the stopped threads as well (and may not need this hook in that case).

That would be incredibly expensive and would stop us from using GC_safe regions for latency critical operations.

Besides, conservative scanning is a one-time effort. Once it is working, it works no matter how the runtime code changes. And explicitly pinning objects is a continuing effort that every runtime developer needs to be aware of.

That's a false dichotomy. We already have a precise GC and runtime developers and Julia language developers have been able to use it correctly (while assisted by things like the static analyzer). Conservative scanning has issues, and we don't know that it is possible to be correct given codegen.

@vtjnash
Copy link
Member

vtjnash commented Feb 7, 2025

That would be incredibly expensive and would stop us from using GC_safe regions for latency critical operations.

On further reflection, we do have this ability already (to support profiling of async thread operation; natively on most kernels, emulated on Linux), so while that is a high-cost operation, only a conservative GC scan must pay it, not the whole runtime. We could revive my PR therefore using that GetThreadContext implementation we already have there

@udesou
Copy link
Contributor

udesou commented Feb 7, 2025

The other thing to add is that the collectors we support so far do judicious copying, meaning that they don't try to copy everything at every GC, but use heuristics to decide what/when to copy. From a correctness perspective, we could turn off copying for GC runs in which not all threads have stopped but some are operating in a gc-safe region. It might not be ideal if you never hit a point in which all threads stop for GC, but we could have that to guarantee correctness until we think about a better solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GC: MMTK MMTK GC integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants