Replies: 6 comments
-
I thought the strategy for GIL removal was multiple interpreters?
-
What I am talking about is multi-threading similar to what's done in modern Java. Having a thread-safe GC is a pre-condition for that, I think. Making Java-style threading work reliably would be a huge project since you would need to solve a bunch of memory model issues. Supporting the C API makes it extra difficult. The Linux kernel did something similar when it removed the "big kernel lock" so I think it could be done. Should we do it? I'm not sure because it is so much work. Also, maybe performance would be unacceptably slower than single-threaded CPython. Perhaps sub-interpreters can give us multi-core performance that's good enough. I worry that the sub-interpreter approach will struggle to find a way to cheaply pass data between threads. If it's isolated threads with essentially IPC copying data between them, isn't that what the multiprocessing module does already? If we want cheap data passing, I think we will need some mechanism for thread-safe memory ownership (i.e. likely some kind of thread-safe GC). It would seem you still run into memory model questions as well. Maybe those are easier to solve at the interpreter-to-interpreter level but I worry that the problem has just been moved to a different layer and not solved. This hybrid GC idea is only worth pursuing if we decide we want to pursue Java-style threading. I understood that's what people meant when talking about removing the GIL.
-
Everything I know says that we should not do Java-style threading. Yes, multiple interpreters require a way to pass data between them, but that can be a dedicated API that won't make regular objects slower. And yes, that's what people are thinking of when they talk about removing the GIL -- but it's basically so fraught with issues that I don't want to touch it.
-
Speaking of Java threads, Java itself is looking to improve/iterate on that with Project Loom. Here's an overview
-
Hm, that sounds more like a diatribe against async IO. :-)
-
Multiple interpreters will allow sharing of mutable arrays of primitive data (ints, floats, etc.), and the backing data for immutable data like strings, array.array, etc. There will be some copying, but some things become a lot more efficient than using multiprocessing. Communication can be entirely in user-space, which will allow more third-party approaches. Any data passed from one processor to another has to pass through L3 cache or main memory, so the cost of copying is not as bad as it might first appear. If the copied data fits into L2 cache, then the extra cost of copying is relatively small. Regarding ownership, because there cannot be any cycles through shared memory, simple (atomic) refcounting works perfectly. Multiple interpreters also have (some) resilience to hard crashes. Shared memory threads do not.
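A minimal sketch of the ownership point above, not taken from any CPython design: a block of primitive data is shared by handle, and a counter decides when it is freed. The `SharedBuffer` class and its `acquire`/`release`/`view` methods are hypothetical names, threads stand in for interpreters, and a `threading.Lock` stands in for an atomic counter. Because the buffer can only hold primitives, it can never point back at its owners, so plain counting is enough.

```python
import threading
from array import array


class SharedBuffer:
    """Hypothetical handle to primitive data shared between workers."""

    def __init__(self, typecode, values):
        self._data = array(typecode, values)  # primitives only, so no cycles
        self._refcnt = 1                      # the creator holds one reference
        self._lock = threading.Lock()         # stand-in for an atomic counter
        self.freed = False

    def acquire(self):
        # Take an extra reference before handing the buffer to another worker.
        with self._lock:
            if self._refcnt == 0:
                raise RuntimeError("buffer already freed")
            self._refcnt += 1
        return self

    def release(self):
        # Drop one reference; whoever drops the last one frees the data.
        with self._lock:
            self._refcnt -= 1
            last = self._refcnt == 0
        if last:
            self._data = None
            self.freed = True

    def view(self):
        # A memoryview exposes the backing storage without copying it.
        return memoryview(self._data)


def worker(buf, index):
    try:
        buf.view()[index] += 1.0  # mutate the shared data in place
    finally:
        buf.release()


buf = SharedBuffer("d", [0.0] * 4)
threads = [
    threading.Thread(target=worker, args=(buf.acquire(), i)) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

result = buf.view().tolist()
buf.release()  # the creator's reference is the last one
```

Each worker writes to its own slot, so the only synchronized operation is the reference count itself.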
I suspect it is, but it doesn't mean that is what they really want 🙂 I suspect that "Remove the GIL" really means "I want to use all my cores without doing any extra work". This is much like the refrain "Python should have a JIT", when all they mean is that "Python should be faster", which we are happy to oblige 😄
-
I'm hesitant to post this idea since it probably belongs in a "slower-cpython" rather than "faster-cpython" project. However, something like it is a necessary pre-condition to removing the GIL and taking better advantage of multiple CPU cores. Making incref/decref thread safe (e.g. atomic instructions) seems like a non-starter because the overhead is too high.
The basic idea is as follows: rather than having refcnt > 0 as being the condition for an object being alive, make the condition that either refcnt > 0 or the object is reachable from a GC root. Make use of tp_traverse to implement the reachable check. Provide a way to explicitly define the GC roots. Once we have that, we can start removing incref/decref pairs if we can ensure that the object is "properly rooted". We could look for incref/decref hot spots and work on those first. E.g. inside ceval, if we know the object is rooted, we can avoid the incref/decref. dictobject would also be a hot spot.
A simple implementation of this idea would be very slow. The problem is that when decref is called, we can no longer immediately free the object if refcnt == 0. The object might be reachable from a GC root. Since Python allocates objects at a fantastic rate, this would kill performance. What we need is a way to run the mark and sweep pass on only a subset of the heap. One way to do that is with a "write barrier". My idea for that is to basically copy the Caml Light GC design: http://pauillac.inria.fr/~doligez/caml-guts/Sestoft94.txt
For the mark and sweep, there would be a major and minor collection. A major collection would look at all objects. A minor collector would look at "young" objects. Young objects would be defined as objects created since the last GC pass. It can be implemented with a bitmap in the memory manager arenas. In addition to traversing the young objects, we also have to traverse any references from the old generation to those young objects. Those references can be tracked by explicitly calling a function like Caml's Modify(). That function adds the object pointer to a table that gets treated as roots for the next minor collection.
A big advantage of this approach is that it could be implemented incrementally. In the first step, all objects would still have explicit reference counts. The only difference is that objects would no longer be deallocated as soon as refcnt == 0. In the following steps, we would carefully remove some incref/decref instructions. Where we do that, we have to be sure the object is reachable from GC roots and if the object or memory holding the reference is changed, we call the Modify() function.
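The proposal above can be sketched as a toy model in Python. This is only an illustration under stated assumptions, not CPython internals: `Obj`, `Heap`, `modify()` and `minor_collect()` are hypothetical names, `refs` plays the role of what tp_traverse would visit, a Python set replaces the arena bitmap for young objects, and there is no major collection. An object survives a minor collection if its refcnt > 0 or if it is reachable from a root or from an old object recorded by the write barrier.

```python
class Obj:
    """Toy heap object; `refs` is what tp_traverse would visit."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 0   # explicit reference count, may legitimately be 0
        self.refs = []    # outgoing references, possibly uncounted


class Heap:
    def __init__(self):
        self.roots = set()       # explicitly registered GC roots
        self.young = set()       # objects created since the last collection
        self.old = set()         # survivors of earlier collections
        self.remembered = set()  # old objects written to since the last pass

    def new(self, name):
        obj = Obj(name)
        self.young.add(obj)
        return obj

    def modify(self, owner, target):
        """Write barrier, loosely modelled on Caml Light's Modify()."""
        owner.refs.append(target)  # uncounted store: no incref on target
        if owner in self.old:
            self.remembered.add(owner)  # extra root for the next minor pass

    def minor_collect(self):
        """Free dead young objects and return their names, sorted."""
        # Start from young objects that are alive on their own ...
        stack = [o for o in self.young if o.refcnt > 0 or o in self.roots]
        # ... plus everything an old, modified object points at.
        for owner in self.remembered:
            stack.extend(owner.refs)
        marked = set()
        while stack:
            obj = stack.pop()
            # Only trace into young objects; the old generation is skipped.
            if obj in self.young and obj not in marked:
                marked.add(obj)
                stack.extend(obj.refs)
        freed = sorted(o.name for o in self.young - marked)
        self.old |= marked        # survivors are promoted
        self.young = set()
        self.remembered.clear()
        return freed


heap = Heap()
module = heap.new("module")
heap.roots.add(module)
first = heap.minor_collect()    # module is rooted, so it is promoted

cached = heap.new("cached")     # refcnt stays 0
heap.modify(module, cached)     # old -> young store, caught by the barrier
temp = heap.new("temp")         # refcnt 0 and unreachable
counted = heap.new("counted")
counted.refcnt = 1              # kept alive by its count alone
second = heap.minor_collect()
```

Here `cached` survives with a zero reference count because the barrier recorded the store into `module`, while `temp` is the only object swept. Without the `modify()` call, the minor pass would never look at `module` and would free `cached` by mistake, which is why every uncounted store has to go through the barrier.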