Multithreaded GC?

dlakelan · June 16, 2020, 6:17pm

I’ve seen several threads on garbage collection and its impact on low-latency / soft realtime usage. I was thinking about some Raspberry Pi type projects (I need to come up with things to do with my kids during COVID), and it occurred to me that it should be possible to have GC run in one thread while other threads continue, so long as those other threads aren’t allocating. If GC grabs a lock on the “allocator”, then any other thread simply stops if it allocates, but can continue if it’s using the stack only.

Is this already the way it works? It would seem like this could be an excellent way to deal with realtime control: tight carefully constructed loops handle the control of motors and lights and sensors, while more loosely constructed threads could handle things like network traffic and robot-planning and whatnot. The loosely constructed threads could easily accept tens or hundreds of milliseconds of GC pause, while the tight control threads don’t have to accept any real pauses.

yuyichao · June 16, 2020, 10:20pm

No, it’s not as simple as that. There are many more things that have to be synchronized with the GC. Search for “concurrent GC”.

dlakelan · June 16, 2020, 10:25pm

I understand that concurrent GC in general is a complicated topic. But specifically in Julia’s implementation, what would be needed to allow a thread that doesn’t allocate at all to continue while GC runs in another thread?

yuyichao · June 16, 2020, 10:33pm

It needs to have no assignments of local or global variables and no mutation of any objects. (Of course, only mutations/variables that are boxed (allocated on the heap) count.)

StefanKarpinski · June 16, 2020, 10:35pm

If a program doesn’t allocate at all, then no GC will occur. Naively one might think that you could let a thread run as long as it doesn’t allocate any memory while a collection runs in the background, but the tricky part is that the running thread can modify the object graph while the collection is occurring. In order to handle that correctly, you need to implement fully general concurrent GC. So the fact that the thread isn’t allocating doesn’t really simplify anything: allocation during collection isn’t what makes concurrent GC hard, it’s the fact that the object graph is changing while the collection occurs.

dlakelan · June 16, 2020, 10:42pm

In the worst case though, don’t you just mark things that could otherwise have been left unmarked? In other words, it makes for less efficient, but not incorrect, GC.

I’m thinking of the following use case: A thread is running that reads some sensors, calculates some function, and then writes outputs to GPIO in a tight loop.

It’s not allocating anything, or if it does, it hits a lock and stops. So since there are no new objects, the worst case is that it mutates an object which disconnects some heap allocated object from the graph. If thread B is running the GC and it hasn’t yet reached the mutated object then it won’t mark the unreachable object and the unreachable object will be collected as it should. If thread B is running the GC and already marked the now unreachable object, then the now unreachable object will NOT be collected on this pass, and will be collected on the next pass.

Either way it seems the program would operate correctly.

Most of the cases where it’s much trickier come from compacting collectors, copying collectors, etc. My understanding is that Julia doesn’t move anything, so it doesn’t have to update pointer values and wouldn’t hit the worst problems (i.e. where thread A relies on a pointer whose location is out of date because thread B moved the object).

Oscar_Smith · June 16, 2020, 10:46pm

The problem is that you’ve mis-identified the worst case. The worst case is that you disconnect a heap object from one part of the graph and reconnect it to a different part. In this situation, it’s possible that an object will never be marked, even though it is still reachable, because what it is connected to changes while the GC is running.

yuyichao · June 16, 2020, 10:46pm

No, in the worst case you can miss an entire branch of objects.

No, it can move objects from an unscanned branch to a scanned branch and therefore hide them from the GC.
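
To make the “unscanned branch to scanned branch” case concrete, here is a minimal single-threaded sketch in Julia. It is a toy model and not Julia’s actual collector: `Node`, `mark_step!`, `reachable`, and `demo` are hypothetical names, and the interleaving of collector and mutator is written out by hand rather than produced by real threads.

```julia
# Toy object graph: each node holds references to its children.
mutable struct Node
    name::String
    children::Vector{Node}
    marked::Bool
end
Node(name) = Node(name, Node[], false)   # hypothetical convenience constructor

# One step of an incremental mark phase: take the oldest queued node,
# mark each unmarked child, and queue that child for scanning.
function mark_step!(worklist::Vector{Node})
    node = popfirst!(worklist)
    for child in node.children
        if !child.marked
            child.marked = true
            push!(worklist, child)
        end
    end
end

# Names of all nodes actually reachable from `root` (ground truth).
function reachable(root::Node)
    seen = Set{String}()
    stack = [root]
    while !isempty(stack)
        n = pop!(stack)
        n.name in seen && continue
        push!(seen, n.name)
        append!(stack, n.children)
    end
    return seen
end

function demo()
    root, a, b, c = Node("root"), Node("a"), Node("b"), Node("c")
    push!(root.children, a, b)
    push!(b.children, c)          # c starts out reachable only through b

    root.marked = true
    worklist = [root]
    mark_step!(worklist)          # collector scans root: marks and queues a, b
    mark_step!(worklist)          # collector scans a: a has no children yet

    # "Mutator" runs here. It creates no new objects; it only rewires
    # existing references: c moves from unscanned b to already-scanned a.
    push!(a.children, c)
    empty!(b.children)

    mark_step!(worklist)          # collector scans b: c is no longer there

    # The worklist is now empty, so the mark phase is finished.
    return (c.marked, "c" in reachable(root))
end
```

Calling `demo()` shows `c` still reachable from the root yet never marked, so a sweep at this point would free a live object. Concurrent collectors generally guard against this with some form of barrier on the mutator’s pointer writes, which is the synchronization a non-allocating thread would still need.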

dlakelan · June 16, 2020, 10:50pm

Got it. That makes sense.

How about being able to mark a thread somehow as “I know what I’m doing don’t pause me for GC”. In particular when we’re talking about tight i/o control loops that only read from existing variables and bang out bits on hardware, it seems like the fact that they don’t mutate references to objects would leave you OK.

I’m thinking for example that as a work-around you might write your loop in C and spawn off your own thread, and then it wouldn’t interact with Julia in any way except reading the values of global variables or some such things. But then, why force the programmer to write in C?

yuyichao · June 16, 2020, 10:53pm

The mechanism is already there, has been for 3 years now, and is used in some cases. There’s no way it’ll be an API for the user to control, since there’s absolutely no way it can be written correctly: not because people are dumb, but because Julia code provides none of the guarantees necessary to allow it.

It has to be done automatically and the enabling work is quite tedious.

dlakelan · June 16, 2020, 10:58pm

I’m in the dark about this capability. Is there somewhere you can point me to read about it? You say it’s never going to be a user API, so are you saying only internal Julia threads can use it? Or is it just not a public API, so it has no guarantees and can change from version to version?

yuyichao · June 16, 2020, 11:00pm

github.com/JuliaLang/julia

Use safepoint to deliver SIGINT

JuliaLang:master ← JuliaLang:yyc/threads/safepoint-signal

opened 02:36AM - 03 May 16 UTC

yuyichao

+859 -345

This implements @vtjnash 's [idea](https://github.com/JuliaLang/julia/issues/146…75#issuecomment-171810732) of delivering sigint in a safer way, without having to add sigatomic to everywhere. This also solves a few performance issues along the way. A brief summary of the differences,

- Make GC safepoint thread local

  And create 3 signal pages. See the comment in `safepoint.c` for detail. This have a very minor performance hit for accessing safepoint. However, I think we can avoid the safepoint and GC transition around gcframe setup and gcframe pop on x86 so this shouldn't be a big issue. (On ARM, the atomic release store is more expensive than two GC safepoint so we may use a normal store with two safepoint instead, the performance hit of making the safepoint thread local is still very minor in this case though (~5-10% for empty GC frame push/pop, cheaper than the tls getter call....)).

- Use the GC safepoint mechanism (i.e. SegFault) to deliver InterruptException at known points

  This automatically makes `ccall` of external library `sigatomic`. (Fix #1468, Fix #2622). In order to avoid waiting too long before the exception is delivered when running unmanaged code, this also implement waking up libuv, abort syscall and abort libsupport io (this use a hack but can be generalized if needed) when sigint arrives. This also implement force throwing of exception if SIGINT arrives too frequently so that one can use `Ctrl-C` to force abort some dead loop. A warning will be printed in such case since it bypass the safe path and even sigatomic. The most important consequence IMHO is that pressing `Ctrl-C` during sysimg compilation should no longer segfault =) This should also make it easier to implement #14675.

- Make parser and type inference calls sigatomic

  sigatomic is much cheaper (see below) now so it is now used to protect important runtime code to not be interrupted by `Ctrl-C`.

- Clean up safepoint and sigatomic, optimize try-catch and `sigatomic`
  - Split out `safepoint.c`, make `defer_signal` thread local and avoid the expensive atomic ops
  - Make `defer_signal` task local and automatically restore on expection. This eliminate the try-catch needed before in `disable_sigint`. This is technically breaking but `sigatomic_begin` and `sigatomic_end` aren't exported.
  - Inline `jl_sigatomic_begin` and `jl_sigatomic_end` in codegen.

Benchmarking try-catch and `disable_sigint`

```jl
julia> @inline f1() = try
       end
f1 (generic function with 1 method)

julia> @inline f2() = disable_sigint(()->nothing)
f2 (generic function with 1 method)

julia> g1(n) = for i in 1:n
           f1()
       end
g1 (generic function with 1 method)

julia> g2(n) = for i in 1:n
           f2()
       end
g2 (generic function with 1 method)

julia> function k(n)
           @time g1(n)
           @time g2(n)
       end
k (generic function with 1 method)
```

Timing before

```jl
julia> k(10_000_000)
  0.264041 seconds
  0.401437 seconds
```

Timing after

```jl
julia> k(10_000_000)
  0.122592 seconds
  0.012506 seconds
```

Close https://github.com/JuliaLang/julia/pull/12333
Close #12309

No

No.

I’m saying it won’t be under the user’s control.

dlakelan · June 16, 2020, 11:10pm

Meaning as a user I can’t use it? Or am I being dense?

yuyichao · June 16, 2020, 11:21pm

No, it means that you don’t control it. If it is implemented, it’ll automatically make a loop of yours that satisfies the condition run concurrently with the GC, but you can never mark a region to behave like that.

Now, that does not preclude other features, like marking a block of code to invoke no Julia runtime (or throw an error). If such a feature is implemented, then it will be possible to write code that can correctly run concurrently with the GC.

However, if both features are implemented, there’ll also be no point in giving the user manual control over whether the Julia-runtime-free code should run concurrently with the GC; it’ll just be guaranteed to always do so.

dlakelan · June 16, 2020, 11:24pm

Aha! So you’re saying that Julia will automatically determine whether some code can be safely run during GC! That would be awesome!

It sounds like that’s not yet the case though? So as of today, if you wanted to do something like software PWM on the Raspberry Pi, where you need to write a new value to the GPIO every 0.5 ms without fail (or let’s say with 99.99% probability or better), is there a way to make this happen in 1.4.2 etc?

yuyichao · June 16, 2020, 11:26pm

That’s right. I’m just saying it either has to be nothing, or it has to be automatic. There’s no simpler-to-implement manual version that you can have in the meantime…

dlakelan · June 16, 2020, 11:34pm

How about the workaround where you call a C function that spawns a pthread on its own, and that thread receives a pointer to some persistent Julia stack-allocated objects that it reads to determine what to do on the I/O pins? Is that a possible workaround? Or does that thread wind up getting paused as well?

Imagine the following use case… I have a Julia thread that does some network programming and, based on fairly complicated conditions, controls a light show via Raspberry Pi GPIO pins (let’s say 5 RGB LEDs, so 15 pins need software PWM).

It’s ok if the main julia thread allocates and does general julia stuff and hits GC and is paused for 0.1 seconds or so, but it’s not ok for the bit-banging software PWM to cause the lights to suddenly turn off or turn on full brightness for 0.1 seconds… So I want a thread that reads some memory locations in a tight loop, and outputs GPIOs to multiple lights so as to reliably provide a smooth lightshow.

As of today, is that scenario possible?

yuyichao · June 16, 2020, 11:37pm

Julia has no control over your C thread. You just need to make sure the memory the C thread accesses is valid, and it doesn’t even need to be stack memory.
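
As a rough illustration of the Julia side of that workaround, the sketch below keeps the shared duty-cycle table in `malloc`ed memory, which the Julia GC neither frees nor moves, so it stays valid for a foreign thread. The names `make_duty_table` and `set_duty!`, the library `libpwm`, and its function `start_pwm_thread` are all hypothetical; the C thread itself is not shown, and the sketch assumes single-byte stores are acceptable for this use without further synchronization.

```julia
# Allocate a table of `npins` duty cycles (0-255) outside the Julia heap.
# Memory from Libc.malloc is not managed by the GC, so it remains valid
# until it is explicitly released with Libc.free.
function make_duty_table(npins::Integer)
    buf = convert(Ptr{UInt8}, Libc.malloc(npins))
    buf == C_NULL && error("malloc failed")
    for pin in 1:npins
        unsafe_store!(buf, 0x00, pin)    # start with every pin off
    end
    return buf
end

# Update one pin's duty cycle; `pin` is 1-based, as is usual in Julia.
set_duty!(buf::Ptr{UInt8}, pin::Integer, value::Integer) =
    unsafe_store!(buf, UInt8(value), pin)

# Handing the table to the C side would look roughly like this.
# ASSUMPTION: `start_pwm_thread` in `libpwm` is a hypothetical C function
# that spawns a pthread which reads the table in a tight loop.
# ccall((:start_pwm_thread, "libpwm"), Cint, (Ptr{UInt8}, Cint), buf, npins)
```

The ordinary Julia thread can then call `set_duty!(buf, 3, 128)` whenever it likes, GC pauses included, while the C thread keeps reading the same bytes.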

dlakelan · June 16, 2020, 11:39pm

Thank you so much for your time and explanations! That helps me a lot figuring out how to do this kind of project.
