I’ve seen several threads on garbage collection and its impact on low-latency / soft-realtime usage. I was thinking about some Raspberry Pi type projects (I need to come up with things to do with my kids during COVID), and it occurred to me that it should be possible to have GC run in one thread while other threads continue, so long as those other threads aren’t allocating. If GC grabs a lock on the “allocator”, any other thread would simply stop when it tries to allocate, but could continue if it’s only using the stack.
Is this already the way it works? It would seem like this could be an excellent way to deal with realtime control: tight carefully constructed loops handle the control of motors and lights and sensors, while more loosely constructed threads could handle things like network traffic and robot-planning and whatnot. The loosely constructed threads could easily accept tens or hundreds of milliseconds of GC pause, while the tight control threads don’t have to accept any real pauses.
I understand that concurrent GC in general is a complicated topic, but specifically in Julia’s implementation, what would be needed to allow a thread that doesn’t allocate at all to continue while GC runs in another thread?
It needs to have no assignments to local or global variables and no mutation of any objects. (Of course, only mutations and variables that are boxed, i.e. allocated on the heap, count.)
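To make the distinction concrete (my own illustration, not anything from the docs): arithmetic on plain unboxed locals is fine, while writing into an existing heap-allocated array is a mutation of a boxed object, even though it allocates nothing new.

```julia
# Only touches unboxed locals; Float64/Int values live in registers or on the stack.
function accumulate_stack(n::Int)
    acc = 0.0
    for i in 1:n
        acc += sin(i)      # immutable bits values, no heap involvement
    end
    return acc
end

# Also allocation-free, but it mutates an existing heap-allocated (boxed) object,
# so it would not satisfy the condition above.
function write_heap!(buf::Vector{Float64}, n::Int)
    for i in 1:n
        buf[1] = sin(i)    # store into an existing heap object
    end
    return nothing
end

const buf = [0.0]
accumulate_stack(10); write_heap!(buf, 10)    # warm up / compile
@show @allocated accumulate_stack(10^4)       # expect 0
@show @allocated write_heap!(buf, 10^4)       # expect 0
```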
If a program doesn’t allocate at all, then no GC will occur. Naively one might think that you could let a thread run as long as it doesn’t allocate any memory while a collection runs in the background, but the tricky part is that the running thread can modify the object graph while the collection is occurring. In order to handle that correctly, you need to implement fully general concurrent GC. So the fact that the thread isn’t allocating doesn’t really simplify anything: allocation during collection isn’t what makes concurrent GC hard, it’s the fact that the object graph is changing while the collection occurs.
In the worst case, though, don’t you just mark things that could otherwise have been left unmarked? In other words, it makes for less efficient, but not incorrect, GC.
I’m thinking of the following use case: A thread is running that reads some sensors, calculates some function, and then writes outputs to GPIO in a tight loop.
It’s not allocating anything, or if it does, it hits a lock and stops. So since there are no new objects, the worst case is that it mutates an object, which disconnects some heap-allocated object from the graph. If thread B is running the GC and hasn’t yet reached the mutated object, then it won’t mark the unreachable object, and the unreachable object will be collected as it should be. If thread B has already marked the now-unreachable object, then that object will NOT be collected on this pass, and will be collected on the next pass.
Either way it seems the program would operate correctly.
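Concretely, something like this is what I have in mind (`read_sensor` and `write_gpio` are just stand-ins for whatever GPIO binding would actually be used):

```julia
# Stand-ins for a real GPIO/ADC binding; in practice these would wrap ccalls.
read_sensor() = rand()                  # pretend ADC read, returns a Float64
write_gpio(level::Bool) = nothing       # pretend pin write

# Tight control loop: no allocation, only reads/writes of preallocated memory.
function control_loop!(state::Vector{Float64}, running::Ref{Bool})
    while running[]
        x = read_sensor()               # unboxed Float64
        y = 0.7 * x + 0.3 * state[1]    # cheap filter/controller math
        state[1] = y                    # mutate a preallocated buffer, no new objects
        write_gpio(y > 0.5)             # bang the bit out to the pin
    end
    return nothing
end

# Usage sketch: run it on another thread while the rest of the program goes on.
# state = [0.0]; running = Ref(true)
# task = Threads.@spawn control_loop!(state, running)
# ...
# running[] = false; wait(task)
```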
Most of the cases where it’s much trickier come from compacting or copying collectors, etc. My understanding is that Julia doesn’t move anything, so it doesn’t have to update pointer values, and it wouldn’t hit the worst problems (i.e. where thread A relies on a pointer that has gone stale because thread B moved the object).
The problem is that you’ve mis-identified the worst case. The worst case is that you disconnect a heap object from one part of the graph and reconnect it to a different part. In this situation, it’s possible for an object that is still reachable to never get marked, because what it is connected to changes while the GC is running, and an unmarked object gets freed.
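Here is a sketch of that hazard (just an illustration of the ordering problem, not of how the collector actually traverses objects):

```julia
mutable struct Node
    child::Union{Node, Nothing}
end

a = Node(nothing)
b = Node(Node(nothing))   # b.child is the object we care about

# Suppose the collector has already scanned `a` (marked it, saw no children)
# but has not yet reached `b`. Meanwhile the non-allocating thread does:
a.child = b.child         # reconnect the object under the already-scanned parent
b.child = nothing         # ...and disconnect it from the not-yet-scanned one

# When the collector gets to `b`, it finds no reference to the object, and it
# never re-scans `a`. The object is never marked even though it is reachable
# through `a.child` the whole time, so it would be freed incorrectly. Avoiding
# this is exactly what the write barriers / rescanning of a real concurrent
# collector are for.
```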
How about being able to mark a thread somehow as “I know what I’m doing, don’t pause me for GC”? In particular, when we’re talking about tight I/O control loops that only read from existing variables and bang out bits on hardware, it seems like the fact that they don’t mutate references to objects would leave you OK.
I’m thinking, for example, that as a workaround you might write your loop in C and spawn off your own thread, so that it wouldn’t interact with Julia in any way except reading the values of global variables or some such thing. But then, why force the programmer to write in C?
The mechanism is already there, has been there for 3 years now, and is used in some cases. There’s no way that’ll become an API for the user to control, since there’s absolutely no way such code can be written correctly by hand: not because people are dumb, but because Julia code provides none of the guarantees necessary to allow it.
It has to be done automatically and the enabling work is quite tedious.
I’m in the dark about this capability. Is there somewhere you can point me to read about it? You say it’s never going to be a user API, so are you saying only internal Julia threads can use it? Or is it just not a public API, so it has no guarantees and can change from version to version?
No, it means that you don’t control it. If it is implemented, it’ll automatically let a loop that satisfies the conditions run concurrently with the GC, but you can never mark a region to behave like that.
Now, that does not preclude other features, like marking a block of code as invoking no Julia runtime (or throwing an error if it does). If such a feature is implemented, then it will be possible to write code that can correctly run concurrently with the GC.
However, if both features are implemented, there will also be no point in giving the user manual control over whether the runtime-free code runs concurrently with the GC; it will simply be guaranteed to always do so.
Aha! So you’re saying that Julia will automatically determine whether some code can be safely run during GC! That would be awesome!
It sounds like that’s not currently the case, though? So as of today, if you wanted to do something like software PWM on the Raspberry Pi, where you need to write a new value to the GPIO every 0.5 ms without fail (or let’s say probabilistically, with 99.99% probability or better), is there a way to make this happen in 1.4.2?
That’s right. I’m just saying it either has to be nothing, or it has to be automatic. There’s no simpler-to-implement manual version that you can have in the meantime…
How about the workaround where you call a C function that spawns a pthread on its own and receives a pointer to some persistent Julia objects that it reads to determine what to do on the I/O pins? Is that a possible workaround, or does that thread wind up getting paused as well?
Imagine the following use case… I have a julia thread that does some network programming and based on fairly complicated conditions, it controls a light-show via Raspberry Pi GPIO pins (let’s say 5 RGB LEDs so 15 pins need software PWM)
It’s OK if the main Julia thread allocates, does general Julia stuff, hits GC, and is paused for 0.1 seconds or so, but it’s not OK for the bit-banging software PWM to cause the lights to suddenly turn off or jump to full brightness for 0.1 seconds. So I want a thread that reads some memory locations in a tight loop and writes to the GPIOs so as to reliably produce a smooth light show.
Julia has no control over your C thread. You just need to make sure the memory the C thread accesses stays valid, and it doesn’t even need to be stack memory.
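For concreteness, a minimal sketch of the Julia side of that workaround (the `libpwm` library and its `start_pwm_thread` entry point are hypothetical stand-ins for your own compiled C code):

```julia
# Shared state: written by ordinary Julia code, read by the C thread.
# A `const` global binding keeps the array rooted for the life of the program,
# and Julia's GC does not move objects, so `pointer(duty)` stays valid.
const duty = zeros(Float64, 15)      # one duty cycle per software-PWM pin

# Hand the C side a raw pointer and an element count. The C function is
# expected to spawn its own pthread, loop forever banging the pins, and never
# call back into the Julia runtime.
function start_pwm_thread()
    ccall((:start_pwm_thread, "libpwm"), Cvoid,
          (Ptr{Float64}, Csize_t), pointer(duty), length(duty))
    return nothing
end

# The Julia side can then update duty cycles whenever it likes, GC pauses and
# all, while the C thread keeps reading the latest values:
#   duty[1] = 0.25   # e.g. set the first pin to 25% brightness
```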