Case study: Real-time hardware control for adaptive optics with Julia

[not sure this is the best category, mods feel free to move]

Hi all,
I wanted to share our recent conference proceeding on using Julia in a real time application: [2407.07207] Real-time adaptive optics control with a high level programming language

Our use case requires processing 2-3 high-speed camera streams (up to ~1,000 frames per second) and responding with hardware commands.
The application is an adaptive optics system in the SPIDERS instrument that will sit behind the Subaru telescope in Hawaii and dynamically compensate for the turbulence of the Earth’s atmosphere, revealing fainter exoplanets.

Besides the usual performance recommendations (make sure everything is type stable, avoid allocations, etc), we wanted to share the architecture we settled on after a few iterations.

We built the software as a pipeline of independent single-threaded Julia processes. This mitigates the stop-the-world GC behaviour and lets us decouple soft real-time processes, where we want to be able to allocate freely, from hard real-time processes, which should never pause.

Thankfully, with this design the heap size of each process remains small. If we do trigger an allocation and a GC pause, the resulting GC latency is typically under 0.5 ms. This is nice to fall back on for, e.g., error or cleanup paths, where we don’t want to spend the same amount of time optimizing the code as we do for the hot path.
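For the hot path itself, a quick sanity check is to confirm that the compiled function allocates nothing at all. A minimal sketch (the function and matrix sizes here are made up for illustration; AllocCheck.jl, mentioned below, can give a static guarantee of the same property):

```julia
using LinearAlgebra

# Hypothetical hot-path step: apply a reconstruction matrix to a frame,
# writing the result into a pre-allocated command vector.
const R = rand(Float32, 97, 128)                  # made-up dimensions
process_frame!(cmd, frame) = mul!(cmd, R, frame)  # in-place, no temporaries

frame = rand(Float32, 128)
cmd = zeros(Float32, 97)
process_frame!(cmd, frame)                        # compile first
@show @allocated(process_frame!(cmd, frame))      # expect 0 once compiled
```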

Within each component, we structure the code as a hierarchical state machine. We write callbacks that respond to events in a given operational state and either perform some action or transition to another state. This worked well because it encourages devs to write small functions, avoid closures (and the closure-capture performance bug), and specify the types of all variables in the state machine definition.
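To give a flavour of the pattern, here is a simplified, hypothetical sketch (not our actual state machine code); the hierarchical nesting of states and the real set of events are omitted for brevity:

```julia
# Callback-per-state pattern; states, events, and fields are hypothetical.
abstract type State end
struct Idle       <: State end
struct ClosedLoop <: State end

abstract type Event end
struct StartLoop     <: Event end
struct StopLoop      <: Event end
struct FrameReceived <: Event end

# Concretely-typed context shared by the callbacks, so the hot path
# stays type stable.
mutable struct LoopContext
    gain::Float64
    frames_seen::Int
end

# Each (state, event) pair is a small callback that performs an action
# and/or returns the next state.
handle(s::Idle, ::FrameReceived, ::LoopContext) = s          # ignore frames while idle
handle(::Idle, ::StartLoop, ::LoopContext) = ClosedLoop()
function handle(s::ClosedLoop, ::FrameReceived, ctx::LoopContext)
    ctx.frames_seen += 1
    # ... compute and send the hardware command here ...
    return s
end
handle(::ClosedLoop, ::StopLoop, ::LoopContext) = Idle()
```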

Communication between processes was handled with Aeron.jl. Aeron is a latency-optimized IPC and UDP messaging library that comes from the high-frequency trading sector.

Within each Aeron message, we used an implementation of Simple Binary Encoding (SBE). This provides a struct- and/or array-like interface to a flat contiguous UInt8 buffer without any memory copying for encode/decode. It does require you to specify each message type in an XML schema, however.
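For those who haven’t used SBE-style codecs, the rough idea is that a “decoded” message is just a thin view over the wire buffer, with field accessors reading fixed offsets instead of copying into a Julia struct. A conceptual sketch only (not the actual SimpleBinaryEncoding API; layout, field names, and endianness handling are illustrative):

```julia
# Conceptual zero-copy message view; the real layout would come from
# the XML schema. Assumes a little-endian wire format.
struct FrameMsgView
    buf::Vector{UInt8}
end

# Read a primitive of type T at a fixed byte offset, without copying.
function load_at(::Type{T}, buf::Vector{UInt8}, offset::Int) where {T}
    GC.@preserve buf unsafe_load(Ptr{T}(pointer(buf, offset + 1)))
end

timestamp(m::FrameMsgView) = load_at(UInt64, m.buf, 0)   # bytes 1-8
width(m::FrameMsgView)     = load_at(UInt32, m.buf, 8)   # bytes 9-12
height(m::FrameMsgView)    = load_at(UInt32, m.buf, 12)  # bytes 13-16
pixels(m::FrameMsgView)    = reinterpret(UInt16, @view m.buf[17:end])  # array-like, no copy
```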

We also ran everything on a Linux kernel with the PREEMPT_RT patch and used thread pinning, although these steps shouldn’t really be necessary with Aeron.

Aeron has its own message archiving and replay functionality, but we ended up building our own on top of SQLite.jl.
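The archiver itself is conceptually simple; something along these lines (a sketch with a hypothetical table layout, not our real schema):

```julia
using SQLite, DBInterface

# One row per message, payload stored as a raw blob.
db = SQLite.DB("telemetry-archive.sqlite")
DBInterface.execute(db, """
    CREATE TABLE IF NOT EXISTS messages (
        timestamp_ns INTEGER,
        stream       TEXT,
        payload      BLOB
    )
""")

# Archive one received message buffer.
function archive!(db, timestamp_ns::Int, stream::AbstractString, payload::Vector{UInt8})
    DBInterface.execute(db,
        "INSERT INTO messages (timestamp_ns, stream, payload) VALUES (?, ?, ?)",
        (timestamp_ns, stream, payload))
end

archive!(db, time_ns() % Int, "wfs1.frames", rand(UInt8, 64))
```

In practice you would batch inserts inside a transaction for throughput.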

To interact with all the distributed components of the system, we built a Julia library, a command line interface, and a graphical user interface using CImGui.jl (great package @Gnimuc !)

In the end, without yet having spent too much time on configuring our hardware and OS, we get pretty decent latency numbers:

Screenshot of the GUI (which runs at 60 Hz!):

We hope this provides others with some confidence that Julia is also a great fit for their real time applications!

Wish list

A few wish-list items based on this experience:

  • Escape analysis and stack allocation of temporary arrays. This would make it a lot easier for less experienced devs to write non-allocating real-time loops and not trigger the GC.
  • Relatedly, some of the work towards integrating other GC backends might be interesting if there’s one that could guarantee shorter pauses.
  • The recent PR implementing a first-party TypedCallable is promising. Currently we use FunctionWrappers.jl to implement our state machines (see the sketch below), but that library triggers a false-positive allocation warning from AllocCheck.jl.
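For context on that last point, our use of FunctionWrappers.jl looks roughly like this (a simplified sketch with stand-in types, not our actual callback table):

```julia
using FunctionWrappers: FunctionWrapper

# A concretely-typed callback slot: takes an event id, returns a state id.
# The Int types are stand-ins for illustration.
const Callback = FunctionWrapper{Int, Tuple{Int}}

handle_frame(event_id::Int) = event_id + 1   # hypothetical callback body

callbacks = Dict{Symbol, Callback}(
    :frame => Callback(handle_frame),
)

# Calling through the wrapper is type stable no matter which concrete
# function was stored in the slot.
next_state = callbacks[:frame](42)
```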

Happy to answer any questions, and interested to hear from anyone else using Julia in this domain.

Thanks!

54 Likes

Cool, thanks for sharing!

There was a talk at JuliaCon about hard real-time applications in robotics. The section on the real-time GC is particularly interesting: they achieve a maximum GC latency below 10 µs.

8 Likes

Congrats!

I wonder if a modified version of the first post could become a blog post for Julia lang’s website. Use cases like this sell Julia.

2 Likes

I really like your GUI; you should submit it to the imgui gallery: Gallery: Post your screenshots / code here (PART 8) · Issue #2265 · ocornut/imgui · GitHub

1 Like

Very cool project! I’ve worked on Subaru data before, so I appreciate the application as well. Are all of the single-threaded Julia processes run on the same node, or are they distributed across a multi-node system?

You mention using independent single-threaded Julia processes to mitigate GC interruptions. Were there other places where the constraints of Julia influenced the architectural design? For example, do you think you would have gone with the multi-process, message-passing design if you were working in a traditional language (e.g., C/C++)? As someone not in the RT space, it is unclear to me which design decisions are “normal” for RT applications and which were motivated by Julia-specific concerns.

1 Like

Thanks for your comments @cgarling!

I would say that there are plenty of RT applications that are monoliths (per node) but also plenty that are built in this kind of distributed message passing way, and either can be a good choice.

In our case, we started out in Julia with a multi-threaded design using channels to send notifications between threads, but then had to start over with a new architecture when it became clear this wasn’t going to scale in Julia.

2 Likes

Really cool!
Regarding your first point in the wish list (stack allocation of temporary arrays), did you try out Bumper.jl?

I started playing around with it exactly for that reason in a recent project and it seems relatively easy to use at least for my simple use case!

1 Like

Thanks @disberd! I have seen Bumper.jl but haven’t tried it. The main thing I would want it to help with is temporary allocations, e.g. when you write x = A * b or even x .= A * b. I couldn’t tell from the docs page: is that supported?

Assuming you’re referring to matrix multiplication here, check out the LinearAlgebra.mul! function to perform multiplication with a pre-allocated output array. For example, x .= A * b can be written allocation-free as mul!(x, A, b). You’ll still need to allocate the output array somehow, but you can use Bumper for that or simply allocate it before you get started (if the size won’t change).
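For concreteness, a sketch combining the two (sizes made up; here Bumper supplies a temporary intermediate while the output x is pre-allocated):

```julia
using Bumper, LinearAlgebra

A = rand(100, 100); B = rand(100, 100); b = rand(100); x = zeros(100)  # made-up sizes

# x .= A * (B * b) without any GC-visible temporaries:
@no_escape begin
    tmp = @alloc(Float64, 100)   # bump-allocated scratch, invisible to the GC
    mul!(tmp, B, b)              # tmp = B * b
    mul!(x, A, tmp)              # x = A * tmp
end
```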

2 Likes

Thanks @milkmoore,
Yes, I’m quite aware of in-place operations.

My point is more that if the compiler’s escape analysis improves, it would be nice if these operations could be stack allocated (or at least automatically deallocated without triggering the GC) so that less sophisticated users could write performant real-time code.

One benefit of Julia is the math-like syntax for linear algebra, so it’s sad we have to give that up for steady performance.

3 Likes