How difficult is it to write allocation-free code to avoid GC pauses?

I’m doing almost all of my analysis in Julia, so it would be nice if I could avoid rewriting some of my algorithms in another language for production. The problem is that in production I have latency requirements that basically mean I can’t have pauses longer than ~1 millisecond, which is below the time it takes for Julia’s GC to run.

It’s fairly easy for me to write my quantitative algorithms in a way that avoids allocations (I’ve already done this to maximize throughput), but I’m not sure how easy it is for the other bits: reading from/writing to a websocket and writing to a binary log file on disk. Is it possible to do this with standard Julia libraries, or will I need to write custom Julia or C/C++ code?

Can I put the GC in a debug mode to print notifications that it’s been triggered?

EDIT: I should clarify: it’s ok to allocate or GC at startup or when opening new network connections, which stay open for hours. The main requirement is to consistently respond quickly to websocket messages after everything is open.

I feel like this is too open-ended. It depends on which parts of the standard library and which third-party libraries you want to use. If you only call methods you defined yourself, then it’s “easy” to write code that doesn’t trigger the garbage collector.

If you do want to use a method you didn’t write, then you will have to look at that function’s source to see what kind of memory allocations it performs. The @benchmark macro from the BenchmarkTools package will probably be your friend here, since it reports whether the method allocated anything.
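For example, a quick sketch (the function here is made up; your numbers will differ):

using BenchmarkTools

f(n) = sum(abs2, rand(n))   # allocates a temporary vector on each call

@benchmark f(1000)          # the trial output includes "memory estimate" and "allocs estimate"

@allocated f(1000)          # or just get the bytes allocated by a single call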

The other option is to start Julia with the --track-allocation flag. Once the program exits, it writes out per-line information about which lines allocated memory and how much.
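Roughly like this (a sketch; the script name is just a placeholder):

$ julia --track-allocation=user my_script.jl
# after the process exits, a my_script.jl.mem file appears next to the source,
# with the bytes allocated by each line shown in the left margin

And to answer the debug-mode question: on recent Julia versions (1.8+, if I remember correctly) you can also do

julia> GC.enable_logging(true)

which prints a line to stderr every time a collection runs.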

Yeah, after a bit more thought, I’m pretty sure I’m going to need to write this in C++ or Rust. I’ll be parsing nested JSON arrays, which seems impossible to do without allocating unless I write my own JSON library.

It definitely sounds like Julia doesn’t provide the guarantees you need. Ensuring that your responses are never delayed by more than 1 ms is just not something the garbage collector can promise. I haven’t looked closely at what guarantees other garbage collectors provide, but I doubt any of them can give you a hard bound like that.

Go’s GC would be plenty fast. They’ve gotten pauses down to around 100-200 microseconds.

Some interesting numbers:

julia> @benchmark begin a=zeros(1000000); a=0; end
BenchmarkTools.Trial: 
  memory estimate:  7.63 MiB
  allocs estimate:  2
  --------------
  minimum time:     800.316 μs (0.00% GC)
  median time:      1.018 ms (0.00% GC)
  mean time:        1.128 ms (7.15% GC)
  maximum time:     2.726 ms (34.31% GC)
  --------------
  samples:          4402
  evals/sample:     1

julia> @benchmark begin a=zeros(1000000); a=0;GC.gc() end
BenchmarkTools.Trial: 
  memory estimate:  7.63 MiB
  allocs estimate:  2
  --------------
  minimum time:     62.242 ms (98.74% GC)
  median time:      65.778 ms (98.66% GC)
  mean time:        66.492 ms (98.70% GC)
  maximum time:     77.300 ms (98.66% GC)
  --------------
  samples:          76
  evals/sample:     1

julia> @benchmark begin a=zeros(1000000); a=0;GC.gc(false) end
BenchmarkTools.Trial: 
  memory estimate:  7.63 MiB
  allocs estimate:  2
  --------------
  minimum time:     914.574 μs (17.75% GC)
  median time:      1.079 ms (18.79% GC)
  mean time:        1.111 ms (19.31% GC)
  maximum time:     2.636 ms (13.62% GC)
  --------------
  samples:          4472
  evals/sample:     1

So, at least on my desktop machine (which isn’t super slow), just allocating 1 million floats takes around 1 ms. A full GC on that obviously takes a long time, but an incremental GC only adds around 0.2 ms.

I don’t know how you’re planning to parse large nested JSON objects, calculate answers, and spit them out on a socket faster than my machine can allocate 1M floats, but if you can figure that out, it seems like you could just call an incremental GC after serving each request and still stay within your soft-real-time budget.
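Something along these lines (just a sketch; receive_message/handle/send_message are placeholder names, not a real websocket API):

while isopen(ws)                  # ws = your long-lived websocket connection
    msg = receive_message(ws)     # placeholder read
    resp = handle(msg)            # your ~10 µs allocation-free calculation
    send_message(ws, resp)        # placeholder write
    GC.gc(false)                  # incremental (non-full) collection while idle between messages
end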

You might try ccall to call an existing JSON parser that does its own memory management, and handle only the quantitative bits in Julia.
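Roughly this pattern (the library name and C signature below are entirely made up; you’d substitute whatever parser you pick):

const LIBPARSER = "libmyjsonparser"          # hypothetical shared library

# Hypothetical C function:
#   int parse_prices(const uint8_t* buf, size_t len, double* out, size_t outlen);
# It writes into caller-provided storage, so nothing is allocated on the Julia side.
function parse_prices!(out::Vector{Float64}, buf::Vector{UInt8})
    n = ccall((:parse_prices, LIBPARSER), Cint,
              (Ptr{UInt8}, Csize_t, Ptr{Float64}, Csize_t),
              buf, length(buf), out, length(out))
    n < 0 && error("parse failed")
    return n                                  # caller reads out[1:n]
end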

Your application sounds like high frequency trading or something?

The tradition for HFT (as I’ve heard it) is to not worry about memory leaks (or GC) and just restart everything every night. You just have to keep your leaks/GC usage low enough to stay within RAM for 8 hours… with an “easy” solution of buying more RAM as your usage grows.

I know the robotics folks have had lots of success getting Julia to hit hard real-time guarantees without such cheats, but parsing variable-length JSON arrays does make this a bit trickier. That said, I think you could (ab)use JSON3’s internals with pre-allocated “tape” vectors to get a long way there (for a rigorously defined and limited JSON structure).
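As a baseline before touching the internals, it’s worth measuring what a stock JSON3.read costs on a representative message, e.g. (a sketch; the payload below is just an example):

using JSON3, BenchmarkTools

msg = """{"bids":[[100.1,2.0],[100.0,5.5]],"asks":[[100.2,1.0]]}"""   # stand-in for a real message

@benchmark JSON3.read($msg)   # the allocs estimate tells you what a plain parse costs per message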


The JSON messages aren’t that large, typically 1-2 KB. The calculations I’ll be running are highly optimized and can be run in around 10 microseconds.

Thanks, I’ll take a look at JSON3’s internals.

I’d really need to get the allocations down to run with the GC totally disabled. Just looking at the current bandwidth usage on this server, I’m seeing almost 100 GB of incoming traffic per day from these messages. I could add more RAM on top of the 32 GB it has, but it’s probably not worth it.
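For completeness, turning the collector off around the hot path is just (a sketch; run_message_loop is a placeholder for the latency-critical websocket loop):

GC.enable(false)        # allocations still happen, but nothing is ever collected
try
    run_message_loop()  # placeholder for the hot loop
finally
    GC.enable(true)     # re-enable off the critical path
    GC.gc()             # and let it catch up
end

but at ~100 GB/day of incoming messages that only works if the per-message allocations really are near zero.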