When planning your deployment strategy, you really need to consider that julia has a “fat” runtime with significant warmup time.
Other runtimes are in a similar situation: the JVM comes to mind, as do scalac and sbt, and NodeJS as well.
This means:
- `@elapsed` without warmup is questionable for benchmarking. It measures something important, but you’d be a fool to naively extrapolate that “handling one item takes 3 seconds, so handling 100k items should take forever” when it does in fact take 5 seconds (your numbers, not mine).
- Typical JVM / JS runtimes use a tiered JIT, so you’ll observe maybe a 1e2 or 1e3 factor, not the 1e5 your case had. That’s a question of resources.
- The same applies to C, albeit to a lesser degree: firing up a process and then adding two numbers takes hundreds of microseconds, instead of a fraction of a nanosecond. Just check `hyperfine -N '/bin/true'`.
- Starting a julia process to generate a single plot is therefore stupid. Start a julia process to generate 1k plots (batch processing), or start a julia process that waits for commands to generate more plots instead of immediately shutting down (server mode).
- Jupyter is your friend.
- Precompilation is also your friend, but that might be too annoying to set up?
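To see the first point concretely, here is a minimal sketch (the function `work` is invented for illustration); in a fresh session, the first call pays the compilation cost:

```julia
# Hypothetical workload; any freshly defined function behaves similarly.
work(xs) = sum(x^2 for x in xs)

t1 = @elapsed work(1:1000)   # first call: includes JIT compilation of `work`
t2 = @elapsed work(1:1000)   # second call: reuses the compiled code

println("first call:  $t1 s")
println("second call: $t2 s")
```

Extrapolating total throughput from `t1` alone is exactly the trap described above.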
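And a minimal sketch of the “server mode” idea, with a made-up `make_plot` as a stand-in for the real plotting call:

```julia
make_plot(name) = "plot:" * name   # placeholder for the actual plotting work

# Keep one warm process alive and feed it requests, one per line.
function serve(io)
    results = String[]
    for line in eachline(io)
        line == "quit" && break
        push!(results, make_plot(line))
    end
    return results
end

serve(IOBuffer("a\nb\nquit\n"))   # in production, `io` would be stdin or a socket
```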
Now, luckily, julia runtimes are in fact more reproducible / predictable than e.g. Java or JS ones. If you plot “number of similar items to process” against “time to do them all”, you get a pretty affine function after the first handful of items. This is simple!
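A hedged sketch of what such a measurement could look like (the workload `work` is invented):

```julia
work(n) = sum(sin, 1:n)
work(10)   # warm up once, so compilation is out of the picture

ns = (10_000, 20_000, 40_000)
times = [@elapsed(work(n)) for n in ns]
# After warmup, `times` is roughly affine in `ns`: a small constant
# overhead plus a per-item cost, with no further JIT-induced regimes.
```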
Tiered compilation in JVM or JS will introduce many more regimes instead (“oh, look, C2 kicks in here, unless you suffer from profile pollution because you ran that other code before…”).
The lack of tiered compilation is a weakness of the julia runtime – you’re paying the full compilation effort before the first run. Improved reproducibility of performance is really the only upside of this shortcoming.
PS. Something analogous to profile pollution exists in julia as well, but it’s a pretty rare corner case. The issue is that equality of datatypes can be a very complex concept in julia. So julia computes the memory layout for the first description of a datatype it encounters, and then looks it up in a cache afterwards. Hence, each datatype has one unique memory layout in each julia session, but its details can depend on which description was encountered first. You can see that with types in the style of `Vector{Tuple{Union{Missing, Int}, Int}}` vs `Vector{Union{Tuple{Int, Int}, Tuple{Missing, Int}}}`.
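For illustration: the two spellings really do describe the same type (tuples distribute over unions in julia’s subtyping), which is why only the cached layout can tell them apart within a session:

```julia
T1 = Vector{Tuple{Union{Missing, Int}, Int}}
T2 = Vector{Union{Tuple{Int, Int}, Tuple{Missing, Int}}}

T1 == T2   # the two descriptions are equal as types...
# ...but whichever spelling a session encounters first determines
# the layout details that get cached.
```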