Startup time of 1000 packages – 53% slower in Julia 1.12 vs 1.10

Sure, I’ve just been doing things like:

~$ hyperfine 'julia +1.11 --startup-file=no -e "using Base"' 'julia +1.11 --startup-file=no -e "using Downloads"'
Benchmark 1: julia +1.11 --startup-file=no -e "using Base"
  Time (mean ± σ):     130.6 ms ±   5.8 ms    [User: 143.8 ms, System: 119.6 ms]
  Range (min … max):   124.6 ms … 148.0 ms    21 runs
 
Benchmark 2: julia +1.11 --startup-file=no -e "using Downloads"
  Time (mean ± σ):     198.2 ms ±   5.0 ms    [User: 2270.8 ms, System: 142.3 ms]
  Range (min … max):   191.9 ms … 206.5 ms    14 runs
 
Summary
  julia +1.11 --startup-file=no -e "using Base" ran
    1.52 ± 0.08 times faster than julia +1.11 --startup-file=no -e "using Downloads"

Which I take to indicate that using Downloads takes ~70ms in practice, as opposed to the ~40ms indicated by @time or the ~200ms indicated by @time_imports.
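
For reference, here is roughly what the two in-process measurements mentioned above look like; each snippet is meant to run in its own fresh julia +1.11 --startup-file=no session, and @time_imports is assumed to come from InteractiveUtils, which is not loaded automatically in non-interactive runs:

# fresh process 1: total load time as reported from inside the process
@time using Downloads

# fresh process 2: per-dependency load-time breakdown
using InteractiveUtils
@time_imports using Downloads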

3 Likes

NetworkOptions was removed from the sysimage, so I think this is expected; see High package load time / allocations on Julia 1.11 · Issue #40 · JuliaLang/NetworkOptions.jl · GitHub.
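
A quick way to check that from a fresh session is to ask whether NetworkOptions is already loaded at startup; a rough sketch (Base.loaded_modules is internal, so treat this as a debugging aid only):

# run with `julia +1.10 --startup-file=no` and `julia +1.11 --startup-file=no`;
# if the linked issue is right, this prints true on 1.10 (NetworkOptions ships in the sysimage)
# and false on 1.11, where `using Downloads` has to load it separately
println(any(pkg -> pkg.name == "NetworkOptions", keys(Base.loaded_modules)))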

3 Likes
ufechner@framework:~$ hyperfine 'julia +1.10 --startup-file=no -e "using Downloads"' 'julia +1.11 --startup-file=no -e "using Downloads"'
Benchmark 1: julia +1.10 --startup-file=no -e "using Downloads"
  Time (mean ± σ):     102.2 ms ±   1.5 ms    [User: 52.6 ms, System: 59.1 ms]
  Range (min … max):   100.1 ms … 105.0 ms    28 runs
 
Benchmark 2: julia +1.11 --startup-file=no -e "using Downloads"
  Time (mean ± σ):     114.8 ms ±   2.5 ms    [User: 290.9 ms, System: 49.2 ms]
  Range (min … max):   111.1 ms … 121.6 ms    26 runs
 
Summary
  julia +1.10 --startup-file=no -e "using Downloads" ran
    1.12 ± 0.03 times faster than julia +1.11 --startup-file=no -e "using Downloads"

And I get only a 12% difference between Julia 1.11 and 1.10.

Because most of the time in that benchmark is from starting Julia itself.

1 Like

I don’t think that is the reason. Even if I load a very large package, where the startup time of Julia itself plays an insignificant role, I get similar results:

Benchmark 1: julia +1.10 --startup-file=no -e "using KiteModels"
  Time (mean ± σ):      5.530 s ±  0.015 s    [User: 5.168 s, System: 0.928 s]
  Range (min … max):    5.507 s …  5.549 s    10 runs
 
Benchmark 2: julia +1.11 --startup-file=no -e "using KiteModels"
  Time (mean ± σ):      6.256 s ±  0.015 s    [User: 6.307 s, System: 0.515 s]
  Range (min … max):    6.222 s …  6.273 s    10 runs
 
Summary
  julia +1.10 --startup-file=no -e "using KiteModels" ran
    1.13 ± 0.00 times faster than julia +1.11 --startup-file=no -e "using KiteModels"

In this example, Julia 1.10 is 13% faster than Julia 1.11. Note that this is only the package load time and does not include the precompilation time. Well, for me the load time is much more relevant than the precompilation time.
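
To keep precompilation out of a measurement like this, one option is to force it once beforehand; a minimal sketch, assuming KiteModels is already installed in the active environment:

# step 1, once and untimed: make sure cached pkgimages exist for the whole dependency tree
using Pkg
Pkg.precompile()

# step 2, in a fresh `julia --startup-file=no` process: this should now report load time only
@time using KiteModels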

1 Like

Another data point: time-to-first-gradient with Mooncake + DifferentiationInterface is 40% higher in Julia 1.11 than in 1.10:

> hyperfine --warmup 3 'julia +1.10 --startup-file=no -e "using Mooncake, DifferentiationInterface; f(x) = sum(x .^2); gradient(f, AutoMooncake(config=nothing), [1., 2., -1.4])"' 'julia --startup-file=no -e "using Mooncake, DifferentiationInterface; f(x) = sum(x .^2); gradient(f, AutoMooncake(config=nothing), [1., 2., -1.4])"'
Benchmark 1: julia +1.10 --startup-file=no -e "using Mooncake, DifferentiationInterface; f(x) = sum(x .^2); gradient(f, AutoMooncake(config=nothing), [1., 2., -1.4])"
  Time (mean ± σ):     43.036 s ±  0.159 s    [User: 42.316 s, System: 0.651 s]
  Range (min … max):   42.892 s … 43.365 s    10 runs
 
Benchmark 2: julia --startup-file=no -e "using Mooncake, DifferentiationInterface; f(x) = sum(x .^2); gradient(f, AutoMooncake(config=nothing), [1., 2., -1.4])"
  Time (mean ± σ):     60.154 s ±  0.400 s    [User: 59.122 s, System: 0.876 s]
  Range (min … max):   59.707 s … 61.109 s    10 runs
 
Summary
  julia +1.10 --startup-file=no -e "using Mooncake, DifferentiationInterface; f(x) = sum(x .^2); gradient(f, AutoMooncake(config=nothing), [1., 2., -1.4])" ran
    1.40 ± 0.01 times faster than julia --startup-file=no -e "using Mooncake, DifferentiationInterface; f(x) = sum(x .^2); gradient(f, AutoMooncake(config=nothing), [1., 2., -1.4])"

Here julia runs Julia 1.11.5. For some reason, julia +1.11 says "ERROR: 1.11 is not installed. Please run juliaup add 1.11 to install channel or version." I opened an issue about this.

As a side note, computing the first gradient of a basic quadratic function of three variables (3D vector) takes one minute??
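
To see where that minute goes, one can split it into load time and first-call compile time; a minimal sketch, run in a fresh julia --startup-file=no process:

@time using Mooncake, DifferentiationInterface                        # package load time
f(x) = sum(x .^ 2)
@time gradient(f, AutoMooncake(config=nothing), [1.0, 2.0, -1.4])     # first call: includes compilation
@time gradient(f, AutoMooncake(config=nothing), [1.0, 2.0, -1.4])     # second call: runtime only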

1 Like

You have 1.11.5 installed in the release channel, not via a separate 1.11 channel (arguably that could be improved; maybe open an issue in juliaup).

I find that much worse than load/compile time is the regression in TTFP (at least in this case), where 1.11 is ~2.5 times slower and 1.12 ~4 times slower than 1.10. And that example is for the exact same command, which has been precompiled with PrecompileTools.
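
For readers who have not used it, the PrecompileTools pattern referred to here looks roughly like this (a hypothetical package and workload, not the actual code from that example):

module MyPackage            # hypothetical package

using PrecompileTools

solve(x) = sum(abs2, x)     # stand-in for the real entry point

@setup_workload begin
    x = rand(100)                # setup that should not end up in the precompile cache
    @compile_workload begin
        solve(x)                 # the exact call users will make, compiled into the pkgimage
    end
end

end # module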

7 Likes

Does anybody have any idea why the user+system or user mean time can be larger, even >10x larger, than the actual wall-clock time? I looked up the relevant hyperfine issues (links below), but the explanation offered there is multithreading, and I don’t know where that would be happening here. How many cores are there, and do imports actually use multiple cores even when the Julia process defaults to single-threaded?
Document hyperfine’s output fields · Issue #443 · sharkdp/hyperfine
Larger system time than runtime reported · Issue #515 · sharkdp/hyperfine
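
One quick check relevant to the multithreading explanation: print how many CPU cores the machine has and how many threads the Julia process was actually started with (run in a fresh julia --startup-file=no session):

@show Sys.CPU_THREADS        # logical CPU cores available to the process
@show Threads.nthreads()     # Julia compute threads; defaults to 1 without -t / JULIA_NUM_THREADS
@show Threads.ngcthreads()   # GC threads, reported separately from compute threads (Julia ≥ 1.10)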

Measuring that by subtracting the using Base process’s runtime from the using Downloads process’s runtime (or rather the intervals) assumes that starting and exiting the process take the same time in both cases. That seems reasonable for startup, since the initial state is identical, but exit does not happen from the same state and can take arbitrarily long, e.g. while running finalizers: julia --startup-file=no -E "obj = Ref(3); finalizer(x -> Libc.systemsleep(x[]), obj)". No idea how to benchmark exit() in isolation, though.

Following up on some of the comments here regarding consistent code performance across Julia versions — I wanted to share some observations from our own experience with Rocket.jl, where we still support versions as far back as Julia 1.3.

We’re running exactly the same test suite and code across all supported Julia versions, with the same payload, and we’ve noticed similar patterns to those described by Fons. For example, we’ve seen test runtimes improve significantly between 1.3 and 1.9 — dropping from nearly 5 minutes to about 3.5 minutes. A great improvement! However, starting with 1.11, the performance regresses substantially — test time jumps back to 4.5 minutes, and the nightly build performs even worse than 1.3.

You can see an even more drastic version of this in the test suite for RxInfer.jl, where we support Julia 1.10 and above. There, the jump from 1.10 to 1.11 is dramatic.

Similar issues appear in our RxInferExamples.jl repo as well:

Seeing test runtimes go from ~17 minutes to ~23 minutes is quite concerning. The logs suggest that most of this additional time is spent in compilation too.

17 Likes

It’s pretty negligible. If you want to skip finalising, you can ccall(:exit, Cvoid, (Cint,), 0) for the sake of testing (POSIX only).
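
Combining that with the finalizer example above gives a small sketch of the difference (based on the suggestion in this post; the direct ccall should terminate the process before the 3-second finalizer gets a chance to run):

# a normal exit would eventually run the finalizer; the raw exit skips Julia’s shutdown path
obj = Ref(3)
finalizer(x -> Libc.systemsleep(x[]), obj)
ccall(:exit, Cvoid, (Cint,), 0)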

1 Like