Funny Benchmark with Julia (no longer) at the bottom

I monitored the results of the last runs of the benchmark on the virtual machine, see e.g. https://github.com/jinyus/related_post_gen/blob/main/raw_results.md or https://github.com/jinyus/related_post_gen/blob/697198a2bab8c9b9e2f90fe6569cc6afbc80f772/raw_results.md, what I see is strange to me. Julia is the more performance-inconsistent between runs compared with all the other languages e.g. compare the results of Rust with the ones of Julia for the 5k posts run:

Rust

Benchmark 1: ./target/release/rust
Processing time (w/o IO): 23.541084ms
Processing time (w/o IO): 23.466582ms
Processing time (w/o IO): 23.441882ms
Processing time (w/o IO): 23.468383ms
Processing time (w/o IO): 23.629785ms
Processing time (w/o IO): 23.484583ms
Processing time (w/o IO): 23.434282ms
Processing time (w/o IO): 23.425681ms
Processing time (w/o IO): 23.420081ms
Processing time (w/o IO): 23.649885ms
Processing time (w/o IO): 23.493782ms
Processing time (w/o IO): 23.469783ms
Processing time (w/o IO): 23.456882ms
  Time (mean ± σ):      39.6 ms ±   0.8 ms    [User: 32.2 ms, System: 7.4 ms]
  Range (min … max):    39.1 ms …  41.7 ms    10 runs

Julia

Benchmark 1: julia -O3 --project=Related -e "using Related; main()"
Processing time (w/o IO): 31 milliseconds
Processing time (w/o IO): 24 milliseconds
Processing time (w/o IO): 31 milliseconds
Processing time (w/o IO): 31 milliseconds
Processing time (w/o IO): 31 milliseconds
Processing time (w/o IO): 24 milliseconds
Processing time (w/o IO): 31 milliseconds
Processing time (w/o IO): 31 milliseconds
Processing time (w/o IO): 25 milliseconds
Processing time (w/o IO): 24 milliseconds
Processing time (w/o IO): 31 milliseconds
Processing time (w/o IO): 24 milliseconds
Processing time (w/o IO): 24 milliseconds
  Time (mean ± σ):      3.351 s ±  0.047 s    [User: 3.200 s, System: 0.254 s]
  Range (min … max):    3.280 s …  3.447 s    10 runs

Interestingly this happens also on my machine by running in the REPL using Related; main(), restarting Julia can downgrade performance forever for that REPL session. And restarting again, can make the performance again better.

All other languages except for NodeJS (which has some less large oscillations) don’t exhibit these variations, so I’m asking myself what is special about Julia, the code run on the VM seems to be on this file https://github.com/jinyus/related_post_gen/blob/main/run.sh:

run_julia() {
    echo "Running Julia" &&
        cd ./julia &&
        julia -e 'using Pkg; Pkg.activate("Related"); Pkg.instantiate()' &&
        if [ $HYPER == 1 ]; then

            capture "Julia" hyperfine -r $runs -w $warmup --show-output "julia --project=Related -e \"using Related; main()\""
        else
            command ${time} -f '%es %Mk' julia --project=Related -e "using Related; main()"

        fi

    check_output "related_posts_julia.json"
}

I’m not sure what can cause this difference, do someone have an explaination for this behaviour?

2 Likes

here’s something even more funny
if you add

@compile_workload begin
    main()
end

you get

Benchmark 1: julia --startup-file=no --project=Related -e "using Related; main()"
Processing time (w/o IO): 222 milliseconds
Processing time (w/o IO): 21 milliseconds
Processing time (w/o IO): 22 milliseconds
Processing time (w/o IO): 22 milliseconds
Processing time (w/o IO): 22 milliseconds

but if you add (without [1:2]):

@setup_workload begin
    json_string = read(@__DIR__()*"/../../../posts.json", String)
    posts = JSON3.read(json_string, Vector{PostData})

    @compile_workload begin
        related(posts)
    end
end

you get

Benchmark 1: julia --project=Related -e "using Related; main()"
Processing time (w/o IO): 21 milliseconds
Processing time (w/o IO): 22 milliseconds
Processing time (w/o IO): 23 milliseconds
Processing time (w/o IO): 24 milliseconds
Processing time (w/o IO): 23 milliseconds
Processing time (w/o IO): 21 milliseconds
1 Like
2 Likes

Still I don’t see deterministic results on my machine, can Precompilation does not produce a deterministic result · Issue #25900 · JuliaLang/julia · GitHub be the cause?

This is what I see restarting the REPL, retriggering precompilation (by modifying the source code without real changes, e.g. by adding a space) and running main():

bob@bob-Victus-by-HP-Gaming-Laptop-15-fb0xxx:~$ julia # still needs precompilation

julia> using Related
[ Info: Precompiling Related [892d0954-52c7-4a59-ae8f-9a164aba3227]
Processing time (w/o IO): 4586 milliseconds
julia> main()
Processing time (w/o IO): 267 milliseconds # it went good, consistently these results
13262912
julia> main()
Processing time (w/o IO): 272 milliseconds
13262912
julia> main()
Processing time (w/o IO): 265 milliseconds
13262912
bob@bob-Victus-by-HP-Gaming-Laptop-15-fb0xxx:~$ julia # restart

julia> using Related
julia> main()
Processing time (w/o IO): 268 milliseconds
13262912
julia> main()
Processing time (w/o IO): 269 milliseconds
13262912
julia> main()
Processing time (w/o IO): 269 milliseconds
13262912

bob@bob-Victus-by-HP-Gaming-Laptop-15-fb0xxx:~$ julia # just added a space in the related.jl file to trigger precompilation

julia> using Related
[ Info: Precompiling Related [892d0954-52c7-4a59-ae8f-9a164aba3227]
Processing time (w/o IO): 4669 milliseconds
julia> main()
Processing time (w/o IO): 361 milliseconds # it went bad
13262912
julia> main()
Processing time (w/o IO): 364 milliseconds
13262912
julia> main()
Processing time (w/o IO): 360 milliseconds
13262912
julia> main()
Processing time (w/o IO): 360 milliseconds
13262912

bob@bob-Victus-by-HP-Gaming-Laptop-15-fb0xxx:~$ julia

julia> using Related
julia> main()
Processing time (w/o IO): 360 milliseconds # forever bad
13262912
julia> main()
Processing time (w/o IO): 361 milliseconds
13262912
julia> main()
Processing time (w/o IO): 358 milliseconds
13262912

bob@bob-Victus-by-HP-Gaming-Laptop-15-fb0xxx:~$ julia

julia> using Related
julia> main()
Processing time (w/o IO): 361 milliseconds
13262912
julia> main()
Processing time (w/o IO): 367 milliseconds
13262912

which is intriguing

I think it would be really interesting if someone else can reproduce this kind of behaviour with the branch of @jling just posted since to me this seems like a bug in precompilation

If Julia is precompiling everytime, then it is detecting heterogeneous CPU architectures, invalidating the prior CPU cache, and trying to target the current machine.

What kind of processor are you using? Could you show us the output of versioninfo() in the REPL?

You also mentioned a virtual machine? What kind of virtual machine?

The VM is a Azure F4s v2 - 4vCPU-8GB-Ubuntu 22.04.

My computer info are

julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × AMD Ryzen 5 5600H with Radeon Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver3)
  Threads: 1 on 12 virtual cores

Actually, the precompilation retriggering is done by me on purpose to show this strange behaviour, I just add a space somewhere so that the module recompiles. After that I can randomly get around 270 ms all the time or 360 ms all the time. And this is I think what happens also on that VM.

If this phenomenon was language agnostic I wouldn’t have thought about some problem with Julia itself, but on that VM the performance of all other languages are much less variant. While Julia has 30% variation, Rust has 1%. Actually, apart from the comparison between languages, having oscillations that big after retriggering precompilation seems already strange.

I reproduced on another computer, so I opened an issue for this Oscillations in performance after different precompilation runs · Issue #51988 · JuliaLang/julia · GitHub

10 Likes

You’re doing the lord’s work here, thanks for going to the bottom of things. I think this might be a counterargument to the “don’t participate in micro benchmarks” – sometimes we find very deep “bugs” only be focusing on small tasks you can afford to compare assembly.

6 Likes

Even though I tried many methods to debug this performance issue, I’m running a bit out of ideas, so I would be really grateful If someone else more expert on this sort of analysis could share some more insights. If someone is interested, I prepared two perf profiler runs, one slow and one fast, available at https://github.com/Tortar/related_post_gen/blob/main/perf-jit-results.tar.xz

P.S: thanks @jling for the kind words :slight_smile:

2 Likes