Rust
Benchmark 1: ./target/release/rust
Processing time (w/o IO): 23.541084ms
Processing time (w/o IO): 23.466582ms
Processing time (w/o IO): 23.441882ms
Processing time (w/o IO): 23.468383ms
Processing time (w/o IO): 23.629785ms
Processing time (w/o IO): 23.484583ms
Processing time (w/o IO): 23.434282ms
Processing time (w/o IO): 23.425681ms
Processing time (w/o IO): 23.420081ms
Processing time (w/o IO): 23.649885ms
Processing time (w/o IO): 23.493782ms
Processing time (w/o IO): 23.469783ms
Processing time (w/o IO): 23.456882ms
Time (mean ± σ): 39.6 ms ± 0.8 ms [User: 32.2 ms, System: 7.4 ms]
Range (min … max): 39.1 ms … 41.7 ms 10 runs
Julia
Benchmark 1: julia -O3 --project=Related -e "using Related; main()"
Processing time (w/o IO): 31 milliseconds
Processing time (w/o IO): 24 milliseconds
Processing time (w/o IO): 31 milliseconds
Processing time (w/o IO): 31 milliseconds
Processing time (w/o IO): 31 milliseconds
Processing time (w/o IO): 24 milliseconds
Processing time (w/o IO): 31 milliseconds
Processing time (w/o IO): 31 milliseconds
Processing time (w/o IO): 25 milliseconds
Processing time (w/o IO): 24 milliseconds
Processing time (w/o IO): 31 milliseconds
Processing time (w/o IO): 24 milliseconds
Processing time (w/o IO): 24 milliseconds
Time (mean ± σ): 3.351 s ± 0.047 s [User: 3.200 s, System: 0.254 s]
Range (min … max): 3.280 s … 3.447 s 10 runs
Interestingly this happens also on my machine by running in the REPL using Related; main(), restarting Julia can downgrade performance forever for that REPL session. And restarting again, can make the performance again better.
All other languages except for NodeJS (which has some less large oscillations) don’t exhibit these variations, so I’m asking myself what is special about Julia, the code run on the VM seems to be on this file https://github.com/jinyus/related_post_gen/blob/main/run.sh:
run_julia() {
echo "Running Julia" &&
cd ./julia &&
julia -e 'using Pkg; Pkg.activate("Related"); Pkg.instantiate()' &&
if [ $HYPER == 1 ]; then
capture "Julia" hyperfine -r $runs -w $warmup --show-output "julia --project=Related -e \"using Related; main()\""
else
command ${time} -f '%es %Mk' julia --project=Related -e "using Related; main()"
fi
check_output "related_posts_julia.json"
}
I’m not sure what can cause this difference, do someone have an explaination for this behaviour?
Benchmark 1: julia --startup-file=no --project=Related -e "using Related; main()"
Processing time (w/o IO): 222 milliseconds
Processing time (w/o IO): 21 milliseconds
Processing time (w/o IO): 22 milliseconds
Processing time (w/o IO): 22 milliseconds
Processing time (w/o IO): 22 milliseconds
but if you add (without [1:2]):
@setup_workload begin
json_string = read(@__DIR__()*"/../../../posts.json", String)
posts = JSON3.read(json_string, Vector{PostData})
@compile_workload begin
related(posts)
end
end
you get
Benchmark 1: julia --project=Related -e "using Related; main()"
Processing time (w/o IO): 21 milliseconds
Processing time (w/o IO): 22 milliseconds
Processing time (w/o IO): 23 milliseconds
Processing time (w/o IO): 24 milliseconds
Processing time (w/o IO): 23 milliseconds
Processing time (w/o IO): 21 milliseconds
This is what I see restarting the REPL, retriggering precompilation (by modifying the source code without real changes, e.g. by adding a space) and running main():
bob@bob-Victus-by-HP-Gaming-Laptop-15-fb0xxx:~$ julia # still needs precompilation
julia> using Related
[ Info: Precompiling Related [892d0954-52c7-4a59-ae8f-9a164aba3227]
Processing time (w/o IO): 4586 milliseconds
julia> main()
Processing time (w/o IO): 267 milliseconds # it went good, consistently these results
13262912
julia> main()
Processing time (w/o IO): 272 milliseconds
13262912
julia> main()
Processing time (w/o IO): 265 milliseconds
13262912
bob@bob-Victus-by-HP-Gaming-Laptop-15-fb0xxx:~$ julia # restart
julia> using Related
julia> main()
Processing time (w/o IO): 268 milliseconds
13262912
julia> main()
Processing time (w/o IO): 269 milliseconds
13262912
julia> main()
Processing time (w/o IO): 269 milliseconds
13262912
bob@bob-Victus-by-HP-Gaming-Laptop-15-fb0xxx:~$ julia # just added a space in the related.jl file to trigger precompilation
julia> using Related
[ Info: Precompiling Related [892d0954-52c7-4a59-ae8f-9a164aba3227]
Processing time (w/o IO): 4669 milliseconds
julia> main()
Processing time (w/o IO): 361 milliseconds # it went bad
13262912
julia> main()
Processing time (w/o IO): 364 milliseconds
13262912
julia> main()
Processing time (w/o IO): 360 milliseconds
13262912
julia> main()
Processing time (w/o IO): 360 milliseconds
13262912
bob@bob-Victus-by-HP-Gaming-Laptop-15-fb0xxx:~$ julia
julia> using Related
julia> main()
Processing time (w/o IO): 360 milliseconds # forever bad
13262912
julia> main()
Processing time (w/o IO): 361 milliseconds
13262912
julia> main()
Processing time (w/o IO): 358 milliseconds
13262912
bob@bob-Victus-by-HP-Gaming-Laptop-15-fb0xxx:~$ julia
julia> using Related
julia> main()
Processing time (w/o IO): 361 milliseconds
13262912
julia> main()
Processing time (w/o IO): 367 milliseconds
13262912
I think it would be really interesting if someone else can reproduce this kind of behaviour with the branch of @jling just posted since to me this seems like a bug in precompilation
If Julia is precompiling everytime, then it is detecting heterogeneous CPU architectures, invalidating the prior CPU cache, and trying to target the current machine.
What kind of processor are you using? Could you show us the output of versioninfo() in the REPL?
You also mentioned a virtual machine? What kind of virtual machine?
The VM is a Azure F4s v2 - 4vCPU-8GB-Ubuntu 22.04.
My computer info are
julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 12 × AMD Ryzen 5 5600H with Radeon Graphics
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, znver3)
Threads: 1 on 12 virtual cores
Actually, the precompilation retriggering is done by me on purpose to show this strange behaviour, I just add a space somewhere so that the module recompiles. After that I can randomly get around 270 ms all the time or 360 ms all the time. And this is I think what happens also on that VM.
If this phenomenon was language agnostic I wouldn’t have thought about some problem with Julia itself, but on that VM the performance of all other languages are much less variant. While Julia has 30% variation, Rust has 1%. Actually, apart from the comparison between languages, having oscillations that big after retriggering precompilation seems already strange.
You’re doing the lord’s work here, thanks for going to the bottom of things. I think this might be a counterargument to the “don’t participate in micro benchmarks” – sometimes we find very deep “bugs” only be focusing on small tasks you can afford to compare assembly.
Even though I tried many methods to debug this performance issue, I’m running a bit out of ideas, so I would be really grateful If someone else more expert on this sort of analysis could share some more insights. If someone is interested, I prepared two perf profiler runs, one slow and one fast, available at https://github.com/Tortar/related_post_gen/blob/main/perf-jit-results.tar.xz