Compiling Julia using LTO+PGO

The performance gain in LLVM is likely negligible. I tried compiling all of Julia's dependencies with -march=znver2 using GCC 10 and then benchmarked the precompile time of LLVM.jl a few times.

With generic binaries for Julia:

11.141739 seconds (1.94 M allocations: 133.629 MiB, 0.29% gc time, 6.16% compilation time)
11.106031 seconds (1.94 M allocations: 133.632 MiB, 0.19% gc time, 6.15% compilation time)
11.183070 seconds (1.94 M allocations: 133.614 MiB, 0.55% gc time, 5.84% compilation time)
11.084295 seconds (1.94 M allocations: 133.610 MiB, 0.55% gc time, 6.12% compilation time)

With -march=znver2:

10.917630 seconds (1.94 M allocations: 133.787 MiB, 0.30% gc time, 5.74% compilation time)
10.977101 seconds (1.94 M allocations: 133.803 MiB, 0.53% gc time, 5.79% compilation time)
11.000003 seconds (1.94 M allocations: 133.807 MiB, 0.38% gc time, 5.73% compilation time)
10.920701 seconds (1.94 M allocations: 133.804 MiB, 0.56% gc time, 6.12% compilation time)

If you want to try it yourself, see https://github.com/spack/spack/pull/27280#issue-1047361063 and run:

taskset -c 0 ./spack/opt/spack/linux-sles15-zen2/gcc-10.3.0/julia-1.7.0-rc3-47wy4knrqrzqqga56jeau55epdl5mkvz/bin/julia -e 'using Pkg; @time Pkg.precompile()'

Edit: a slightly more interesting benchmark where some code is compiled and run. The following script:

using LoopVectorization

function f!(z, x, y)
  @avx for i = eachindex(z)
    z[i] = x[i] * y[i]
  end
  z
end

f!(rand(10), rand(10), rand(10))

was run with LoopVectorization 0.12.98 as follows:

julia --project -e 'using Pkg; Pkg.instantiate(); @time include("script.jl")'

Generic binaries & sysimage:

13.497693 seconds (18.24 M allocations: 984.107 MiB, 2.38% gc time, 91.57% compilation time)
13.464525 seconds (18.24 M allocations: 984.137 MiB, 2.35% gc time, 91.43% compilation time)
13.513310 seconds (18.24 M allocations: 984.137 MiB, 2.58% gc time, 91.50% compilation time)
13.485646 seconds (18.24 M allocations: 984.135 MiB, 2.41% gc time, 91.38% compilation time)

-march=znver2:

12.997209 seconds (18.26 M allocations: 985.012 MiB, 2.45% gc time, 91.29% compilation time)
13.035375 seconds (18.26 M allocations: 985.014 MiB, 2.42% gc time, 91.15% compilation time)
13.014416 seconds (18.26 M allocations: 985.016 MiB, 2.44% gc time, 91.15% compilation time)
13.042701 seconds (18.26 M allocations: 985.014 MiB, 2.46% gc time, 91.10% compilation time)
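
To put a number on "negligible", here is a rough sketch that averages the four wall-clock times from each run above and prints the relative reduction. The variable names and the `reduction` helper are my own for illustration; only the timings come from the output above, and four samples per configuration are too few to treat this as more than a ballpark estimate.

```julia
using Statistics  # stdlib: mean, std

# Wall-clock seconds copied from the @time output above.
generic_precompile = [11.141739, 11.106031, 11.183070, 11.084295]
znver2_precompile  = [10.917630, 10.977101, 11.000003, 10.920701]
generic_script     = [13.497693, 13.464525, 13.513310, 13.485646]
znver2_script      = [12.997209, 13.035375, 13.014416, 13.042701]

# Hypothetical helper: percentage reduction in mean run time
# of the tuned build relative to the baseline build.
reduction(baseline, tuned) = 100 * (1 - mean(tuned) / mean(baseline))

for (name, base, tuned) in (("Pkg.precompile()", generic_precompile, znver2_precompile),
                            ("script.jl", generic_script, znver2_script))
    println(name, ": ", round(reduction(base, tuned); digits = 1), "% less time; ",
            "run-to-run std ", round(std(base); digits = 3), " s (generic), ",
            round(std(tuned); digits = 3), " s (znver2)")
end
```

On these numbers the znver2 build takes about 1.6% less time for Pkg.precompile() and about 3.5% less for the LoopVectorization script.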