Compiling Julia from source vs release binaries

I have always compiled Julia from source (checking out the relevant release branch in the git repo), but since juliaup is so convenient, I am wondering whether I should switch to the official binaries.

Specifically, compiling from source targets the CPU of the local machine, while AFAIK the official binaries have only a few alternative targets. I am assuming that these are the ones grabbed by juliaup. Am I losing any performance by using the latter, assuming a recent CPU?

(Incidentally, how can I query the targets of an existing image?)
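
A partial answer to the aside: the sketch below shows what a running session reports about its own CPU targeting. It assumes Base.JLOptions(), an internal and undocumented struct whose cpu_target field holds whatever was passed with -C / --cpu-target. That is not necessarily the full list of targets compiled into the sysimage. requested_cpu_target is a hypothetical helper name.

```julia
# Host CPU name as detected at startup; this is the name versioninfo()
# prints next to "ORCJIT" (e.g. "znver3" in the transcript further down).
host_cpu = Sys.CPU_NAME

# Value passed via -C / --cpu-target for this session, if any.
# ASSUMPTION: Base.JLOptions() is internal and may change between Julia
# versions; a NULL pointer means no explicit target was requested.
function requested_cpu_target()
    ptr = Base.JLOptions().cpu_target
    return ptr == C_NULL ? nothing : unsafe_string(ptr)
end

println("Host CPU:             ", host_cpu)
println("Requested cpu target: ", something(requested_cpu_target(), "(none)"))
```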

To be clear, that only applies to the Julia runtime (e.g. the garbage collector, the interface to LLVM, etc.), which is written in C/C++, and to the packages in the sysimage. The code generated by the Julia compiler at runtime shouldn't be affected by how the julia executable itself was compiled, since it always targets the current CPU (unless you explicitly choose a different target with julia -C).
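
One way to see that JIT output follows the host CPU is to inspect the native code of a small kernel and check which x86 vector registers it uses. This is a sketch: scale!, native_asm and vector_widths are hypothetical names, the register check only makes sense on x86-64, and LLVM may still prefer 256-bit (ymm) vectors on AVX-512 hardware, so the absence of zmm does not by itself prove the CPU lacks AVX-512.

```julia
using InteractiveUtils  # provides code_native (loaded automatically in the REPL)

# A small loop for the JIT to compile and (ideally) vectorize.
function scale!(y::Vector{Float64}, x::Vector{Float64}, a::Float64)
    @inbounds @simd for i in eachindex(y, x)
        y[i] = a * x[i]
    end
    return y
end

# Capture the native assembly generated for f with the given argument types.
native_asm(f, types) = sprint(io -> code_native(io, f, types; debuginfo=:none))

# Report which x86 vector register classes appear in the assembly:
# xmm = 128-bit (SSE), ymm = 256-bit (AVX/AVX2), zmm = 512-bit (AVX-512).
function vector_widths(asm::AbstractString)
    return (xmm = occursin("xmm", asm),
            ymm = occursin("ymm", asm),
            zmm = occursin("zmm", asm))
end

asm = native_asm(scale!, (Vector{Float64}, Vector{Float64}, Float64))
println(vector_widths(asm))
```

Starting Julia with a different -C target and rerunning this should change the result, provided the sysimage contains a compatible target.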

As noted above, if you're losing some performance, it's mainly in Julia's runtime, not in the JIT-generated code. Experiments with LTO/PGO (which is not currently easy to do with Julia anyway) showed some gains in compilation time.

I tested how well the official Julia 1.10.0 binary runs on an AMD Ryzen 9 7940HS (Zen 4 architecture). Although it identifies the processor as 'znver3', it still generates AVX-512 (zmm) code, which is good.

$ docker run -it --rm julia:latest
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0 (2023-12-25)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
  Threads: 1 on 16 virtual cores
Environment:
  JULIA_GPG = 3673DF529D9049477F76B37566E3C7DC03D6E495
  JULIA_PATH = /usr/local/julia
  JULIA_VERSION = 1.10.0

julia> Base.Sys.CPU_NAME
"znver3"

julia> function test_avx512()
           a = rand(Float64, 1000000)
           b = rand(Float64, 1000000)
           c = a .* b
           return sum(c)
       end
       # @code_native test_avx512()
test_avx512 (generic function with 1 method)

julia> function check_avx512_in_test_avx512()
           # Create a temporary file
           tmp_file = tempname()
           # Define the expression for @code_native macro
           code_native_expr = :(@code_native test_avx512())
           # Redirect output to the temporary file
           open(tmp_file, "w") do file
               redirect_stdout(file) do
                   eval(code_native_expr)
               end
           end
           # Read the content of the temporary file
           code_native_output = read(tmp_file, String)
           # Delete the temporary file
           rm(tmp_file)
           # Define patterns that indicate AVX-512 usage
           avx512_patterns = ["zmm"]
           # Check if any of the AVX-512 patterns are in the output
           any(occursin(pattern, code_native_output) for pattern in avx512_patterns)
       end
check_avx512_in_test_avx512 (generic function with 1 method)

julia> check_avx512_in_test_avx512()
true

And a change adding Zen 4 support to Julia is still waiting to be merged.

It also affects the (pre)compiled code that comes bundled in the sysimage.

Mhm… How much performance do we actually lose on a modern CPU? I would happily wait a couple of minutes while juliaup does its thing if that meant the sysimage was as fast as it can be.

The official binaries are built with multi-versioning (the sysimage contains code for several CPU targets), so there should not be much missing there.
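
The set of targets in a multi-versioned sysimage is chosen at build time with a JULIA_CPU_TARGET-style string. The sketch below parses one such string to show its structure. The example string follows the format described in Julia's developer documentation on building system images and is illustrative only, not necessarily what the official binaries use; SysimgTarget and parse_cpu_target are hypothetical names.

```julia
# Hypothetical helper: split a JULIA_CPU_TARGET-style string into its targets.
# Format assumed here: targets separated by ';', each target is a CPU name
# followed by comma-separated entries (feature flags such as "-xsaveopt",
# or special options such as "clone_all" and "base(1)").
struct SysimgTarget
    cpu::String
    options::Vector{String}
end

function parse_cpu_target(s::AbstractString)
    targets = SysimgTarget[]
    for spec in split(s, ';'; keepempty=false)
        parts = split(strip(spec), ',')
        push!(targets, SysimgTarget(String(parts[1]), String.(parts[2:end])))
    end
    return targets
end

example = "generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)"
for t in parse_cpu_target(example)
    println(t.cpu, ": ", isempty(t.options) ? "(no extra options)" : join(t.options, ", "))
end
```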

I asked this like 3-4 years ago (here? on Slack? not sure) and was told that the binaries should leave very little performance on the table. There was not much to be gained by building from source. Since then I’ve always just used the binaries and juliaup has made that even smoother.