I have always compiled Julia from source (checking out the relevant release branch in the git repo), but since juliaup is so convenient, I am wondering if I should switch to it.
Specifically, compiling from source targets the CPU of the local machine, while AFAIK the official binaries are built for only a few generic targets, and I assume those are what juliaup downloads. Am I losing any performance by using the latter, assuming a recent CPU?
(Incidentally, how can I query the targets of an existing image?)
To be clear, that only applies to the julia runtime (e.g. the garbage collector, the interface to LLVM, etc.), which is written in C/C++, and to the packages in the sysimage. The code generated by the julia compiler at runtime shouldn’t be affected by how the julia executable itself was compiled, since JIT compilation always targets the current CPU (unless you explicitly choose a different target with julia -C).
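To see the per-session side of this, here is a minimal sketch of how to inspect what the running process is targeting. Note the assumption: `Base.JLOptions().cpu_target` reflects the effective `-C`/`--cpu-target` setting of the current session (and may be a null pointer when none was recorded), not the list of targets a prebuilt sysimage was compiled with.

```julia
# Host CPU as detected by Julia/LLVM (e.g. "znver3")
host = Sys.CPU_NAME

# Effective -C / --cpu-target of this session; a Ptr{UInt8} that can be C_NULL
ptr = Base.JLOptions().cpu_target
target = ptr == C_NULL ? "(default)" : unsafe_string(ptr)

println("host CPU: ", host, ", cpu-target: ", target)
```

Running with `julia -C generic` should make `target` show `"generic"` instead of the default.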
Based on the previous section of my reply, if you’re losing some performance it’s mainly in julia’s runtime, not really in the JIT-generated code. Experimenting with LTO/PGO (which is not currently easy to do with julia anyway) showed some performance gains, mostly in compilation time:
I tested how efficiently the official Julia 1.10.0 binary runs on an AMD Ryzen 9 7940HS (Zen 4 architecture). Although it recognizes the processor as ‘znver3’, it still generates AVX-512 (zmm) code, which is good.
$ docker run -it --rm julia:latest
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0 (2023-12-25)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 on 16 virtual cores
Environment:
  JULIA_GPG = 3673DF529D9049477F76B37566E3C7DC03D6E495
  JULIA_PATH = /usr/local/julia
  JULIA_VERSION = 1.10.0

julia> Base.Sys.CPU_NAME
"znver3"

julia> function test_avx512()
           a = rand(Float64, 1000000)
           b = rand(Float64, 1000000)
           c = a .* b
           return sum(c)
       end
test_avx512 (generic function with 1 method)

julia> function check_avx512_in_test_avx512()
           # Write the native assembly of test_avx512() to a temporary file
           tmp_file = tempname()
           open(tmp_file, "w") do file
               redirect_stdout(file) do
                   @code_native test_avx512()
               end
           end
           # Read the assembly back, then delete the temporary file
           code_native_output = read(tmp_file, String)
           rm(tmp_file)
           # zmm register names in the output indicate AVX-512 usage
           avx512_patterns = ["zmm"]
           any(occursin(pattern, code_native_output) for pattern in avx512_patterns)
       end
check_avx512_in_test_avx512 (generic function with 1 method)

julia> check_avx512_in_test_avx512()
true
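As an aside, the stdout-redirection dance above can be avoided: `code_native` from InteractiveUtils accepts an IO argument, so the same check can be written more directly. A sketch (the names `mul_sum` and `uses_zmm` are my own, and whether the result is `true` depends on the host CPU):

```julia
using InteractiveUtils  # provides code_native (auto-loaded in the REPL)

# Same computation as test_avx512 above, parameterized over its inputs
mul_sum(a, b) = sum(a .* b)

# Capture the native assembly as a String and look for zmm registers,
# which indicate AVX-512 code
function uses_zmm(f, argtypes)
    asm = sprint(io -> code_native(io, f, argtypes))
    return occursin("zmm", asm)
end

uses_zmm(mul_sum, Tuple{Vector{Float64}, Vector{Float64}})
```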
And full Zen 4 support in Julia is still pending integration.
Mhm… How much performance do we lose here on a modern CPU? I would happily wait a couple of minutes while juliaup does its thing in exchange for knowing the sysimage is as fast as it can be.
I asked this like 3-4 years ago (here? on Slack? not sure) and was told that the binaries should leave very little performance on the table. There was not much to be gained by building from source. Since then I’ve always just used the binaries and juliaup has made that even smoother.