SIMD and compiled code

lungben · December 19, 2020, 9:30am

Hi,

Different processors have different SIMD capabilities, in particular different vector widths, i.e. how many values the processor can operate on at once.
However, there are a few points w.r.t. code compilation which I am not sure about.

If I understand correctly, Julia automatically generates SIMD code optimized for the current machine. It can do so because the code is compiled on that individual machine.
Pre-compiled binaries for shipping, however, would either need to be compiled for the lowest common denominator (i.e. not using the SIMD capabilities of most modern CPUs to their full extent) or need to handle the different SIMD capabilities explicitly at runtime, correct?
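One way to see this host-specific code generation for yourself is to look at the LLVM IR Julia emits for a simple loop. The sketch below assumes Julia 1.x with the InteractiveUtils stdlib; `simd_sum` is a hypothetical example function, and which vector types (if any) show up depends on the CPU it runs on.

```julia
using InteractiveUtils  # provides code_llvm (loaded automatically in the REPL)

# A simple reduction; @simd allows LLVM to reorder the floating-point
# additions so the loop can be vectorized.
function simd_sum(x::Vector{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(x)
        s += x[i]
    end
    return s
end

# Host CPU name as detected by LLVM (e.g. "skylake"); by default the JIT
# compiles for this CPU.
println("host CPU: ", Sys.CPU_NAME)

# Capture the optimized LLVM IR and collect vector types such as <4 x double>;
# their width reflects the SIMD features of the machine running this code.
ir = sprint(io -> code_llvm(io, simd_sum, Tuple{Vector{Float64}}; debuginfo=:none))
widths = unique(m.match for m in eachmatch(r"<\d+ x double>", ir))
println("vector types in IR: ", widths)
```

Running the same snippet on machines with different SIMD extensions (e.g. AVX2 vs. AVX-512) is a quick way to compare what the JIT produces for each.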

What about the default Julia sysimage? Would recompiling it on the individual machine, taking its full SIMD capabilities into account, increase the runtime speed of (some) Base functionality?

carstenbauer · December 19, 2020, 9:46am

I had a related question a few years back. There I asked whether it matters which compiler (icc vs gcc) one uses when building Julia. The answer was pretty much no.

yuyichao · December 19, 2020, 10:55am

This question is about the system image, which is not related to the compiler used to compile Julia itself. The default system image is compiled simultaneously for multiple microarchitectures to take advantage of their different CPU features, including SIMD.

carstenbauer · December 19, 2020, 12:35pm

If I compile Julia myself, is the resulting system image also compiled for multiple microarchs?

Update: I guess what I’m really asking is how the compilation of the system image relates to the compilation of Julia itself, given that it happens as part of that build. From the "System Image Building" page of the Julia devdocs, I take it that this can be specified by a make option “during system image compilation”.

Update2: In https://github.com/JuliaLang/julia/blob/master/Make.inc I found:

# JULIA_CPU_TARGET is the JIT-only complement to MARCH. Setting it explicitly is not generally necessary,
#    since it is set equal to MARCH by default

Does this imply that the system image obtained when building Julia from source is not for multiple microarchs?

ffevotte · December 19, 2020, 1:35pm

AFAIU, Julia supports an intermediate technique between the two you mention: it can bake several versions of the same code into one sysimage, each compiled and optimized for a different microarchitecture. This achieves the best of both worlds: (almost) fully optimized code and no run-time compilation latency (for whatever gets baked into the system image).

As explained in the devdocs, this is how the default sysimage shipped with the official pre-built Julia releases is compiled.
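To make the idea concrete, here is a deliberately simplified model, written in Julia, of the choice a multi-target sysimage makes at load time: given the features the host CPU supports, use the most capable compiled target it can run. The `TARGETS` table and `pick_target` function are hypothetical, and the real selection happens inside Julia's runtime and is more involved than this (it ignores, for example, explicitly disabled features).

```julia
# Simplified, hypothetical model: each compiled target is listed with the
# CPU features it requires, ordered from least to most capable.
const TARGETS = [
    ("generic",     Set{String}()),               # baseline, runs on any x86-64 CPU
    ("sandybridge", Set(["avx"])),                # adds AVX (256-bit floating-point SIMD)
    ("haswell",     Set(["avx", "avx2", "fma"])), # adds AVX2 and FMA
]

# Return the most capable target whose required features are all
# supported by the host.
function pick_target(host_features::Set{String})
    chosen = TARGETS[1][1]
    for (name, needed) in TARGETS
        if issubset(needed, host_features)
            chosen = name
        end
    end
    return chosen
end

println(pick_target(Set(["sse2", "avx"])))                 # prints "sandybridge"
println(pick_target(Set(["sse2", "avx", "avx2", "fma"])))  # prints "haswell"
```

Because the generic target requires nothing, every host gets at least a working (if unoptimized) version, which is what keeps such a sysimage portable.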

If you want to use PackageCompiler to produce custom system images for your own packages, you can achieve the same effect (i.e. produce sysimages that are both portable and optimized) using the cpu_target keyword argument to create_sysimage. For example, for x86_64 architectures:

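using PackageCompiler

packages = [:Example]  # hypothetical placeholder: the packages to bake into the sysimage
create_sysimage(
    packages;
    sysimage_path = "my_sysimage.so",
    # [other kw args]
    # Three targets: a generic fallback, Sandy Bridge (all functions cloned),
    # and Haswell (functions not cloned for it fall back to target 1, Sandy Bridge)
    cpu_target = "generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)",
)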
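The cpu_target string packs several targets into one option. Following the format described in the "System Image Building" devdocs, targets are separated by `;`, and each one starts with a CPU name (or `generic`) followed by comma-separated options: `-feature` disables a feature, `clone_all` clones every function for that target instead of only those likely to benefit, and `base(n)` makes the n-th target (0-based) the fallback for functions not cloned for this one. The parser below is a hypothetical helper, not part of Julia or PackageCompiler; it only decodes the string so its structure is easy to read.

```julia
# Hypothetical type for illustration only; not an API of Julia or PackageCompiler.
struct CPUTarget
    name::String              # CPU name, or "generic"
    enabled::Vector{String}   # features listed without a "-" prefix
    disabled::Vector{String}  # features listed as "-feature"
    clone_all::Bool           # true if "clone_all" was given
    base::Int                 # 0-based index of the fallback target (default 0)
end

# Split a cpu_target string into its targets and decode each target's options.
function parse_cpu_target(spec::AbstractString)
    targets = CPUTarget[]
    for tstr in split(spec, ';')
        parts = split(tstr, ',')
        name = String(parts[1])
        enabled, disabled = String[], String[]
        clone_all, base_idx = false, 0
        for p in parts[2:end]
            m = match(r"^base\((\d+)\)$", p)
            if p == "clone_all"
                clone_all = true
            elseif m !== nothing
                base_idx = parse(Int, m.captures[1])
            elseif startswith(p, "-")
                push!(disabled, String(p[2:end]))
            else
                push!(enabled, String(p))
            end
        end
        push!(targets, CPUTarget(name, enabled, disabled, clone_all, base_idx))
    end
    return targets
end

targets = parse_cpu_target("generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)")
for (i, t) in enumerate(targets)
    println(i - 1, ": ", t.name, " disabled=", t.disabled,
            " clone_all=", t.clone_all, " base=", t.base)
end
```

For the string used with create_sysimage, this yields a `generic` baseline (target 0), a `sandybridge` target with `xsaveopt` disabled and all functions cloned (target 1), and a `haswell` target with `rdrnd` disabled that falls back to target 1 for functions it does not clone.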

lungben · December 19, 2020, 4:33pm

Thanks for your answers @yuyichao and @ffevotte!
I assume Fortran and C compilers work the same way, i.e. they can compile binaries for different microarchitectures, correct?
