Hi,
I am puzzled by the different outputs I get from `@code_llvm` with the same Julia version (1.10.0-rc2) on two different architectures (ARM vs. x86). The following REPL session:
```julia
julia> f(a,b) = a .+ b
f (generic function with 1 method)

julia> @code_llvm debuginfo=:none f((1.,2.,3.,4.),(5.,6.,7.,8.))
```
returns the following on x86 (Intel Core i9-13900K):
```llvm
define void @julia_f_117([4 x double]* noalias nocapture noundef nonnull sret([4 x double]) align 8 dereferenceable(32) %0, [4 x double]* nocapture noundef nonnull readonly align 8 dereferenceable(32) %1, [4 x double]* nocapture noundef nonnull readonly align 8 dereferenceable(32) %2) #0 {
top:
  %3 = bitcast [4 x double]* %1 to <4 x double>*
  %4 = load <4 x double>, <4 x double>* %3, align 8
  %5 = bitcast [4 x double]* %2 to <4 x double>*
  %6 = load <4 x double>, <4 x double>* %5, align 8
  %7 = fadd <4 x double> %4, %6
  %8 = bitcast [4 x double]* %0 to <4 x double>*
  store <4 x double> %7, <4 x double>* %8, align 8
  ret void
}
```
and the following on ARM (Apple M1 Max):
```llvm
define void @julia_f_142([4 x double]* noalias nocapture noundef nonnull sret([4 x double]) align 8 dereferenceable(32) %0, [4 x double]* nocapture noundef nonnull readonly align 8 dereferenceable(32) %1, [4 x double]* nocapture noundef nonnull readonly align 8 dereferenceable(32) %2) #0 {
top:
  %3 = getelementptr inbounds [4 x double], [4 x double]* %1, i64 0, i64 2
  %4 = getelementptr inbounds [4 x double], [4 x double]* %2, i64 0, i64 2
  %5 = bitcast [4 x double]* %1 to <2 x double>*
  %6 = load <2 x double>, <2 x double>* %5, align 8
  %7 = bitcast [4 x double]* %2 to <2 x double>*
  %8 = load <2 x double>, <2 x double>* %7, align 8
  %9 = fadd <2 x double> %6, %8
  %10 = bitcast [4 x double]* %0 to <2 x double>*
  store <2 x double> %9, <2 x double>* %10, align 8
  %newstruct.sroa.3.0..sroa_idx9 = getelementptr inbounds [4 x double], [4 x double]* %0, i64 0, i64 2
  %11 = bitcast double* %3 to <2 x double>*
  %12 = load <2 x double>, <2 x double>* %11, align 8
  %13 = bitcast double* %4 to <2 x double>*
  %14 = load <2 x double>, <2 x double>* %13, align 8
  %15 = fadd <2 x double> %12, %14
  %16 = bitcast double* %newstruct.sroa.3.0..sroa_idx9 to <2 x double>*
  store <2 x double> %15, <2 x double>* %16, align 8
  ret void
}
```
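To compare the two listings without eyeballing them, the IR can also be captured as a string and scanned for vector types. This is a minimal sketch of my own (the regex and the `argtypes`/`lanes` variable names are illustrative, not from the video); it only relies on `code_llvm` from `InteractiveUtils`:

```julia
using InteractiveUtils  # provides code_llvm (already loaded in the REPL)

f(a, b) = a .+ b

# Argument types matching the call f((1.,2.,3.,4.), (5.,6.,7.,8.))
argtypes = Tuple{NTuple{4,Float64}, NTuple{4,Float64}}

# Capture the optimized IR as a String instead of printing it
ir = sprint(io -> code_llvm(io, f, argtypes; debuginfo=:none))

# Collect the lane counts N of every `<N x double>` vector type in the IR
lanes = sort!(unique([parse(Int, m.captures[1]) for m in eachmatch(r"<(\d+) x double>", ir)]))

println("vector widths used (doubles per vector): ", lanes)
```

Judging from the listings above, this should print `[4]` on the x86 machine and `[2]` on the M1 (an empty list would mean the addition was not vectorized at all).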
The `versioninfo()` outputs of both machines are given below:
`versioninfo()` output on x86:

```
julia> versioninfo()
Julia Version 1.10.0-rc2
Commit dbb9c46795b (2023-12-03 15:25 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × 13th Gen Intel(R) Core(TM) i9-13900K
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, goldmont)
  Threads: 1 on 32 virtual cores
```
`versioninfo()` output on Apple silicon:

```
Julia Version 1.10.0-rc2
Commit dbb9c46795b (2023-12-03 15:25 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
  Threads: 1 on 8 virtual cores
```
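The only codegen-relevant lines that differ are the architecture and the LLVM CPU name (`goldmont` vs. `apple-m1`). A small sketch to print just those fields, assuming (as far as I know) that `Sys.CPU_NAME` is the name `versioninfo()` shows after `ORCJIT`:

```julia
# Print only the platform fields that plausibly influence code generation.
println("Architecture:  ", Sys.ARCH)              # e.g. x86_64 or aarch64
println("LLVM CPU name: ", Sys.CPU_NAME)          # expected to match the name after ORCJIT above
println("LLVM version:  ", Base.libllvm_version)  # 15.0.7 on both machines here
println("Word size:     ", Sys.WORD_SIZE)
```

Starting Julia with a different target, e.g. `julia -C generic`, and rerunning `@code_llvm` should be another way to check whether the target CPU, rather than the Julia version, drives the difference.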
I expected the `@code_llvm` output to be identical on both machines, with the differences appearing only in `@code_native`.
Is this because Julia assumes a 128-bit SIMD width on M1 chips?
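If that hypothesis is right, the shape of both listings follows from simple arithmetic: a 4-tuple of `Float64` is 4 × 64 = 256 bits, so a 256-bit vector width gives one `fadd <4 x double>` and a 128-bit width (the size of the Advanced SIMD/NEON registers on the M1) gives two `fadd <2 x double>`. The helper names below are hypothetical and the widths are inputs, not values queried from the machine:

```julia
# Hypothesis: the IR is shaped by the SIMD register width LLVM assumes for the target.
nvector_ops(nelems, elem_bits, simd_bits) = cld(nelems * elem_bits, simd_bits)
lanes_per_op(elem_bits, simd_bits) = simd_bits ÷ elem_bits

# A 4-tuple of Float64 holds 4 × 64 = 256 bits of data.
for simd_bits in (256, 128)
    println(simd_bits, "-bit registers: ",
            nvector_ops(4, 64, simd_bits), " fadd on <",
            lanes_per_op(64, simd_bits), " x double>")
end
# 256-bit registers: 1 fadd on <4 x double>
# 128-bit registers: 2 fadd on <2 x double>
```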
P.S. I took this example from this nice video: https://youtu.be/W1hXttRmuks?si=49UMwwkVqPSFird_