I was trying the argmin vs. minimum benchmark that was being discussed and found that native ARM Julia is roughly 3.5x slower than the same code running under Rosetta (about 505 μs vs. 145 μs). Is the ARM codegen really that much worse, or is this a bug specific to the M1? The native version is running on a fork of Julia with correct feature detection on the M1.
benchmarks:
Native
julia> y = rand(100_000)
100000-element Vector{Float64}:
julia> @benchmark findmin($y)
BenchmarkTools.Trial: 9705 samples with 1 evaluation.
Range (min … max): 505.000 μs … 627.750 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 511.125 μs ┊ GC (median): 0.00%
Time (mean ± σ): 513.248 μs ± 8.133 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
505 μs Histogram: log(frequency) by time 548 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark argmin($y)
BenchmarkTools.Trial: 9767 samples with 1 evaluation.
Range (min … max): 504.916 μs … 595.792 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 506.458 μs ┊ GC (median): 0.00%
Time (mean ± σ): 510.306 μs ± 7.815 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
505 μs Histogram: log(frequency) by time 545 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> versioninfo()
Julia Version 1.8.0-DEV.360
Commit c70db599c2* (2021-08-17 10:32 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin20.6.0)
CPU: Apple M1
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, apple-a14)
Environment:
JULIA_NUM_THREADS = 4
JULIA_NUM_PRECOMPILE_TASKS = 4
Rosetta
julia> @benchmark findmin($y)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 144.583 μs … 199.625 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 144.875 μs ┊ GC (median): 0.00%
Time (mean ± σ): 145.727 μs ± 2.973 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
145 μs Histogram: log(frequency) by time 159 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark argmin($y)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 144.375 μs … 434.167 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 145.270 μs ┊ GC (median): 0.00%
Time (mean ± σ): 148.420 μs ± 7.243 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
144 μs Histogram: log(frequency) by time 170 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
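For anyone who wants to reproduce the comparison, here is a minimal sketch of a script that can be run unchanged under both the native and the Rosetta build. It assumes BenchmarkTools is installed. The fixed seed, the use of `@belapsed` instead of `@benchmark`, and the extra `minimum` timing are my additions for illustration and are not what produced the numbers above.

```julia
using BenchmarkTools
using Random

# Fixed seed (an assumption, not used for the numbers above) so that
# the native and Rosetta runs operate on identical data.
Random.seed!(1234)
y = rand(100_000)

# @belapsed returns the minimum time in seconds over all samples.
# Interpolating with $y avoids benchmarking global-variable access.
t_findmin = @belapsed findmin($y)
t_argmin  = @belapsed argmin($y)
t_minimum = @belapsed minimum($y)

# Sys.ARCH is the architecture of the running Julia process, which makes
# it easy to tell the native build from the x86_64 build under Rosetta.
println("arch:    ", Sys.ARCH)
println("findmin: ", round(t_findmin * 1e6; digits = 1), " μs")
println("argmin:  ", round(t_argmin * 1e6; digits = 1), " μs")
println("minimum: ", round(t_minimum * 1e6; digits = 1), " μs")
```

Running this once with each build and comparing the printed times should show whether the gap is specific to `findmin`/`argmin` or also affects `minimum`.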