Performance discrepancy in solving sparse SPD \ dense RHS between x86_64 and Apple M-series

Why do I observe a 9x difference in elapsed time when I run the following on my MacBook and on an x86_64 Linux machine?

using Random, SparseArrays, LinearAlgebra

Random.seed!(0);

A = sprand(10000,10000,5/10000) + I;
A = max.(A,A');
b = rand(10000,10);

@elapsed A\b

Sys.cpu_info()
Threads.nthreads()
versioninfo()

The outputs on the MacBook are

julia> @elapsed A\b
2.142173167

julia> Sys.cpu_info()
12-element Vector{Base.Sys.CPUinfo}:
 Base.Sys.CPUinfo("Apple M2 Max", 2400, 0x0000000020acf51a, 0x0000000000000000, 0x0000000012d29062, 0x00000000584ba656, 0x0000000000000000)
...

julia> Threads.nthreads()
8

julia> versioninfo()
Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin24.0.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m2)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)

and on the x86_64 machine

julia> @elapsed A\b
19.729288069

julia> Sys.cpu_info()
24-element Vector{Base.Sys.CPUinfo}:
 Base.Sys.CPUinfo("Intel(R) Xeon(R) Gold 6226 CPU @ 2.70GHz", 1200, 0x0000000043a90e78, 0x000000000001714c, 0x00000000041b9898, 0x00000000a57e168e, 0x0000000000000000)
...

julia> Threads.nthreads()
10

julia> versioninfo()
Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × Intel(R) Xeon(R) Gold 6226 CPU @ 2.70GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, cascadelake)
Threads: 10 default, 0 interactive, 5 GC (on 24 virtual cores)
Environment:
  LD_LIBRARY_PATH = /usr/pkg/intel-compilers/composer_xe_2015.3.187/compiler/lib/intel64:/usr/pkg/intel-compilers/composer_xe_2015.3.187/mpirt/lib/intel64:/usr/pkg/intel-compilers/composer_xe_2015.3.187/ipp/../compiler/lib/intel64:/usr/pkg/intel-compilers/composer_xe_2015.3.187/ipp/lib/intel64:/usr/pkg/intel-compilers/composer_xe_2015.3.187/ipp/tools/intel64/perfsys:/usr/pkg/intel-compilers/composer_xe_2015.3.187/compiler/lib/intel64:/usr/pkg/intel-compilers/composer_xe_2015.3.187/mkl/lib/intel64:/usr/pkg/intel-compilers/composer_xe_2015.3.187/tbb/lib/intel64/gcc4.1:/usr/pkg/intel-compilers/composer_xe_2015.3.187/debugger/libipt/intel64/lib:/usr/lib:/usr/ucblib:/usr/openwin/lib:/usr/local/X/lib:/usr/dt/lib:/usr/local/lib

The x86_64 machine is accessed through Slurm with srun --cpus-per-task=10.

Is it possible that I am linking to obsolete system libraries instead of the ones that come with Julia? I have configuration files dating back to 2002 on the Linux file system…
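
One way to check this from inside a session is to ask Julia which BLAS/LAPACK backends it has actually loaded. The sketch below uses only the LinearAlgebra standard library. It assumes Julia 1.7 or later, where BLAS calls are routed through libblastrampoline, and it reports only what the running process loaded, not whether LD_LIBRARY_PATH influenced anything else.

```julia
using LinearAlgebra

# Backends that libblastrampoline currently forwards BLAS/LAPACK calls to.
# With a stock Julia this is normally the bundled OpenBLAS; after `using MKL`
# it should list an MKL library instead.
config = BLAS.get_config()
for lib in config.loaded_libs
    println(lib.libname, " (interface: ", lib.interface, ")")
end

# Number of threads the active BLAS backend will use. This is separate from
# Threads.nthreads(), which counts Julia threads.
println("BLAS threads: ", BLAS.get_num_threads())
```

If the printed library paths point inside the Julia installation or its artifacts directory, the old Intel directories in LD_LIBRARY_PATH are probably not being picked up for BLAS.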

Does using MKL change the timings on Intel?

Good point!

I changed the first line to

using Random, SparseArrays, LinearAlgebra, BenchmarkTools, MKL

This gave a huge improvement on the x86_64 machine:

julia> @belapsed $A\$b
2.549203957

The corresponding timing on the Apple M2 Max, now also measured with BenchmarkTools.@belapsed, is

julia> @belapsed $A\$b
1.168366542
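
Since swapping the BLAS backend changed the x86_64 timing so much, a possible follow-up is to check how the solve time depends on the number of BLAS threads. The sketch below is only a diagnostic idea, not something measured in this thread. The smaller size n = 2000 and the thread counts tried are arbitrary assumptions chosen so that it runs quickly. It does one warm-up solve first, because a first call timed with @elapsed also includes compilation.

```julia
using Random, SparseArrays, LinearAlgebra

Random.seed!(0)

n = 2000                      # assumption: smaller than the 10000 used above
A = sprand(n, n, 5 / n) + I
A = max.(A, A')               # symmetrize, as in the original script
b = rand(n, 10)

A \ b                         # warm-up solve, so compilation is not timed

original_threads = BLAS.get_num_threads()
timings = Dict{Int,Float64}()

# Time the same solve with one BLAS thread and with the default count.
for nt in (1, original_threads)
    BLAS.set_num_threads(nt)
    timings[nt] = @elapsed A \ b
end

BLAS.set_num_threads(original_threads)   # restore the session default

for (nt, t) in sort(collect(timings))
    println("BLAS threads = ", nt, ": ", t, " s")
end
```

If the single-threaded run is not slower than the multi-threaded one, thread overhead in the BLAS backend would be a plausible contributor to the original slowdown, though this sketch alone cannot confirm it.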