Performance discrepancy in solving sparse SPD\dense rhs between X86 and Apple M

Why do I observe a 9x difference in elapsed time when I run the following on my MacBook laptop and on an x86_64 Linux machine?

using Random, SparseArrays, LinearAlgebra

Random.seed!(0);

A = sprand(10000,10000,5/10000) + I;
A = max.(A,A');
b = rand(10000,10);

@elapsed A\b

Sys.cpu_info()
Threads.nthreads()

On the MacBook, the outputs are

julia> @elapsed A\b
2.142173167

julia> Sys.cpu_info()
12-element Vector{Base.Sys.CPUinfo}:
 Base.Sys.CPUinfo("Apple M2 Max", 2400, 0x0000000020acf51a, 0x0000000000000000, 0x0000000012d29062, 0x00000000584ba656, 0x0000000000000000)
...

julia> Threads.nthreads()
8
julia> versioninfo()
Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin24.0.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m2)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)

and on the x86_64 machine

julia> @elapsed A\b
19.729288069

julia> Sys.cpu_info()
24-element Vector{Base.Sys.CPUinfo}:
 Base.Sys.CPUinfo("Intel(R) Xeon(R) Gold 6226 CPU @ 2.70GHz", 1200, 0x0000000043a90e78, 0x000000000001714c, 0x00000000041b9898, 0x00000000a57e168e, 0x0000000000000000)
...
julia> Threads.nthreads()
10

julia> versioninfo()
Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × Intel(R) Xeon(R) Gold 6226 CPU @ 2.70GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, cascadelake)
Threads: 10 default, 0 interactive, 5 GC (on 24 virtual cores)
Environment:
  LD_LIBRARY_PATH = /usr/pkg/intel-compilers/composer_xe_2015.3.187/compiler/lib/intel64:/usr/pkg/intel-compilers/composer_xe_2015.3.187/mpirt/lib/intel64:/usr/pkg/intel-compilers/composer_xe_2015.3.187/ipp/../compiler/lib/intel64:/usr/pkg/intel-compilers/composer_xe_2015.3.187/ipp/lib/intel64:/usr/pkg/intel-compilers/composer_xe_2015.3.187/ipp/tools/intel64/perfsys:/usr/pkg/intel-compilers/composer_xe_2015.3.187/compiler/lib/intel64:/usr/pkg/intel-compilers/composer_xe_2015.3.187/mkl/lib/intel64:/usr/pkg/intel-compilers/composer_xe_2015.3.187/tbb/lib/intel64/gcc4.1:/usr/pkg/intel-compilers/composer_xe_2015.3.187/debugger/libipt/intel64/lib:/usr/lib:/usr/ucblib:/usr/openwin/lib:/usr/local/X/lib:/usr/dt/lib:/usr/local/lib

The x86_64 machine is accessed through Slurm, using srun --cpus-per-task=10.

Is it possible that I am linking against obsolete system libraries instead of the ones shipped with Julia? I have configuration files from 2002 in the Linux file system…
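One way to investigate the linking question is to ask Julia which BLAS/LAPACK libraries it has actually loaded. The following is a minimal sketch, assuming Julia 1.7 or later (where BLAS calls are routed through libblastrampoline); it only reports the libraries and thread count and does not by itself show whether LD_LIBRARY_PATH is the cause.

```julia
using LinearAlgebra

# Query the BLAS/LAPACK backends currently registered with libblastrampoline.
config = BLAS.get_config()

# Print the path of each loaded backend library and its integer interface.
for lib in config.loaded_libs
    println(lib.libname, "  (interface: ", lib.interface, ")")
end

# Number of threads the active BLAS backend is set to use.
nblas = BLAS.get_num_threads()
println("BLAS threads: ", nblas)
```

With the stock binaries I would expect the listed path to point at the OpenBLAS library inside the Julia installation; a path under /usr/lib or the Intel compiler directories would suggest a system library is being picked up instead.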

Does using MKL change the times on Intel?

Good point!

Changing the first line to

using Random, SparseArrays, LinearAlgebra, BenchmarkTools, MKL

gave a huge improvement on the x86_64 machine:

julia> @belapsed $A\$b
2.549203957

The corresponding time on the Apple M2 Max, now also measured with BenchmarkTools.@belapsed, is

julia> @belapsed $A\$b
1.168366542
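A note on comparing the numbers: if a single @elapsed is the first call of A\b in a session, it also includes compilation time, whereas @belapsed reports the minimum over repeated runs. Below is a minimal sketch of a warm-up followed by a timed call, using only the standard libraries. The size n = 1000 is my own choice to keep the sketch quick; the original problem uses 10000.

```julia
using Random, SparseArrays, LinearAlgebra

Random.seed!(0)

n = 1000                      # assumption: smaller than the original 10000
A = sprand(n, n, 5 / n) + I   # sparse random matrix plus identity
A = max.(A, A')               # symmetrize, as in the original script
b = rand(n, 10)               # dense right-hand side with 10 columns

A \ b                         # warm-up call, triggers compilation
t = @elapsed x = A \ b        # timed call, excludes compilation of A \ b

println("elapsed: ", t, " s")
println("relative residual: ", norm(A * x - b) / norm(b))
```

Running this on both machines, with and without MKL loaded, should give timings that are easier to compare than a single cold @elapsed.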