128 GB with the M1 Ultra and 96 GB with the M2s. But this is clearly a drawback of this integrated SoC architecture.
And Apple M2 Max results (30% faster compared to the M1 Max):
julia> include("SingleSpring.jl")
27.5 GFLOPS
132.0 GB/s
7.324295 seconds (1.40 M allocations: 1.162 GiB, 0.77% gc time, 1.46% compilation time)
julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65e* (2023-01-08 06:45 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin22.1.0)
CPU: 12 × Apple M2 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
Threads: 8 on 8 virtual cores
Environment:
JULIA_EDITOR = code
But what is interesting is that it gets significantly better with Julia 1.9.0 beta 4:
julia> include("SingleSpring.jl")
33.5 GFLOPS
161.0 GB/s
6.690864 seconds (1.30 M allocations: 1.165 GiB, 0.87% gc time, 4.50% compilation time)
julia> versioninfo()
Julia Version 1.9.0-beta4
Commit b75ddb787ff (2023-02-07 21:53 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin21.5.0)
CPU: 12 × Apple M2 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
Threads: 8 on 8 virtual cores
Environment:
JULIA_IMAGE_THREADS = 1
And the following code
using BenchmarkTools

n = 500000;
x = rand(n);
y = zeros(n);

# Multithreaded elementwise exp (start Julia with several threads, e.g. julia -t 8)
function threaded_exp!(y, x)
    Threads.@threads for i in eachindex(x)
        @inbounds y[i] = @inline exp(x[i])
    end
end

# Single-threaded reference version
function sequential_exp!(y, x)
    for i in eachindex(x)
        @inbounds y[i] = @inline exp(x[i])
    end
end

tseq = @belapsed sequential_exp!(y, x)
tmt = @belapsed threaded_exp!(y, x)
SpUp = tseq / tmt; Threads.nthreads()
@show tseq, tmt, SpUp;
gives me:
(tseq, tmt, SpUp) = (0.000863083, 0.000123042, 7.014539750654248)
(0.000863083, 0.000123042, 7.014539750654248)
julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65e* (2023-01-08 06:45 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin22.1.0)
CPU: 12 × Apple M2 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
Threads: 8 on 8 virtual cores
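One small caveat about the timing lines above: y and x are non-constant globals, and BenchmarkTools recommends interpolating such globals with $ so that global-variable access is not part of what gets measured, for example:
tseq = @belapsed sequential_exp!($y, $x)
tmt = @belapsed threaded_exp!($y, $x)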
Thanks for the feedback: the bandwidth increases again!
I would launch the SingleSpring.jl test twice to ensure that no compilation is included in the timing.
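In the REPL that just means re-running the same include and comparing the "% compilation time" field reported on the two runs, for example:
julia> include("SingleSpring.jl")   # first run, JIT compilation included
julia> include("SingleSpring.jl")   # second run, check the reported time and % compilation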
Is the GLMakie animation smooth?
Yes, it is smooth.
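For reference, a rough stand-alone way to sanity-check a bandwidth figure like this (just a sketch, not the actual SingleSpring.jl benchmark, and a single-threaded loop usually won't reach the full SoC bandwidth) is a STREAM-style triad:
using BenchmarkTools

n = 10^7
a = zeros(n); b = rand(n); c = rand(n); s = 1.5

# a .= b .+ s .* c streams two reads and one write per element, ~24 bytes for Float64
triad!(a, b, c, s) = (a .= b .+ s .* c; nothing)

t = @belapsed triad!($a, $b, $c, $s)
println(round(3 * n * sizeof(Float64) / t / 1e9, digits = 1), " GB/s")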
I'm working with more real-world Julia code, which generally uses a lot of memory and is 2x-4x faster than on my 2019 Intel i9 MacBook Pro (though I haven't measured that rigorously). Just CPU, mostly Float32 Flux operations, and the Intel i9 MacBook is significantly hotter and noisier in this case.
Hi @ndinsmore!
Nothing special, just Metal.jl. The only important thing is creating the arrays memory-aligned so that you can use the same memory region on the CPU and the GPU.
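As a minimal sketch of the basic Metal.jl workflow (plain device arrays only; the shared CPU/GPU buffer detail mentioned above depends on the storage mode chosen when allocating the buffer and is not shown here):
using Metal

n = 500_000
x = rand(Float32, n)   # Apple GPUs compute in Float32, not Float64
xg = MtlArray(x)       # copy the data into a GPU buffer
yg = exp.(xg)          # broadcasting runs as a Metal kernel
y = Array(yg)          # copy the result back to a CPU array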