I recently updated the CPU on my desktop:
julia> versioninfo()
Julia Version 1.10.5
Commit 6f3fdf7b362 (2024-08-27 14:19 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × AMD Ryzen 7 9700X 8-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, generic)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)
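One note on the "Threads: 1 default" line above: the dense CPU kernels I time below go through OpenBLAS, which runs its own thread pool independently of Julia's --threads setting. In case it matters for interpreting the CPU numbers, the BLAS thread count can be checked (or pinned) like this:
using LinearAlgebra

# BLAS threading is configured separately from Julia's thread count.
BLAS.get_num_threads()       # threads used by the CPU matmul / LU timings below
# BLAS.set_num_threads(8)    # optionally pin it explicitly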
My GPU is a few years old, though.
julia> using AMDGPU
julia> dev = AMDGPU.device()
┌────┬───────────────────────┬──────────┬───────────┬───────────┐
│ Id │                  Name │ GCN arch │ Wavefront │    Memory │
├────┼───────────────────────┼──────────┼───────────┼───────────┤
│  1 │ AMD Radeon RX 6600 XT │  gfx1030 │        32 │ 7.984 GiB │
└────┴───────────────────────┴──────────┴───────────┴───────────┘
For a simple matrix-matrix multiply, the GPU turns out to be about 50% slower than the CPU.
julia> using BenchmarkTools
julia> A = rand(2^9, 2^9);
julia> @btime $A * $A;
319.290 μs (2 allocations: 2.00 MiB)
julia> A_d = ROCArray(A);
julia> @btime begin
           $A_d * $A_d;
           AMDGPU.synchronize()
       end
473.160 μs (319 allocations: 7.50 KiB)
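In case it is useful, this is the kind of size sweep I was planning to run next, to see whether the gap closes for larger matrices. bench_matmul is just a throwaway helper of my own; the only AMDGPU.jl calls it relies on are the ones already used above (ROCArray and AMDGPU.synchronize).
using AMDGPU, BenchmarkTools

# Throwaway helper: time an n×n Float64 matrix-matrix multiply on CPU and GPU.
function bench_matmul(n)
    A   = rand(n, n)
    A_d = ROCArray(A)
    t_cpu = @belapsed $A * $A
    t_gpu = @belapsed begin
        $A_d * $A_d
        AMDGPU.synchronize()   # wait for the kernel before the timer stops
    end
    return (; n, t_cpu, t_gpu, ratio = t_gpu / t_cpu)
end

for n in (2^9, 2^10, 2^11, 2^12)
    println(bench_matmul(n))
end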
I tried another experiment, comparing times for an LU factorization.
import LinearAlgebra: LAPACK
import AMDGPU: rocSOLVER
N = 2^10;
A = randn(N, N);
A_d = ROCArray(A);
ipiv = zeros(Int64, N);
ipiv_d = ROCArray(zeros(Int32, N));
@btime begin
    A, ipiv, info = LAPACK.getrf!($A);
end
@btime begin
    A, ipiv, info = rocSOLVER.getrf!($A_d, $ipiv_d)
    AMDGPU.synchronize()
end
This time, the GPU took about 4.6 times as long as the CPU:
2.006 ms (1 allocation: 8.12 KiB)
9.178 ms (27 allocations: 960 bytes)
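One thing I was not sure about when timing this: getrf! factorizes in place, so every sample after the first re-factorizes an already-factorized matrix. A variant that hands each sample a fresh copy would look like this (setup and evals are standard BenchmarkTools keywords; the variables are the ones defined in the snippet above):
# Assumes A, ipiv_d, LAPACK, rocSOLVER from the LU snippet above.
# evals=1 reruns the setup before every evaluation, so each getrf! sees a fresh matrix.
@btime LAPACK.getrf!(B) setup=(B = copy($A)) evals=1;

@btime (rocSOLVER.getrf!(B_d, $ipiv_d); AMDGPU.synchronize()) setup=(B_d = ROCArray($A)) evals=1;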
According to AMDGPU.versioninfo(), I have rocSOLVER 3.26.0 from /opt/rocm-6.2.1/lib/librocsolver.so.
Is this relative performance to be expected from the hardware I am using?