CUDA example in Julia doesn't use the GPU

I’m taking my first steps running Julia 1.6.5 code on a GPU. For some reason, it seems the GPU is not being used at all. These are the steps:

First of all, my GPU passed the test recommended at https://cuda.juliagpu.org/stable/:

# install the package
using Pkg
Pkg.add("CUDA")
  
# smoke test (this will download the CUDA toolkit)
using CUDA
CUDA.versioninfo()

using Pkg
Pkg.test("CUDA")    # takes ~40 minutes if using 1 thread

Secondly, the code below took around 8 minutes (real time) while supposedly running on my GPU. It creates two 10000 x 10000 matrices and multiplies them, 10 times over:

using CUDA
using Random
N = 10000

a_d = CuArray{Float32}(undef, (N, N))
b_d = CuArray{Float32}(undef, (N, N))
c_d = CuArray{Float32}(undef, (N, N))

for i in 1:10
    global a_d = randn(N, N)
    global b_d = randn(N, N)

    global c_d = a_d * b_d
end

global a_d = nothing
global b_d = nothing
global c_d = nothing
GC.gc()

Terminal output:

(base) ciro@ciro-G3-3500:~/projects/julia/cuda$ time julia cuda-gpu.jl

real    8m13,016s
user    50m39,146s
sys 13m16,766s

Then I ran equivalent code on the CPU. The execution time is about the same:

using Random
N = 10000

for i in 1:10
    a = randn(N, N)
    b = randn(N, N)

    c = a * b
end

Execution:

(base) ciro@ciro-G3-3500:~/projects/julia/cuda$ time julia cuda-cpu.jl

real    8m2,689s 
user    50m9,567s 
sys 13m3,738s

Moreover, watching the run in NVTOP, it is odd to see the GPU memory and cores being loaded and unloaded while the script still uses the same 800% CPU (eight cores) as the CPU version.

Any hint is greatly appreciated. Thanks.

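You’re just assigning a new CPU array to the same variable names you assigned the GPU arrays to. Inside the loop, randn(N, N) returns an ordinary CPU array, so your second a_d variable is unrelated to your first a_d variable, and a_d * b_d multiplies two CPU arrays.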
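A minimal sketch of the difference between rebinding a name and writing into an existing array. It uses plain CPU arrays so it runs without a GPU; the names a, b, original and kept are only for illustration, and the same reasoning applies to a CuArray.

```julia
using Random

a = zeros(Float32, 2, 2)   # stand-in for the preallocated array
original = a               # second reference to the same array

a = randn(2, 2)            # rebinding: `a` now names a brand-new Float64 array
@show a === original       # false: the preallocated array is no longer `a`
@show eltype(a)            # Float64, not the Float32 we allocated
@show iszero(original)     # true: the preallocated array was never written to

b = zeros(Float32, 2, 2)
kept = b
randn!(b)                  # in place: fills the existing array
b .= 2 .* b                # broadcast assignment also writes into the existing array
@show b === kept           # true: still the same array
@show eltype(b)            # Float32
```

In the question's script the preallocated CuArrays play the role of `original`: they are allocated and then never used.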

To generate the random numbers on the GPU, I think you can use rand!(a_d) (or randn!(a_d), to match your randn calls). The general idea is that you allocate the array once and then operate on it in place. You can use a broadcast assignment such as a .= x, or a function like rand! that has a CUDA.jl method and will dispatch on your GPU array.
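Applied to the script from the question, this could look roughly like the sketch below. It assumes a working CUDA.jl installation and enough GPU memory for three 10000 x 10000 Float32 arrays (about 400 MB each). Using mul! from LinearAlgebra to reuse c_d is my choice here, not something the question used; c_d .= a_d * b_d would also stay on the GPU but allocates a temporary.

```julia
using CUDA
using Random
using LinearAlgebra

N = 10000   # same size as in the question; shrink it for a quick check

# Allocate once on the GPU.
a_d = CuArray{Float32}(undef, N, N)
b_d = CuArray{Float32}(undef, N, N)
c_d = CuArray{Float32}(undef, N, N)

for i in 1:10
    randn!(a_d)            # fill the existing GPU arrays in place
    randn!(b_d)
    mul!(c_d, a_d, b_d)    # c_d = a_d * b_d, written into the existing buffer
end

CUDA.synchronize()         # wait for queued GPU work before timing or exiting
```

No global keyword is needed inside the loop because nothing is reassigned; the loop only mutates the arrays the names already point to.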

It just works. Thanks.