Hi all,
I just updated CUDAnative, CUDAdrv, CuArrays and GPUArrays to their master versions. The following test of sustained device-to-device memory copy performance now shows a regression from about 559 GB/s (with CUDAnative v0.9.1 and without CuArrays [1]) to only about 121 GB/s (with CuArrays#master and the now-required CuArrays.CuArray for the device arrays [2]). I ran the tests on an NVIDIA Tesla P100; 559 GB/s is very good performance and matches what a corresponding CUDA code achieves. Here is the test:
using CUDAdrv, CUDAnative
function memcopy!(A, B)
    ix = (blockIdx().x-1) * blockDim().x + threadIdx().x
    A[ix] = B[ix]
    return nothing
end
nx = 128*1024^2
nt = 10000
warmup = 10
A = zeros(nx);
B = rand(nx);
A = CuArray(A);
B = CuArray(B);
nthreads = 1024
nblocks = ceil(Int, nx/nthreads)
for it = 1:nt+warmup
    if (it == warmup+1) global t0 = time() end
    @cuda blocks=nblocks threads=nthreads memcopy!(A, B);
end
time_s = time() - t0;
narrays = 2
GBs = (nt-warmup)/time_s/1024^3*nx*sizeof(Float64)*narrays;
println("time: $time_s; GB/s: $GBs")
Here is the output of a run with the old environment with CUDAnative v0.9.1 [1]:
> ~/julia/julia-1.0.2/bin/julia memcopy.jl
time: 35.73286414146423; GB/s: 559.1491328794802
Here is the output of a run with the new environment with CUDAnative#master, which now also requires CuArrays.CuArray [2]:
> ~/julia/julia-1.1.0/bin/julia memcopy_new.jl
time: 165.3068549633026; GB/s: 120.86613107747692
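As a sanity check on these two numbers, the throughput formula from the script can be re-evaluated on the CPU alone from the measured times. This is only a sketch: the helper name throughput_gbs is made up for illustration and does not appear in either script. It also suggests that the loop times nt launches rather than nt-warmup. That shifts both results by roughly 0.1%, so it does not explain the regression, which is a slowdown of about 4.6x.

```julia
# Hypothetical helper (not in the original scripts): effective throughput for
# `iters` kernel launches that each read one array and write one array.
# The unit is GiB/s (division by 1024^3), which the post labels "GB/s".
function throughput_gbs(iters, time_s; nx = 128 * 1024^2, narrays = 2)
    bytes_per_iter = nx * sizeof(Float64) * narrays   # 2 GiB moved per launch
    return iters * bytes_per_iter / time_s / 1024^3
end

nt, warmup = 10000, 10
t_old = 35.73286414146423   # measured time, old environment [1]
t_new = 165.3068549633026   # measured time, new environment [2]

# Formula as written in the script: counts nt - warmup launches.
println("as reported, old: ", throughput_gbs(nt - warmup, t_old))
println("as reported, new: ", throughput_gbs(nt - warmup, t_new))

# The timer starts at iteration warmup + 1 and the loop runs to nt + warmup,
# so nt launches fall inside the timed region.
println("nt launches, old: ", throughput_gbs(nt, t_old))
println("nt launches, new: ", throughput_gbs(nt, t_new))

# Slowdown factor between the two environments.
println("slowdown: ", t_new / t_old)
```

Since each launch moves exactly 2 GiB, the reported figure is simply twice the number of counted launches divided by the elapsed time.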
Note that the only difference between the two scripts, memcopy.jl and memcopy_new.jl, is the addition of CuArrays to the using statement:
> diff memcopy.jl memcopy_new.jl
1c1
< using CUDAdrv, CUDAnative
---
> using CUDAdrv, CUDAnative, CuArrays
I suspect that the observed performance regression arises because this code now requires CuArray from CuArrays, whereas before it used CuArray from CUDAnative. Can you tell me how to adapt the code to get back the performance that we expect?
Thank you very much!
Sam
[1] > ~/julia/julia-1.0.2/bin/julia
(v1.0) pkg> status
Status ~/.julia/environments/v1.0/Project.toml
[c5f51814] CUDAdrv v0.8.6
[be33ccc6] CUDAnative v0.9.1
[2] > ~/julia/julia-1.1.0/bin/julia
(v1.1) pkg> status
Status ~/.julia/environments/v1.1/Project.toml
[c5f51814] CUDAdrv v3.0.0 #master (https://github.com/JuliaGPU/CUDAdrv.jl.git)
[be33ccc6] CUDAnative v2.1.0 #master (https://github.com/JuliaGPU/CUDAnative.jl.git)
[3a865a2d] CuArrays v1.0.2 #master (https://github.com/JuliaGPU/CuArrays.jl.git)
[0c68f7d7] GPUArrays v0.7.0 #master (https://github.com/JuliaGPU/GPUArrays.jl.git)