Hi all,
I just updated CUDAnative, CUDAdrv, CuArrays and GPUArrays to their master versions. Since then, the following test of sustained device-to-device memory copy performance shows a regression from about 559 GB/s (with CUDAnative v0.9.1 and without CuArrays [1]) to only about 121 GB/s (with CuArrays#master, where the device arrays are now CuArrays.CuArray [2]). I ran the tests on an NVIDIA Tesla P100; 559 GB/s is very good performance and matches what a corresponding CUDA code achieves. Here is the test:
using CUDAdrv, CUDAnative
function memcopy!(A, B)
    ix = (blockIdx().x-1) * blockDim().x + threadIdx().x
    A[ix] = B[ix]
    return nothing
end
nx = 128*1024^2
nt = 10000
warmup = 10
A = zeros(nx);
B = rand(nx);
A = CuArray(A);
B = CuArray(B);
nthreads = 1024
nblocks = ceil(Int, nx/nthreads)
for it = 1:nt+warmup
    if (it == warmup+1) global t0 = time() end
    @cuda blocks=nblocks threads=nthreads memcopy!(A, B);
end
time_s = time() - t0;
narrays = 2
GBs = (nt-warmup)/time_s/1024^3*nx*sizeof(Float64)*narrays;
println("time: $time_s; GB/s: $GBs")
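A side note on the timing: kernel launches with @cuda are asynchronous, so the loop above reads the clock without waiting for the last kernels to finish. Below is a minimal sketch of a timing helper that synchronizes before starting and before stopping the clock. The name time_launches and its arguments are made up for illustration; it takes the launch and the synchronization as plain functions, so the helper itself does not depend on any particular package version. The commented usage assumes that CUDAdrv.synchronize() can be called without arguments in both environments, which I have not verified.

```julia
# Time `nt` calls of `launch` after `warmup` untimed calls.
# `sync` must block until all previously launched work has finished.
# (Hypothetical helper, not part of CUDAdrv/CUDAnative/CuArrays.)
function time_launches(launch, sync; nt=10000, warmup=10)
    for _ in 1:warmup
        launch()
    end
    sync()            # warmup work must be done before the clock starts
    t0 = time()
    for _ in 1:nt
        launch()
    end
    sync()            # wait for the timed launches to complete
    return time() - t0
end

# Possible usage in the script above (assumes a zero-argument CUDAdrv.synchronize):
# time_s = time_launches(CUDAdrv.synchronize; nt=nt, warmup=warmup) do
#     @cuda blocks=nblocks threads=nthreads memcopy!(A, B)
# end
```

With 10000 launches of a kernel that moves 2 GiB each, I would expect the missing synchronization to change the measured time only slightly, so this is unlikely to explain the regression by itself.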
Here is the output of a run with the old environment with CUDAnative v0.9.1 [1]:
> ~/julia/julia-1.0.2/bin/julia memcopy.jl
time: 35.73286414146423; GB/s: 559.1491328794802
Here is the output of a run with the new environment with CUDAnative#master, which now also requires CuArrays.CuArray [2]:
> ~/julia/julia-1.1.0/bin/julia memcopy_new.jl
time: 165.3068549633026; GB/s: 120.86613107747692
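For reference, the GB/s figures in both outputs follow from simple arithmetic: each kernel call reads one array and writes one, so it moves 2 * nx * sizeof(Float64) bytes, which is exactly 2 GiB for nx = 128*1024^2. The sketch below reproduces the two reported numbers from the measured times; effective_bandwidth is a hypothetical helper name used only here. Two caveats: the script divides by 1024^3, so the unit is strictly GiB/s, and it counts nt-warmup calls although the timed part of the loop runs nt calls, so both figures are underestimated by about 0.1%.

```julia
# Effective memory throughput in GiB/s.
# niter   : number of kernel calls counted
# narrays : arrays touched per call (one read + one write = 2)
function effective_bandwidth(nx, niter, time_s; T=Float64, narrays=2)
    bytes_per_call = nx * sizeof(T) * narrays
    return niter * bytes_per_call / time_s / 1024^3
end

nx = 128 * 1024^2
niter = 10000 - 10    # nt - warmup, as counted in the script
gbs_old = effective_bandwidth(nx, niter, 35.73286414146423)
gbs_new = effective_bandwidth(nx, niter, 165.3068549633026)
println("old: $gbs_old; new: $gbs_new; slowdown: $(gbs_old / gbs_new)")
```

The ratio of the two figures is the same as the ratio of the two run times, roughly a factor of 4.6.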
Note that the only difference between the two scripts, memcopy.jl and memcopy_new.jl, is the addition of CuArrays to the using statement:
> diff memcopy.jl memcopy_new.jl
1c1
< using CUDAdrv, CUDAnative
---
> using CUDAdrv, CUDAnative, CuArrays
I suspect that the observed performance regression is due to the fact that this code now requires CuArray from CuArrays, whereas before it used CuArray from CUDAnative. Can you tell me how to adapt the code to get back the performance that we expect?
Thank you very much!
Sam
[1] > ~/julia/julia-1.0.2/bin/julia
(v1.0) pkg> status
Status ~/.julia/environments/v1.0/Project.toml
[c5f51814] CUDAdrv v0.8.6
[be33ccc6] CUDAnative v0.9.1
[2] > ~/julia/julia-1.1.0/bin/julia
(v1.1) pkg> status
Status ~/.julia/environments/v1.1/Project.toml
[c5f51814] CUDAdrv v3.0.0 #master (https://github.com/JuliaGPU/CUDAdrv.jl.git)
[be33ccc6] CUDAnative v2.1.0 #master (https://github.com/JuliaGPU/CUDAnative.jl.git)
[3a865a2d] CuArrays v1.0.2 #master (https://github.com/JuliaGPU/CuArrays.jl.git)
[0c68f7d7] GPUArrays v0.7.0 #master (https://github.com/JuliaGPU/GPUArrays.jl.git)