I am experimenting with passing custom structures that contain arrays into CUDA kernels. To access these structures' arrays inside the kernels, I convert them to CuDeviceArrays using the cudaconvert function. Sometimes, however, I want to inspect their contents. Unfortunately, Base.collect does not work with CuDeviceArrays, so I cannot explicitly copy their contents back from the device.
I decided to write my own implementation of Base.collect. The idea is to create a zero-initialized CuArray, launch a CUDA kernel that copies the content of the CuDeviceArray into the CuArray element by element, and finally collect the resulting CuArray. The code below shows my implementation.
Surprisingly, the result depends on what was executed beforehand. The collect function returns the expected content of the CuDeviceArray only if I have previously created another CuDeviceArray (which is never used afterwards). If I want to collect the CuDeviceArray multiple times, I have to create a matching number of additional CuDeviceArrays in advance.
Can you please explain what is going on?
using CUDA

# Copy src into dest element by element using a grid-stride loop.
function set_kernel(dest, src)
    id = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = blockDim().x * gridDim().x
    for i in id:stride:length(src)
        dest[i] = src[i]
    end
    return nothing
end

# Copy a CuDeviceArray into a fresh CuArray on the device, then collect that on the host.
function Base.collect(src::CuDeviceArray)
    dest = CUDA.zeros(size(src))
    N = length(src)
    nthreads = min(N, 256)
    nblocks = cld(N, nthreads)
    @cuda threads=nthreads blocks=nblocks set_kernel(dest, src)
    return collect(dest)
end

# xxx = cudaconvert(CUDA.zeros(1))
# yyy = cudaconvert(CUDA.zeros(1))
# zzz = cudaconvert(CUDA.zeros(1))
a = cudaconvert(CUDA.ones(10))
@show collect(a)
@show collect(a)
@show collect(a)
# xxx, yyy, zzz are commented:
# collect(a) = Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# collect(a) = Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# collect(a) = Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# xxx is uncommented; yyy, zzz are commented:
# collect(a) = Float32[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
# collect(a) = Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# collect(a) = Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# xxx, yyy are uncommented; zzz is commented:
# collect(a) = Float32[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
# collect(a) = Float32[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
# collect(a) = Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# xxx, yyy, zzz are uncommented:
# collect(a) = Float32[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
# collect(a) = Float32[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
# collect(a) = Float32[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
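
As a sanity check on the indexing, the launch configuration and the grid-stride loop from the listing above can be emulated on the CPU, without a GPU. The sketch below is only an illustration: it assumes the kernel logic is exactly as shown in set_kernel, and the helper names launch_config and visit_counts are hypothetical, not part of CUDA.jl.

```julia
# CPU-only emulation of the launch configuration and grid-stride loop.
# `launch_config` and `visit_counts` are hypothetical helpers, not CUDA.jl API.

# Same arithmetic as in Base.collect above: at most 256 threads per block.
function launch_config(N)
    nthreads = min(N, 256)
    nblocks = cld(N, nthreads)
    return nthreads, nblocks
end

# Count how often each index in 1:N is visited by the emulated grid.
function visit_counts(N)
    nthreads, nblocks = launch_config(N)
    counts = zeros(Int, N)
    stride = nthreads * nblocks                  # blockDim().x * gridDim().x
    for block in 1:nblocks, thread in 1:nthreads
        id = (block - 1) * nthreads + thread     # global 1-based thread index
        for i in id:stride:N
            counts[i] += 1
        end
    end
    return counts
end

@show launch_config(10)
@show all(==(1), visit_counts(10))
@show all(==(1), visit_counts(1000))
```

If every index is visited exactly once, as this emulation suggests for the sizes tried, the copy loop itself is unlikely to explain the zeros, and the cause probably lies outside the indexing.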