Issue
Let’s consider the following simple CUDAnative example:
julia> using CuArrays, CUDAnative
julia> A = CuArrays.zeros(2,2)
2×2 CuArray{Float32,2}:
0.0 0.0
0.0 0.0
julia> B = CuArrays.ones(2,2)
2×2 CuArray{Float32,2}:
1.0 1.0
1.0 1.0
julia> function f(X, Y)
           ix = (blockIdx().x-1) * blockDim().x + threadIdx().x
           iy = (blockIdx().y-1) * blockDim().y + threadIdx().y
           X[ix,iy] = 2*Y[ix,iy]
           return
       end
f (generic function with 1 method)
julia> @cuda threads=(2,2) f(A, B)
julia> A
2×2 CuArray{Float32,2}:
2.0 2.0
2.0 2.0
Note that arrays of type CuArray{Float32,2} were passed to function f. To make sure that one does not accidentally pass arrays of different types to f (e.g., one CuArray{Float32,2} and one CuArray{Float64,2}), which would lead to a type conversion inside the kernel, one might want to fix the argument types to CuArray{Float32,2}. The natural way to do this is to add the types to the function signature, as in function g:
function g(X::CuArray{Float32,2}, Y::CuArray{Float32,2})
    ix = (blockIdx().x-1) * blockDim().x + threadIdx().x
    iy = (blockIdx().y-1) * blockDim().y + threadIdx().y
    X[ix,iy] = 2*Y[ix,iy]
    return
end
However, this leads to the following error when function g is called:
julia> @cuda threads=(2,2) g(A, B)
ERROR: MethodError: no method matching g(::Type{CuDeviceArray{Float32,2,CUDAnative.AS.Global}}, ::Type{CuDeviceArray{Float32,2,CUDAnative.AS.Global}})
Stacktrace:
[1] method_age(::Function, ::Tuple{DataType,DataType}) at /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/packages/CUDAnative/Lr0yj/src/execution.jl:76
[2] macro expansion at /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/packages/CUDAnative/Lr0yj/src/execution.jl:372 [inlined]
[3] #cufunction#176(::Nothing, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(cufunction), ::typeof(g), ::Type{Tuple{CuDeviceArray{Float32,2,CUDAnative.AS.Global},CuDeviceArray{Float32,2,CUDAnative.AS.Global}}}) at /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/packages/CUDAnative/Lr0yj/src/execution.jl:357
[4] cufunction(::Function, ::Type) at /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/packages/CUDAnative/Lr0yj/src/execution.jl:357
[5] top-level scope at /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/packages/CUDAnative/Lr0yj/src/execution.jl:174
[6] top-level scope at gcutils.jl:87
[7] top-level scope at /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/packages/CUDAnative/Lr0yj/src/execution.jl:171
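In fact, the CUDA kernel must accept arguments of type CuDeviceArray rather than CuArray, because that is the type the kernel actually receives on the device (as the MethodError above shows). With the signature changed accordingly, the call works: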
julia> function g(X::CuDeviceArray{Float32,2}, Y::CuDeviceArray{Float32,2})
           ix = (blockIdx().x-1) * blockDim().x + threadIdx().x
           iy = (blockIdx().y-1) * blockDim().y + threadIdx().y
           X[ix,iy] = 3*Y[ix,iy]
           return
       end
g (generic function with 2 methods)
julia> @cuda threads=(2,2) g(A, B)
julia> A
2×2 CuArray{Float32,2}:
3.0 3.0
3.0 3.0
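The type that the kernel receives can also be checked on the host without launching anything. The following is only a minimal sketch: it assumes that cudaconvert is exported by the installed CUDAnative version and that it is the same argument conversion @cuda applies, which I have not verified against the source.

```julia
using CuArrays, CUDAnative

A = CuArrays.zeros(2,2)

# Assumption: cudaconvert performs the host-to-device argument conversion
# that @cuda applies before compiling and launching the kernel.
A_dev = cudaconvert(A)

# The call site passes a CuArray, but the converted argument is a CuDeviceArray.
println(typeof(A) <: CuArray{Float32,2})
println(typeof(A_dev) <: CuDeviceArray{Float32,2})
```

If the assumption holds, this would explain why a method restricted to CuArray is never matched when the kernel is compiled.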
Question
Would it be possible in the future for the end user of CUDAnative to specify CuArray instead of CuDeviceArray for kernel arguments? That is, could users annotate the arguments of the interface to their code rather than the arguments of the actual kernel that runs on the device?
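For now, one workaround that appears to cover both sides is to constrain the arguments with an abstract type that both CuArray and CuDeviceArray are subtypes of. This is a minimal sketch, assuming both are subtypes of AbstractArray{T,N} in the installed versions; the function name h is only illustrative.

```julia
using CuArrays, CUDAnative

# Assumption: CuArray{Float32,2} and CuDeviceArray{Float32,2,...} are both
# subtypes of AbstractArray{Float32,2}, so this one signature matches the
# host-side arrays as well as the device-side arrays the kernel receives.
function h(X::AbstractArray{Float32,2}, Y::AbstractArray{Float32,2})
    ix = (blockIdx().x-1) * blockDim().x + threadIdx().x
    iy = (blockIdx().y-1) * blockDim().y + threadIdx().y
    X[ix,iy] = 2*Y[ix,iy]
    return
end

A = CuArrays.zeros(2,2)
B = CuArrays.ones(2,2)
@cuda threads=(2,2) h(A, B)
```

A Float64 array should still be rejected with a MethodError, but the signature also accepts any other AbstractArray{Float32,2}, so it is less specific than the CuArray annotation asked about here.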
Thanks!!