CUDAnative: would it be possible to specify interface arguments rather than kernel arguments?

Issue

Let’s consider the following simple CUDAnative example:

julia> using CuArrays, CUDAnative

julia> A = CuArrays.zeros(2,2)
2×2 CuArray{Float32,2}:
 0.0  0.0
 0.0  0.0

julia> B = CuArrays.ones(2,2)
2×2 CuArray{Float32,2}:
 1.0  1.0
 1.0  1.0

julia> function f(X, Y)
           ix = (blockIdx().x-1) * blockDim().x + threadIdx().x
           iy = (blockIdx().y-1) * blockDim().y + threadIdx().y
           X[ix,iy] = 2*Y[ix,iy]
           return
       end
f (generic function with 1 method)

julia> @cuda threads=(2,2) f(A, B)

julia> A
2×2 CuArray{Float32,2}:
 2.0  2.0
 2.0  2.0

You can observe that arrays of type CuArray{Float32,2} were passed to the function f. Now, to make sure that one does not accidentally pass arrays of different types to f, e.g. one CuArray{Float32,2} and one CuArray{Float64,2}, which would lead to a type conversion inside the kernel, one might want to fix the argument type to CuArray{Float32,2}. So, naturally, one would add this constraint to the function signature, as in the following function g:

function g(X::CuArray{Float32,2}, Y::CuArray{Float32,2})
    ix = (blockIdx().x-1) * blockDim().x + threadIdx().x
    iy = (blockIdx().y-1) * blockDim().y + threadIdx().y
    X[ix,iy] = 2*Y[ix,iy]
    return
end

However, this leads to the following error when the function g is called:

julia> @cuda threads=(2,2) g(A, B)
ERROR: MethodError: no method matching g(::Type{CuDeviceArray{Float32,2,CUDAnative.AS.Global}}, ::Type{CuDeviceArray{Float32,2,CUDAnative.AS.Global}})
Stacktrace:
 [1] method_age(::Function, ::Tuple{DataType,DataType}) at /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/packages/CUDAnative/Lr0yj/src/execution.jl:76
 [2] macro expansion at /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/packages/CUDAnative/Lr0yj/src/execution.jl:372 [inlined]
 [3] #cufunction#176(::Nothing, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(cufunction), ::typeof(g), ::Type{Tuple{CuDeviceArray{Float32,2,CUDAnative.AS.Global},CuDeviceArray{Float32,2,CUDAnative.AS.Global}}}) at /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/packages/CUDAnative/Lr0yj/src/execution.jl:357
 [4] cufunction(::Function, ::Type) at /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/packages/CUDAnative/Lr0yj/src/execution.jl:357
 [5] top-level scope at /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/packages/CUDAnative/Lr0yj/src/execution.jl:174
 [6] top-level scope at gcutils.jl:87
 [7] top-level scope at /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/packages/CUDAnative/Lr0yj/src/execution.jl:171

In fact, the CUDA kernel requires arguments of type CuDeviceArray rather than of type CuArray to work properly:

julia> function g(X::CuDeviceArray{Float32,2}, Y::CuDeviceArray{Float32,2})
           ix = (blockIdx().x-1) * blockDim().x + threadIdx().x
           iy = (blockIdx().y-1) * blockDim().y + threadIdx().y
           X[ix,iy] = 3*Y[ix,iy]
           return
       end
g (generic function with 2 methods)

julia> @cuda threads=(2,2) g(A, B)

julia> A
2×2 CuArray{Float32,2}:
 3.0  3.0
 3.0  3.0
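
As an aside, if the underlying goal is only to guarantee that both arrays share the same element type (rather than to fix it to Float32 specifically), a parametric method signature on CuDeviceArray achieves that. The following is a hedged sketch along the lines of g above; the name h is hypothetical:

```julia
# Hedged sketch: a parametric signature ensures X and Y share the same
# element type T without hard-coding Float32. Passing arrays with mixed
# element types then raises a MethodError at launch time instead of
# causing a silent type conversion inside the kernel.
function h(X::CuDeviceArray{T,2}, Y::CuDeviceArray{T,2}) where T
    ix = (blockIdx().x-1) * blockDim().x + threadIdx().x
    iy = (blockIdx().y-1) * blockDim().y + threadIdx().y
    X[ix,iy] = 2*Y[ix,iy]
    return
end
```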

Question

Would it be possible that, in the future, the end user of CUDAnative could specify CuArray for the arguments instead of CuDeviceArray, i.e., specify the arguments of the interface to the code rather than the arguments of the actual kernel that runs on the device?

Thanks!!

No, because kernels are regular functions that abide by Julia’s rules. There’s nothing magical happening here. Your best bet would be for this to be possible if CuArray is actually usable on device, i.e. remove the need for CuDeviceArray altogether, but I don’t see that happening anytime soon (we’d need really powerful contextual dispatch for that).

Alternatively, you could write a macro to prefix kernel definitions with, which would rewrite the type signature similarly to how values are converted at the CPU-GPU boundary (CuArray -> CuDeviceArray, for example), but I don't think there's much interest in that. Functions don't typically need to be tightly typed like that.
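
Another pattern that works today, without any macro, is to constrain the types in a host-side launch wrapper: the wrapper still sees CuArray, and @cuda converts the arguments to CuDeviceArray as usual when launching the untyped kernel. A hedged sketch, where the wrapper name launch_f is hypothetical:

```julia
# Hedged sketch: the user-facing wrapper is typed against CuArray, so
# mismatched argument types are rejected on the host before launch; the
# kernel f itself stays untyped, and @cuda performs the usual conversion
# of CuArray arguments to CuDeviceArray at the CPU-GPU boundary.
function launch_f(X::CuArray{Float32,2}, Y::CuArray{Float32,2})
    @cuda threads=size(X) f(X, Y)
    return
end
```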


OK, thanks for the reply!