Multiple dispatch with SubArrays and CuArrays

A view of a CuArray has type SubArray. This means:

using CUDA

a = CUDA.zeros(3, 2)
b = view(a, :, 1)
typeof(b) <: CuArray

false

If I use multiple dispatch to execute a function on the CPU or GPU depending on the type of the input, then passing it a view of a CuArray produces an unexpected result. Is there a better design paradigm to deal with this case?

I think you’re using CUDA@v2 (the result was true in previous versions). :smile: I like this change, kudos to @maleadt!
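For reference, the view is now just a Base SubArray that wraps the CuArray, so dispatching on CuArray no longer catches it (the exact type parameters below are illustrative and depend on your Julia and CUDA.jl versions):

julia> typeof(view(CUDA.zeros(3, 2), :, 1))
SubArray{Float32,1,CuArray{Float32,2},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},true}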

You may need to change the signature of your function here. Previously you may have had functions like this:

function f(x::AbstractArray)
    # do sth on CPU by default
end

function f(x::CuArray)
    # do sth on GPU
end

# change the latter into the following
function f(x::Union{CuArray, SubArray{<:Any,<:Any,<:CuArray}})
    # do sth on GPU
end
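A quick sanity check of that union (hypothetical REPL session; this should hold on CUDA.jl versions where view returns a plain SubArray):

julia> a = CUDA.zeros(3, 2);

julia> view(a, :, 1) isa Union{CuArray, SubArray{<:Any,<:Any,<:CuArray}}
true

julia> reshape(view(a, :, 1), 1, 3) isa Union{CuArray, SubArray{<:Any,<:Any,<:CuArray}}
false

The reshape wraps the SubArray in a Base.ReshapedArray, which slips through the union.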

However, a view of a view of a CuArray will not match this union either, and the same goes for ReshapedArray (as the reshape check above shows). So you may end up needing something like this:

# trait types marking where the data lives
struct OnCPU end
struct OnGPU end

f(x::AbstractArray) = f(device_of(x), x)

function f(::OnGPU, x) #= do sth on GPU =# end
function f(::OnCPU, x) #= do sth on CPU =# end

device_of(x::AbstractArray) = OnCPU()
device_of(x::CuArray) = OnGPU()
device_of(x::Union{SubArray, Base.ReshapedArray}) = device_of(parent(x))
# add some more dispatches if needed.
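With those definitions, all of these should land on the right method (hypothetical usage, assuming CUDA is loaded and functional):

f(zeros(3, 2))                                  # OnCPU
f(CUDA.zeros(3, 2))                             # OnGPU
f(view(CUDA.zeros(3, 2), :, 1))                 # OnGPU: SubArray unwraps to its CuArray parent
f(reshape(view(CUDA.zeros(3, 2), :, 1), 1, 3))  # OnGPU: nested wrappers unwrap recursively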

A useful pattern for functions like device_of is to recursively unwrap:

julia> function storage_type(A::AbstractArray)
           P = parent(A)
           # `parent` returns the array itself once nothing is left to unwrap,
           # which terminates the recursion
           typeof(A) === typeof(P) ? typeof(A) : storage_type(P)
       end;

julia> transpose(reshape(view(ones(2,4),1,:),2,2)) |> storage_type
Array{Float64,2}

julia> reshape(1:6,2,3)' |> storage_type
UnitRange{Int64}

Then dispatch something like this, with a catch-all CPU method:

f(x::AbstractArray) = _f(storage_type(x), x)
function _f(::Type{T}, x) where {T<:CuArray} #= do sth on GPU =# end
function _f(::Type, x) #= do sth on CPU =# end
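For example (illustrative calls, assuming the storage_type and _f definitions above):

f(CUDA.zeros(3, 2))         # storage_type is a CuArray type => GPU method
f(view(ones(2, 4), 1, :)')  # storage unwraps to a plain Array => CPU method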

I changed view/reshape/reinterpret to reuse the Base wrappers with appropriate type unions to catch cases where we can get a contiguous or strided pointer (DenseCuArray, StridedCuArray, or AnyCuArray for anything that contains a CuArray). Although this works beautifully, as it turns out the precompilation and load-time cost of these type unions is pretty hefty… so I might have to revisit/revert this in the future :frowning_face:
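Roughly, the idea looks like this (a simplified sketch for illustration, not CUDA.jl's actual definitions; the real unions live in src/array.jl, linked below, and cover more wrapper combinations):

# contiguous storage: safe to take a raw device pointer
const DenseCuArray{T,N} = CuArray{T,N}

# strided storage: certain views still expose a strided pointer
# (the real definition restricts the view's index types)
const StridedCuArray{T,N} = Union{CuArray{T,N},
                                  SubArray{T,N,<:CuArray}}

# anything wrapping a CuArray somewhere, for generic fallbacks
const AnyCuArray{T,N} = Union{CuArray{T,N},
                              SubArray{T,N,<:CuArray},
                              Base.ReshapedArray{T,N,<:CuArray},
                              Base.ReinterpretArray{T,N,<:Any,<:CuArray}}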


Other than the compilation cost, how does this make sure that things get dispatched to the CuArray methods? I understand that forcing all wrappers of a CuArray to also be a CuArray is not the way to go, but double wrappers always end up confusing me.

This is something that I would really like to understand, because I’ve encountered similar issues a few times with StructArrays.

For example, if I do

julia> c = CUDA.rand(10, 10);

julia> v = view(c, :, 1:3);

julia> v .+ v;

can Julia manage to get an optimized version of broadcast here? (Sorry, unfortunately I’m working on a machine without CUDA support at the moment and can’t verify.) If that’s the case, is there a simple intuition as to how that works?

Yes, the unions are here: https://github.com/JuliaGPU/CUDA.jl/blob/62ae0c3f9e8298d87c1c2d3a3bfd7401005d9092/src/array.jl#L142-L175
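One way to check is to ask for the broadcast style of the wrapper type: if a CUDA-specific style comes back, v .+ v is handled by the GPU broadcast machinery (hypothetical session; the exact printed style name depends on the CUDA.jl version):

julia> v = view(CUDA.rand(10, 10), :, 1:3);

julia> Base.BroadcastStyle(typeof(v))
CUDA.CuArrayStyle{2}()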
