Multiple dispatch with SubArrays and CuArrays

A view of a CuArray has type SubArray, not CuArray. This means that for a view b of a CuArray:

typeof(b) <: CuArray  # false


If I use multiple dispatch to execute a function on the CPU or GPU depending on the type of the input then passing it a view of a CuArray produces an unexpected result. Is there a better design paradigm to deal with this case?

I think you’re using CUDA@v2. (The result is true in previous versions :smile: and I like this change. Kudos to @maleadt )

You may need to change the signature of your function here. Previously you may have a function like this:

function f(x::AbstractArray)
    # do sth on CPU by default
end

function f(x::CuArray)
    # do sth on GPU
end

# change the latter into the following
function f(x::Union{CuArray, SubArray{<:Any,<:Any,<:CuArray}})
    # do sth on GPU
end

However, a view of a view of a CuArray will not match this signature, and the same goes for ReshapedArray. So you may eventually need to build something like this:

f(x::AbstractArray) = f(device_of(x), x)

function f(::OnGPU, x) #= do sth on GPU =# end
function f(::OnCPU, x) #= do sth on CPU =# end

device_of(x::AbstractArray) = OnCPU()
device_of(x::CuArray) = OnGPU()
device_of(x::Union{SubArray, Base.ReshapedArray}) = device_of(parent(x))
# add some more dispatches if needed.
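To see how the recursion bottoms out, here is a CPU-only sketch (the CuArray method is left as a comment since it needs CUDA.jl loaded; OnCPU/OnGPU are the singleton trait types assumed above):

```julia
# trait types for dispatching on the storage device
struct OnCPU end
struct OnGPU end

device_of(x::AbstractArray) = OnCPU()
device_of(x::Union{SubArray, Base.ReshapedArray}) = device_of(parent(x))
# with CUDA.jl loaded you would also define:
# device_of(x::CuArray) = OnGPU()

# a view of a reshape of a view still resolves to its ultimate parent
a = ones(2, 4)
nested = view(reshape(view(a, 1, :), 2, 2), :, 1)
device_of(nested)  # OnCPU()
```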

A useful pattern for functions like device_of is to recursively unwrap:

julia> function storage_type(A::AbstractArray)
           P = parent(A)
           typeof(A) === typeof(P) ? typeof(A) : storage_type(P)
       end
storage_type (generic function with 1 method)

julia> transpose(reshape(view(ones(2,4),1,:),2,2)) |> storage_type
Matrix{Float64} (alias for Array{Float64, 2})

julia> reshape(1:6,2,3)' |> storage_type
UnitRange{Int64}

Then dispatch something like this, with a catch-all CPU method:

f(x::AbstractArray) = _f(storage_type(x), x)
function _f(::Type{T}, x) where {T<:CuArray} #= do sth on GPU =# end
function _f(::Type, x) #= do sth on CPU =# end
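A self-contained sketch of this pattern that runs without a GPU, using Array as a stand-in for CuArray (the names g/_g and the string return values are just illustrative):

```julia
# recursively unwrap to find the underlying storage type
function storage_type(A::AbstractArray)
    P = parent(A)
    typeof(A) === typeof(P) ? typeof(A) : storage_type(P)
end

g(x::AbstractArray) = _g(storage_type(x), x)
_g(::Type{T}, x) where {T<:Array} = "dense path"   # stand-in for T<:CuArray
_g(::Type, x) = "generic fallback"                 # catch-all CPU method

g(view(ones(3, 3), :, 1))   # "dense path": storage is Matrix{Float64}
g(reshape(1:6, 2, 3)')      # "generic fallback": storage is UnitRange{Int64}
```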

I changed view/reshape/reinterpret to reuse the Base wrappers with appropriate type unions to catch cases where we can get a contiguous or strided pointer (DenseCuArray, StridedCuArray, or AnyCuArray for anything that contains a CuArray). Although this works beautifully, as it turns out the precompilation and load-time cost of these type unions is pretty hefty… so I might have to revisit/revert this in the future :frowning_face:


Other than the compilation cost, how does this make sure that things are dispatched to CuArray? I understand that forcing all wrappers of CuArray to also be a CuArray is not the way to go, but double wrappers always end up confusing me.

This is something that I would really like to understand, because I’ve encountered similar issues a few times with StructArrays.

For example, if I do

julia> c = CUDA.rand(10, 10);

julia> v = view(c, :, 1:3);

julia> v .+ v;

can Julia manage to get an optimized version of broadcast here? (Sorry, unfortunately I’m working on a machine without CUDA support at the moment and can’t verify.) If so, is there a simple intuition as to how that works?

Yes, the unions are here:
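For intuition, here is a simplified, non-runnable sketch of the mechanism (not CUDA.jl's actual definitions, which cover more wrappers and more of the broadcast interface): union aliases over Base wrappers of a CuArray are hooked into Base broadcasting, so a view participating in a broadcast still selects the GPU broadcast style.

```julia
# simplified sketch only; requires CUDA.jl for CuArray to exist
const AnyCuArray{T,N} = Union{CuArray{T,N},
                              SubArray{T,N,<:CuArray},
                              Base.ReshapedArray{T,N,<:CuArray}}

struct CuArrayStyle{N} <: Broadcast.AbstractArrayStyle{N} end
Base.BroadcastStyle(::Type{<:AnyCuArray{T,N}}) where {T,N} = CuArrayStyle{N}()

# similar/copyto! methods for CuArrayStyle allocate CuArrays and launch
# kernels, so `v .+ v` on a view of a CuArray runs on the GPU
```

Because the broadcast style is computed from the wrapper's type through these unions, the view itself never needs to be a CuArray for dispatch to reach the GPU path.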
