Dealing with views and cuda array wrappers

smartalecH · March 28, 2023, 7:53pm

I’m working with an array wrapper that uses a CuArray under the hood. When I try to @view into it, I get a complicated SubArray that dispatches to the generic Base.fill!() method rather than the CUDA.jl one (i.e. it falls back to scalar indexing). Here’s the type:

my_array::SubArray{Float32, 2, MyWrapper{Float32, 2, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, 2}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}, x::Int64)

(the view is not contiguous)

I’m seeing that the compiler sometimes has trouble dispatching if there are “too many” wrapper types.

What’s the canonical way to get around this (other than to abandon my wrapper)? I’ve got Adapt.jl set up with my wrapper (i.e. I overloaded adapt_structure), and it seems to work fine. Are there additional methods (e.g. from Base) that I could overload to help the compiler out?
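To make the question concrete, here is a rough sketch of the kind of overload I have in mind. It assumes a hypothetical, simplified `MyWrapper` with a single `data` field (not my actual wrapper) and a hypothetical helper `unwrap_view`; it may well not be the recommended approach. The idea is to re-express a view of the wrapper as a view of the underlying CuArray, which CUDA.jl already knows how to handle:

```julia
using CUDA

# Hypothetical stand-in for the real wrapper: a thin AbstractArray around `data`.
struct MyWrapper{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    data::A
end
Base.size(w::MyWrapper) = size(w.data)
Base.parent(w::MyWrapper) = w.data
Base.getindex(w::MyWrapper, I...) = getindex(w.data, I...)
Base.setindex!(w::MyWrapper, v, I...) = setindex!(w.data, v, I...)

# Hypothetical helper: turn a view of the wrapper into a view of the
# underlying array, using the same parent indices.
unwrap_view(v::SubArray{<:Any,<:Any,<:MyWrapper}) =
    view(parent(parent(v)), parentindices(v)...)

# Forward fill! on views of the wrapper to the unwrapped view.
Base.fill!(v::SubArray{<:Any,<:Any,<:MyWrapper}, x) = (fill!(unwrap_view(v), x); v)

w = MyWrapper(CUDA.zeros(Float32, 8, 8))
v = @view w[2:5, 3:6]   # non-contiguous view of the wrapper
fill!(v, 1f0)           # now reaches the GPU implementation via unwrap_view
```

The obvious downside is that every operation I care about (fill!, copyto!, broadcasting, ...) would need the same forwarding treatment.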

Thanks!

torrance · March 29, 2023, 2:58am

Unfortunately, unless the view is contiguous you won’t be able to easily avoid scalar indexing.

Ideally, you would want to be operating on contiguous arrays on the GPU: it’s simpler and will take much more advantage of the power of the GPU. Can you rewrite your algorithm to perhaps materialise the view on the GPU as a contiguous array?
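A minimal sketch of that materialise-then-write-back pattern, assuming a plain CuArray `A` and illustrative index ranges (both are made up here, not taken from your code): indexing with ranges copies the block into a fresh contiguous CuArray, the operation runs on that copy, and a broadcast assignment writes the result back.

```julia
using CUDA

A = CUDA.ones(Float32, 8, 8)
rows, cols = 2:5, 3:6        # illustrative ranges only

block = A[rows, cols]        # getindex with ranges copies into a contiguous CuArray
fill!(block, 0f0)            # operates on the contiguous copy on the GPU
A[rows, cols] .= block       # broadcast the result back into the parent array
```

This costs an extra copy in each direction, so it only pays off if enough work is done on the contiguous block in between.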

One hacky alternative: if the view is defined by a bitmask, you could send both the original array and the bit mask to the GPU and perform operations over both, e.g. something like (not tested)

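using CUDA

f(x) = 2x  # placeholder for the real per-element operation

values = CUDA.rand(1000)
bitmask = CUDA.rand(1000) .> 0.5f0  # Bool mask that lives on the GPU

# Perform some op only on valid entries
map!(values, values, bitmask) do val, flag
    return flag ? f(val) : val
end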

smartalecH · March 29, 2023, 4:52pm

Thank you, @torrance!

Can you rewrite your algorithm to perhaps materialise the view on the GPU as a contiguous array?

Good question. Ultimately I’d like to be doing some halo exchanges. These halos, as you probably know, consist of “boundary elements” surrounding the array. So they aren’t contiguous, but they are strided. Perhaps there’s a clever way I can leverage that structure?
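To illustrate what I mean by strided halos, here is a rough sketch on a plain CuArray (no wrapper), with a one-cell halo and made-up sizes. The periodic wrap-around is only a stand-in for a real exchange between neighbours, and the buffer names are hypothetical. Each boundary slice is packed into a contiguous buffer and then unpacked into the opposite halo:

```julia
using CUDA

nx, ny = 6, 4                              # interior size (illustrative)
A = CUDA.zeros(Float32, nx + 2, ny + 2)    # one halo cell on every side
A[2:end-1, 2:end-1] .= 1f0                 # fill the interior

# Pack: getindex with ranges gathers each slice into a contiguous CuArray.
send_col = A[2:end-1, 2]    # first interior column (contiguous in memory)
send_row = A[2, 2:end-1]    # first interior row (strided in memory)

# ... a real code would exchange the buffers with neighbours here ...

# Unpack: broadcast the buffers into the opposite halo (periodic stand-in).
A[2:end-1, end] .= send_col
A[end, 2:end-1] .= send_row
```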

if the view is defined by a bitmask, you could send both the original array and the bit mask to the GPU and perform operations over both

Interesting idea! Thank you!
