Adapt BroadcastStyle for CUDA

I am trying to get CUDA.jl to work seamlessly with a MutableShiftedArray by writing a package extension, CUDASupportExt.jl:

```julia
module CUDASupportExt

using CUDA
using Adapt
using MutableShiftedArrays

# Adapt the wrapped parent array (e.g. Array -> CuArray) while keeping the shifts
Adapt.adapt_structure(to, x::MutableShiftedArray) =
    MutableShiftedArray(adapt(to, parent(x)), shifts(x), size(x); default = MutableShiftedArrays.default(x))

# Let broadcasts over a MutableShiftedArray wrapping a CuArray use the CUDA broadcast style
function Base.Broadcast.BroadcastStyle(::Type{T}) where {CT, N, CD, T <: MutableShiftedArray{<:Any, <:Any, <:Any, <:CuArray{CT, N, CD}}}
    CUDA.CuArrayStyle{N, CD}()
end

# The same for a SubArray of a MutableShiftedArray wrapping a CuArray
function Base.Broadcast.BroadcastStyle(::Type{T}) where {CT, N, CD, T <: SubArray{<:Any, <:Any, <:MutableShiftedArray{<:Any, <:Any, <:Any, <:CuArray{CT, N, CD}}}}
    CUDA.CuArrayStyle{N, CD}()
end

end # module
```
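For reference, this is the kind of usage the extension is meant to enable; a minimal sketch (assuming the MutableShiftedArray constructor signature used above):

```julia
using CUDA, MutableShiftedArrays

ca = CuArray(ones(Float32, 8))
ma = MutableShiftedArray(ca, (2,))  # shift the view of ca by 2

# With the BroadcastStyle definitions in place, this broadcast
# dispatches to CuArrayStyle and runs as a GPU kernel:
q = ma .+ 1
```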

In general this works fine: for a MutableShiftedArray ma wrapping a CuArray you can broadcast, q = ma .+ 1, or even use views, (@view ma[1:4]) .+ 1, since the broadcasting mechanism compiles the single-element getindex access correctly.
However, this does not seem to be the case for copy(ma) or for indexing like ma[1:4] without the view. One could start specializing copy and getindex for Union{Int, AbstractRange} to handle these cases, but that sounds wrong.
Is there no other way, such that this is not necessary and the broadcasting system takes care of the individual element accesses as it does in the other broadcast cases?
In the end, single-element accesses are generated anyway, but one gets the warning that scalar indexing is not handled by CUDA and is therefore slow.
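To see exactly which operations hit the slow path, scalar indexing can be disallowed so that the warning becomes an error; a sketch:

```julia
using CUDA, MutableShiftedArrays

CUDA.allowscalar(false)  # turn the scalar-indexing warning into an error

ma = MutableShiftedArray(CUDA.ones(8), (2,))

ma .+ 1       # fine: handled by the broadcast machinery
# copy(ma)    # would error: falls back to one-element getindex
# ma[1:4]     # likewise errors without a broadcast-based method
```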


I ended up needing to implement a number of such functions, for example:
copy
collect
Array
==
Each of them performs the wanted operation in a broadcasting way instead. I assume that more such functions are still missing.
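One way to cover these without writing kernels is to route each operation through broadcasting, since the broadcast path already works. A hedged sketch (the type alias mirrors the constraint used in the BroadcastStyle definitions above; identity.(x) materializes x into a plain CuArray via the broadcast machinery):

```julia
# Hypothetical alias for a MutableShiftedArray backed by a CuArray
const CuMSA = MutableShiftedArray{<:Any, <:Any, <:Any, <:CuArray}

# Materialize through broadcast instead of per-element getindex
Base.copy(x::CuMSA) = identity.(x)
Base.collect(x::CuMSA) = Array(identity.(x))
Base.Array(x::CuMSA) = Array(identity.(x))

# Elementwise comparison as a GPU reduction over a broadcast
Base.:(==)(a::CuMSA, b::AbstractArray) = axes(a) == axes(b) && all(a .== b)
```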