I am trying to get CUDA.jl to work seamlessly with a MutableShiftedArray by writing a CUDASupportExt.jl extension module:
module CUDASupportExt
using CUDA
using Adapt
using MutableShiftedArrays
# Let's do this for the MutableShiftedArray type
Adapt.adapt_structure(to, x::MutableShiftedArray) =
    MutableShiftedArray(adapt(to, parent(x)), shifts(x), size(x); default=MutableShiftedArrays.default(x))

# Define the BroadcastStyle for a MutableShiftedArray wrapping a CuArray
function Base.Broadcast.BroadcastStyle(::Type{T}) where {CT, N, CD, T<:MutableShiftedArray{<:Any,<:Any,<:Any,<:CuArray{CT,N,CD}}}
    CUDA.CuArrayStyle{N,CD}()
end

# Define the BroadcastStyle for SubArray of MutableShiftedArray with CuArray
function Base.Broadcast.BroadcastStyle(::Type{T}) where {CT, N, CD, T<:SubArray{<:Any, <:Any, <:MutableShiftedArray{<:Any,<:Any,<:Any,<:CuArray{CT,N,CD}}}}
    CUDA.CuArrayStyle{N,CD}()
end
end
This works fine in general: for a MutableShiftedArray ma wrapping a CuArray, you can broadcast (q = ma .+ 1) or even use views ((@view ma[1:4]) .+ 1), since the broadcasting mechanism somehow compiles the single-element version of getindex correctly.
However, this does not seem to be the case for copy(ma) or for indexing without a view, such as ma[1:4]. One could start specializing copy and getindex for Union{Int, AbstractRange} arguments to handle this, but this sounds wrong.
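To make the workaround concrete, here is a minimal CPU-only sketch of what I mean by specializing copy and getindex so that they go through broadcasting. Shifted is a hypothetical 1-D stand-in with a circular shift, not the real MutableShiftedArray, and I have not checked whether the same forwarding avoids the scalar-indexing warning on a CuArray parent:

```julia
# Hypothetical stand-in for MutableShiftedArray (1-D, circular shift only).
struct Shifted{T,P<:AbstractVector{T}} <: AbstractVector{T}
    parent::P
    shift::Int
end

Base.size(s::Shifted) = size(s.parent)
Base.IndexStyle(::Type{<:Shifted}) = IndexLinear()

# Single-element access: the only indexing method the wrapper really needs.
Base.getindex(s::Shifted, i::Int) = s.parent[mod1(i - s.shift, length(s.parent))]

# Assumption: forwarding to broadcast lets the broadcast machinery
# generate the element-wise accesses, as it does for ma .+ 1.
Base.copy(s::Shifted) = identity.(s)
Base.getindex(s::Shifted, r::AbstractUnitRange{<:Integer}) = identity.(view(s, r))

s = Shifted([1, 2, 3, 4, 5], 1)
copy(s)   # == [5, 1, 2, 3, 4]
s[1:3]    # == [5, 1, 2]
```

This is exactly the kind of per-function specialization I would like to avoid, since every other non-broadcast entry point would need the same treatment.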
Is there no other way to make this unnecessary, so that the broadcasting system takes care of the individual element accesses as it does in the other broadcast cases?
In the end, single-element accesses are generated anyway, but one gets the warning that this is not handled by CUDA and is therefore slow.