Hello all.
In the following code, `sum` returns `ERROR: Scalar indexing is disallowed`. How should `sum` be dispatched, or what else should I do?
A = OffsetArray(CUDA.rand(10), -4:5);
a = @views A[-1:2];
sum(a)
A simple implementation is shown below, but I think there must be a better one.
Base.sum(A::OffsetArray{T,N,CuArray{T,N,M}}) where {T,N,M} = sum(parent(A))
function Base.sum(A::SubArray{T,N,OffsetArray{T,N,CuArray{T,N,M}}}) where {T,N,M}
    indices = A.indices          # the view's indices, expressed in the offset axes
    offsets = parent(A).offsets  # per-dimension offsets of the OffsetArray
    # shift the view's indices back into the parent CuArray's 1-based axes
    sum(@view parent(parent(A))[CartesianIndices(ntuple(n -> indices[n] .- offsets[n], N))])
end
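With these definitions in place, the original example should dispatch to the GPU-friendly path. A sketch (note that `.offsets` is an internal field of OffsetArrays, so this may break across package versions):

```julia
using CUDA, OffsetArrays

A = OffsetArray(CUDA.rand(10), -4:5)  # CuArray with offset axes -4:5
a = @views A[-1:2]                    # SubArray of the OffsetArray
sum(a)                                # reduces on the GPU, no scalar indexing
```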
I think OffsetArray does not compose well with CUDA out of the box; even broadcasting fails:
julia> A = OffsetArray(CUDA.rand(10), -4:5);
julia> sum(A)
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.
If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] errorscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:155
[3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:128
[4] assertscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:116
[5] getindex
@ ~/.julia/packages/GPUArrays/bbZD0/src/host/indexing.jl:50 [inlined]
[6] getindex
@ ~/.julia/packages/OffsetArrays/hwmnB/src/OffsetArrays.jl:438 [inlined]
[7] _mapreduce(f::typeof(identity), op::typeof(Base.add_sum), ::IndexLinear, A::OffsetVector{Float32, CuArray{…}})
@ Base ./reduce.jl:438
[8] _mapreduce_dim
@ ./reducedim.jl:365 [inlined]
[9] mapreduce
@ ./reducedim.jl:357 [inlined]
[10] _sum
@ ./reducedim.jl:1015 [inlined]
[11] _sum
@ ./reducedim.jl:1014 [inlined]
[12] sum(a::OffsetVector{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}})
@ Base ./reducedim.jl:1010
[13] top-level scope
@ REPL[15]:1
[14] top-level scope
@ ~/.julia/packages/CUDA/htRwP/src/initialization.jl:206
Some type information was truncated. Use `show(err)` to see complete types.
julia> A .+ A
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.
If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] errorscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:155
[3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:128
[4] assertscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:116
[5] getindex
@ ~/.julia/packages/GPUArrays/bbZD0/src/host/indexing.jl:50 [inlined]
[6] getindex
@ ~/.julia/packages/OffsetArrays/hwmnB/src/OffsetArrays.jl:438 [inlined]
[7] _broadcast_getindex
@ ./broadcast.jl:675 [inlined]
[8] _getindex
@ ./broadcast.jl:705 [inlined]
[9] _broadcast_getindex
@ ./broadcast.jl:681 [inlined]
[10] getindex
@ ./broadcast.jl:636 [inlined]
[11] macro expansion
@ ./broadcast.jl:1004 [inlined]
[12] macro expansion
@ ./simdloop.jl:77 [inlined]
[13] copyto!
@ ./broadcast.jl:1003 [inlined]
[14] copyto!
@ ./broadcast.jl:956 [inlined]
[15] copy
@ ./broadcast.jl:928 [inlined]
[16] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{…}, Nothing, typeof(+), Tuple{…}})
@ Base.Broadcast ./broadcast.jl:903
[17] top-level scope
@ REPL[16]:1
[18] top-level scope
@ ~/.julia/packages/CUDA/htRwP/src/initialization.jl:206
Some type information was truncated. Use `show(err)` to see complete types.
julia> A
10-element OffsetArray(::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, -4:5) with eltype Float32 with indices -4:5:
0.19389287
0.71536046
0.8175033
0.57097757
0.058391646
0.72871023
0.6042697
0.88648033
0.76349247
0.9775343
Here are some hints, if they might help:
Yeah, Julia isn’t currently great with respect to wrapped arrays and preserving functionality from the contained array type where needed. I typically link to Use with multiple wrappers · Issue #21 · JuliaGPU/Adapt.jl · GitHub for this, and this would need some work in Base to resolve (e.g., AbstractWrappedArray, or another approach for wrapped array identification · Issue #51910 · JuliaLang/julia · GitHub). We try to support Base’s array wrappers as much as possible, and for other types like OffsetArray, a package extension that fixes or overrides dispatch where needed could be added.
If you simply want compatibility (i.e., without triggering scalar indexing errors, but also without executing on the GPU) you can use unified memory, see CUDA.jl 5.4: Memory management mayhem ⋅ JuliaGPU
Thank you both. So, as it stands, we need to define these roundabout wrappers ourselves.
That is not the case. Can you share what you are running into?
I tried
A = CUDA.zeros(10)
A = cu(A, unified=true)
Maybe `CuArray{Float64,1,CUDA.UnifiedMemory}` works, but I cannot try it yet.
Additionally, it seems that unified memory is allocated in CPU memory. Does this cause any performance issues?
The `cu` function is meant to be used with CPU inputs; it’s a user-friendly constructor.
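So a unified-memory array would be constructed from a CPU array instead. A sketch, assuming a CUDA.jl version (5.4+) where `cu` accepts a `unified` keyword:

```julia
using CUDA

A_cpu = zeros(Float32, 10)
A = cu(A_cpu; unified=true)  # CuArray backed by unified memory
```

With unified memory, falling back to CPU iteration (e.g. for an unsupported wrapper) avoids the scalar-indexing error, at the cost of not running the reduction on the GPU.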
Thank you. I understand.