Scalar indexing GPU problem in Flux.jl model

I am trying to use the GPU to accelerate training of a CNN. It’s my first time doing this. I keep getting an exception saying “Scalar indexing is disallowed”, and even after reading what I could find about it I can’t understand why. Here’s a minimal example:

using CUDA
using Flux
using Random

N_data = 32
img_dims = (16, 16, 3)
images = Float32.(randn((img_dims..., N_data))) |> gpu

model = Chain(
    Conv((3,3), 3 => 1, relu),      # 16×16×3 -> 14×14×1
    Flux.flatten,                   # 14×14×1 -> 196
    Dense(196 => 2)                 # 196 -> 2
)  |> gpu

model(images) # error: TaskFailedException: "Scalar indexing is disallowed."

I understand that scalar indexing is inefficient on the GPU, but what about the toy model above causes scalar indexing? How can I prevent it?

This ought to work, and does for me. Do you have fairly recent versions of Flux and CUDA.jl?

julia> CUDA.allowscalar(false)

julia> images .+ 10 |> summary    # simple operation to check CUDA alone
"16×16×3×32 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}"

julia> model(images)
2×32 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
  0.468265  0.459961  -0.771641  -0.771171  …   0.72922    0.310413  -1.15194  -0.753809
 -0.993106  1.10223   -0.033322  -1.39801      -0.524264  -0.561808  -0.62486  -0.0378232

(@v1.10) pkg> st Flux CUDA
Status `~/.julia/environments/v1.10/Project.toml`
  [052768ef] CUDA v5.2.0
  [587475ba] Flux v0.14.14  # latest is in fact 0.14.15

Yeah, I think I’ve got the most recent versions.

(@v1.10) pkg> st Flux CUDA
Status `C:\Users\smaggi\.julia\environments\v1.10\Project.toml`
  [052768ef] CUDA v5.3.3
  [587475ba] Flux v0.14.15

Here is the complete error message from running the code in my original post, in case it helps:

Error: TaskFailedException

    nested task error: TaskFailedException

        nested task error: Scalar indexing is disallowed.
        Invocation of getindex resulted in scalar indexing of a GPU array.
        This is typically caused by calling an iterating implementation of a method.
        Such implementations *do not* execute on the GPU, but very slowly on the CPU,
        and therefore should be avoided.

        If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
        to enable scalar iteration globally or for the operations in question.
        Stacktrace:
          [1] error(s::String)
            @ Base .\error.jl:35
          [2] errorscalar(op::String)
            @ GPUArraysCore C:\Users\smaggi\.julia\packages\GPUArraysCore\GMsgk\src\GPUArraysCore.jl:155
          [3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
            @ GPUArraysCore C:\Users\smaggi\.julia\packages\GPUArraysCore\GMsgk\src\GPUArraysCore.jl:128
          [4] assertscalar(op::String)
            @ GPUArraysCore C:\Users\smaggi\.julia\packages\GPUArraysCore\GMsgk\src\GPUArraysCore.jl:116
          [5] getindex(A::CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, I::Int64)
            @ GPUArrays C:\Users\smaggi\.julia\packages\GPUArrays\OKkAu\src\host\indexing.jl:48
          [6] scalar_getindex(::CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, ::Int64, ::Vararg{Int64})
            @ GPUArrays C:\Users\smaggi\.julia\packages\GPUArrays\OKkAu\src\host\indexing.jl:34
          [7] _getindex
            @ C:\Users\smaggi\.julia\packages\GPUArrays\OKkAu\src\host\indexing.jl:17 [inlined]
          [8] getindex
            @ C:\Users\smaggi\.julia\packages\GPUArrays\OKkAu\src\host\indexing.jl:15 [inlined]
          [9] getindex
            @ .\subarray.jl:290 [inlined]
         [10] im2col!(col::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, x::SubArray{Float32, 4, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Int64}, true}, cdims::DenseConvDims{3, 3, 3, 6, 3})
            @ NNlib C:\Users\smaggi\.julia\packages\NNlib\c3RdJ\src\impl\conv_im2col.jl:238
         [11] (::NNlib.var"#648#649"{CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, Float32, Float32, SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, DenseConvDims{3, 3, 3, 6, 3}, Int64, Int64, Int64, UnitRange{Int64}, Int64})()
            @ NNlib C:\Users\smaggi\.julia\packages\NNlib\c3RdJ\src\impl\conv_im2col.jl:54
    Stacktrace:
     [1] sync_end(c::Channel{Any})
       @ Base .\task.jl:448
     [2] macro expansion
       @ .\task.jl:480 [inlined]
     [3] conv_im2col!(y::SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, x::SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, w::CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{3, 3, 3, 6, 3}; col::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, alpha::Float32, beta::Float32, ntasks::Int64)
       @ NNlib C:\Users\smaggi\.julia\packages\NNlib\c3RdJ\src\impl\conv_im2col.jl:50
     [4] conv_im2col!(y::SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, x::SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, w::CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{3, 3, 3, 6, 3})
       @ NNlib C:\Users\smaggi\.julia\packages\NNlib\c3RdJ\src\impl\conv_im2col.jl:23
     [5] (::NNlib.var"#306#310"{@Kwargs{}, DenseConvDims{3, 3, 3, 6, 3}, SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}})()
       @ NNlib C:\Users\smaggi\.julia\packages\NNlib\c3RdJ\src\conv.jl:209
Stacktrace:
  [1] sync_end(c::Channel{Any})
    @ Base .\task.jl:448
  [2] macro expansion
    @ .\task.jl:480 [inlined]
  [3] conv!(out::CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, in1::CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, in2::CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{3, 3, 3, 6, 3}; kwargs::@Kwargs{})
    @ NNlib C:\Users\smaggi\.julia\packages\NNlib\c3RdJ\src\conv.jl:205
  [4] conv!
    @ C:\Users\smaggi\.julia\packages\NNlib\c3RdJ\src\conv.jl:185 [inlined]
  [5] #conv!#265
    @ C:\Users\smaggi\.julia\packages\NNlib\c3RdJ\src\conv.jl:145 [inlined]
  [6] conv!
    @ C:\Users\smaggi\.julia\packages\NNlib\c3RdJ\src\conv.jl:140 [inlined]
  [7] conv(x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, w::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{2, 2, 2, 4, 2}; kwargs::@Kwargs{})
    @ NNlib C:\Users\smaggi\.julia\packages\NNlib\c3RdJ\src\conv.jl:88
  [8] conv
    @ C:\Users\smaggi\.julia\packages\NNlib\c3RdJ\src\conv.jl:83 [inlined]
  [9] (::Conv{2, 4, typeof(relu), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}})(x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer})
    @ Flux C:\Users\smaggi\.julia\packages\Flux\Wz6D4\src\layers\conv.jl:202
 [10] macro expansion
    @ C:\Users\smaggi\.julia\packages\Flux\Wz6D4\src\layers\basic.jl:53 [inlined]
 [11] _applychain(layers::Tuple{Conv{2, 4, typeof(relu), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, typeof(Flux.flatten), Dense{typeof(identity), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer})
    @ Flux C:\Users\smaggi\.julia\packages\Flux\Wz6D4\src\layers\basic.jl:53
 [12] (::Chain{Tuple{Conv{2, 4, typeof(relu), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, typeof(Flux.flatten), Dense{typeof(identity), CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}})(x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer})
    @ Flux C:\Users\smaggi\.julia\packages\Flux\Wz6D4\src\layers\basic.jl:51
 [13] top-level scope
    @ c:\Users\smaggi\Downloads\test2.jl:15
 [14] eval
    @ .\boot.jl:385 [inlined]
 [15] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base .\loading.jl:2076
 [16] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::@Kwargs{})
    @ Base .\essentials.jl:892
 [17] invokelatest(::Any, ::Any, ::Vararg{Any})
    @ Base .\essentials.jl:889
 [18] inlineeval(m::Module, code::String, code_line::Int64, code_column::Int64, file::String; softscope::Bool)
    @ VSCodeServer c:\Users\smaggi\.vscode\extensions\julialang.language-julia-1.76.2\scripts\packages\VSCodeServer\src\eval.jl:263
 [19] (::VSCodeServer.var"#69#74"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})()
    @ VSCodeServer c:\Users\smaggi\.vscode\extensions\julialang.language-julia-1.76.2\scripts\packages\VSCodeServer\src\eval.jl:181
 [20] withpath(f::VSCodeServer.var"#69#74"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams}, path::String)
    @ VSCodeServer c:\Users\smaggi\.vscode\extensions\julialang.language-julia-1.76.2\scripts\packages\VSCodeServer\src\repl.jl:274
 [21] (::VSCodeServer.var"#68#73"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})()
    @ VSCodeServer c:\Users\smaggi\.vscode\extensions\julialang.language-julia-1.76.2\scripts\packages\VSCodeServer\src\eval.jl:179
 [22] hideprompt(f::VSCodeServer.var"#68#73"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})
    @ VSCodeServer c:\Users\smaggi\.vscode\extensions\julialang.language-julia-1.76.2\scripts\packages\VSCodeServer\src\repl.jl:38
 [23] (::VSCodeServer.var"#67#72"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})()
    @ VSCodeServer c:\Users\smaggi\.vscode\extensions\julialang.language-julia-1.76.2\scripts\packages\VSCodeServer\src\eval.jl:150
 [24] with_logstate(f::Function, logstate::Any)
    @ Base.CoreLogging .\logging.jl:515
 [25] with_logger
    @ .\logging.jl:627 [inlined]
 [26] (::VSCodeServer.var"#66#71"{VSCodeServer.ReplRunCodeRequestParams})()
    @ VSCodeServer c:\Users\smaggi\.vscode\extensions\julialang.language-julia-1.76.2\scripts\packages\VSCodeServer\src\eval.jl:255
 [27] #invokelatest#2
    @ .\essentials.jl:892 [inlined]
 [28] invokelatest(::Any)
    @ Base .\essentials.jl:889
 [29] (::VSCodeServer.var"#64#65")()
    @ VSCodeServer c:\Users\smaggi\.vscode\extensions\julialang.language-julia-1.76.2\scripts\packages\VSCodeServer\src\eval.jl:34
in expression starting at c:\Users\smaggi\Downloads\test2.jl:15

The line using cuDNN is missing. With Flux 0.14, the GPU convolution kernels are provided through cuDNN.jl; when it isn’t loaded, NNlib falls back to its generic im2col implementation (the conv_im2col! visible in your stack trace), which iterates over the CuArray element by element and triggers the scalar-indexing error.
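
For reference, here is a minimal sketch of the fixed script, assuming cuDNN.jl has been added to the same environment (pkg> add cuDNN); everything else is unchanged from your original code:

using CUDA
using cuDNN    # loads the cuDNN-backed convolution kernels used by Flux/NNlib on CuArrays
using Flux
using Random

N_data = 32
img_dims = (16, 16, 3)
images = Float32.(randn((img_dims..., N_data))) |> gpu

model = Chain(
    Conv((3,3), 3 => 1, relu),      # 16×16×3 -> 14×14×1
    Flux.flatten,                   # 14×14×1 -> 196
    Dense(196 => 2)                 # 196 -> 2
) |> gpu

model(images)   # should now return a 2×32 CuArray without any scalar indexing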


Thank you! That fixed it.