Scalar Indexing Error: multiplying matrix by scalar

Hey all,

I’ve been trying to use Flux on my GPU and am running into a scalar indexing error when multiplying a matrix by a scalar. I found a workaround: define a function that multiplies its two arguments and broadcast that function:

mul_scalar(x,a)=x*a
# Using this in Flux works as expected
x_out=mul_scalar.(ones(1,10),0.25)

However, if I try either of the following approaches, I receive a scalar indexing error:

x_out=ones(1,10)*0.25
x_out=ones(1,10).*0.25

I believe the error occurs when the model’s parameters are updated; the forward pass works with all three approaches. Is this behavior expected? I didn’t run into any issues raising the elements of a matrix to a power, so I assumed multiplication by a scalar would work as well. Any information is greatly appreciated!
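
A minimal check that seems to isolate this (a sketch, not my full script): the forward multiplication runs fine on the GPU, but taking a gradient through the same expression reproduces the error.

using CUDA, Flux
CUDA.allowscalar(false)

z = cu(ones(Float32, 1, 10))          # Float32 array on the GPU

z * 0.25                              # forward pass: works, no scalar indexing
Flux.gradient(a -> sum(a * 0.25), z)  # backward pass: throws the scalar indexing error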

Both the working and the non-working code are presented below. The only difference between the two is in the function test_loss(model,x,z,k).

Working code

using CUDA
using Flux
using Statistics
using StatsBase
using GPUArrays
if has_cuda()		# Check if CUDA is available
    @info "CUDA is on"
end
CUDA.allowscalar(false)

function mul_scalar(x,a)
    return x*a
end

function setmodel()
    layer1=Dense(2,16,leakyrelu)
    layer2=Dense(16,16,leakyrelu)
    layer3=Dense(16,1)
    return Chain(layer1,layer2,layer3)
end 

function test_loss(model,x,z,k)
    target=mul_scalar.(z,k)
    #target=z.*k
    y=model(x)
    return (y-target).^2    
end

function test_train()
    N=1000
    x=rand(2,N)
    z=rand(1,N)
    model=f64(setmodel())
    k=0.25
    x=x|>gpu
    z=z|>gpu
    k=k|>gpu
    model=model|>gpu
    L(x1,z1)=mean(test_loss(model,x1,z1,k))

    ps=params(model)

    opt = ADAM(0.001)

    dif=L(x,z)
    println(dif)
    data_agg=zip([x],[z])
    Flux.Optimise.train!(L,ps,data_agg,opt)

    return model
end

model=test_train()

Not working code

using CUDA
using Flux
using Statistics
using StatsBase
using GPUArrays
if has_cuda()		# Check if CUDA is available
    @info "CUDA is on"
end
CUDA.allowscalar(false)

function mul_scalar(x,a)
    return x*a
end

function setmodel()
    layer1=Dense(2,16,leakyrelu)
    layer2=Dense(16,16,leakyrelu)
    layer3=Dense(16,1)
    return Chain(layer1,layer2,layer3)
end 

function test_loss(model,x,z,k)
    #target=mul_scalar.(z,k)
    target=z*k
    y=model(x)
    return (y-target).^2    
end

function test_train()
    N=1000
    x=rand(2,N)
    z=rand(1,N)
    model=f64(setmodel())
    k=0.25
    x=x|>gpu
    z=z|>gpu
    k=k|>gpu
    model=model|>gpu
    L(x1,z1)=mean(test_loss(model,x1,z1,k))

    ps=params(model)

    opt = ADAM(0.001)

    dif=L(x,z)
    println(dif)
    data_agg=zip([x],[z])
    Flux.Optimise.train!(L,ps,data_agg,opt)

    return model
end

model=test_train()

Error message

ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] assertscalar(op::String)
    @ GPUArrays ~/.julia/packages/GPUArrays/3sW6s/src/host/indexing.jl:53
  [3] getindex(xs::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, I::Int64)
    @ GPUArrays ~/.julia/packages/GPUArrays/3sW6s/src/host/indexing.jl:86
  [4] first
    @ ./abstractarray.jl:368 [inlined]
  [5] dot(x::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, y::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
    @ LinearAlgebra /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:914
  [6] #1286
    @ ~/.julia/packages/ChainRules/RyXef/src/rulesets/Base/arraymath.jl:125 [inlined]
  [7] unthunk
    @ ~/.julia/packages/ChainRulesCore/Y1Mee/src/tangent_types/thunks.jl:194 [inlined]
  [8] wrap_chainrules_output
    @ ~/.julia/packages/Zygote/rv6db/src/compiler/chainrules.jl:104 [inlined]
  [9] map
    @ ./tuple.jl:215 [inlined]
 [10] wrap_chainrules_output
    @ ~/.julia/packages/Zygote/rv6db/src/compiler/chainrules.jl:105 [inlined]
 [11] ZBack
    @ ~/.julia/packages/Zygote/rv6db/src/compiler/chainrules.jl:179 [inlined]
 [12] #3752#back
    @ ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:67 [inlined]
 [13] Pullback
    @ ~/Documents/julia_projects/flux_vfi/gpu_issue.jl:25 [inlined]
 [14] (::typeof(∂(test_loss)))(Δ::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/rv6db/src/compiler/interface2.jl:0
 [15] Pullback
    @ ~/Documents/julia_projects/flux_vfi/gpu_issue.jl:40 [inlined]
 [16] (::typeof(∂(λ)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/rv6db/src/compiler/interface2.jl:0
 [17] #203
    @ ~/.julia/packages/Zygote/rv6db/src/lib/lib.jl:203 [inlined]
 [18] #1733#back
    @ ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:67 [inlined]
 [19] Pullback
    @ ~/.julia/packages/Flux/ZnXxS/src/optimise/train.jl:105 [inlined]
 [20] (::typeof(∂(λ)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/rv6db/src/compiler/interface2.jl:0
 [21] (::Zygote.var"#84#85"{Zygote.Params, typeof(∂(λ)), Zygote.Context})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/rv6db/src/compiler/interface.jl:343
 [22] gradient(f::Function, args::Zygote.Params)
    @ Zygote ~/.julia/packages/Zygote/rv6db/src/compiler/interface.jl:76
 [23] macro expansion
    @ ~/.julia/packages/Flux/ZnXxS/src/optimise/train.jl:104 [inlined]
 [24] macro expansion
    @ ~/.julia/packages/Juno/n6wyj/src/progress.jl:134 [inlined]
 [25] train!(loss::Function, ps::Zygote.Params, data::Base.Iterators.Zip{Tuple{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, opt::ADAM; cb::Flux.Optimise.var"#40#46")
    @ Flux.Optimise ~/.julia/packages/Flux/ZnXxS/src/optimise/train.jl:102
 [26] train!(loss::Function, ps::Zygote.Params, data::Base.Iterators.Zip{Tuple{Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, Vector{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}}}, opt::ADAM)
    @ Flux.Optimise ~/.julia/packages/Flux/ZnXxS/src/optimise/train.jl:100
 [27] test_train()
    @ Main ~/Documents/julia_projects/flux_vfi/gpu_issue.jl:49
 [28] top-level scope
    @ ~/Documents/julia_projects/flux_vfi/gpu_issue.jl:55
in expression starting at /home/chris/Documents/julia_projects/flux_vfi/gpu_issue.jl:55

I think the problem here is that your scalar k=0.25 is a Float64, while the arrays are Float32 (Flux’s gpu converts Float64 arrays to Float32 when moving them to the device, but leaves a plain scalar untouched). Multiplying by the Float64 scalar promotes the array to Float64 (which you probably don’t want anyway, for performance reasons), and the resulting mix of element types in the gradient is what triggers the error:

julia> using CUDA, LinearAlgebra

julia> CUDA.allowscalar(false)

julia> dot(CuArray([1f0, 2f0]), CuArray([3f0, 4f0]))
11.0f0

julia> dot(CuArray([1f0, 2f0]), CuArray([3.0, 4.0]))
ERROR: Scalar indexing is disallowed.
...
Stacktrace:
...
 [5] dot(x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, y::CuArray{Float64, 1, CUDA.Mem.DeviceBuffer})

julia> using Zygote

julia> gradient(x -> sum(x * 3f0), CuArray([1f0, 2f0]))[1]
2-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 3.0
 3.0

julia> gradient(x -> sum(x * 3.0), CuArray([1f0, 2f0]))[1]
ERROR: Scalar indexing is disallowed.
...
  [5] dot(x::CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, y::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
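
The simplest fix is to make the scalar match the arrays’ element type, e.g. (a sketch against your code; only the definition of k changes):

k = 0.25f0           # Float32 literal instead of the Float64 0.25
# or, generically, match whatever element type z has:
k = eltype(z)(0.25)

With matching types nothing gets promoted, z*k stays Float32, and the dot call in the pullback hits the fast GPU method (like the first dot example above) rather than the generic fallback that indexes element by element.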

Yes, you are correct. Thank you for your help!