ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.

in

function scale!(A::SparseArrays.SparseMatrixCSC{Float64,Int}, b::Vector{Float64})
if CUDA.functional()
A_gpu = CuSparseMatrixCSC(A)
row_norms_gpu = sqrt.(sum(abs2, A_gpu, dims=2))
scaling_factors_gpu = CUDA.ones(size(row_norms_gpu)) ./ row_norms_gpu
scaling_factors = Array(scaling_factors_gpu)
else
scaling_factors = 1.0 ./ sqrt.(sum(abs2, A, dims=2))
end
A .*= scaling_factors
b .*= scaling_factors
end

where row_norms_gpu is a normal (non-gpu) array creating some bad interactions. So either changing CUDA.ones(...) to a 1 or wrapping the row_norms_gpu in a CuArray seems to remove the error, but the first case does the calculation on cpu and the second does a dense sum, so neither might be very good if you operate on a very large matrix.

As to why row_norms_gpu is not already a CuArray it seems like the sum for CuSparseMatrixCSC results in a normal Array, while sum(CuArray(A_gpu), dims=2) results in a CuArray. Not sure what I would expect here (CuSparse or CuArray), but probably something that was still on the GPU at least.