Rotr90 of a CUDA.CuArray

I’m trying to implement a neural network that rotates images within its layers. My question: is there a way to apply something like Base.rotr90 to a CUDA.CuArray while keeping scalar indexing disabled (CUDA.allowscalar(false))?

Here’s what I use on the CPU:

a = randn((4, 4, 3, 2))
mapslices(rotr90, a, dims=[1,2])

On the GPU I tried:

using CUDA, Flux

CUDA.allowscalar(false)
a = randn((4, 4, 3, 2)) |> Flux.gpu
mapslices(rotr90, a, dims=[1,2])

I get the following error, saying that scalar indexing is disallowed:

ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.

Is there any way around this or should I just use scalar indexing?

You could try a matrix multiplication. It does far more work in principle, but on a GPU it might well be faster anyway.
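For a single matrix, a minimal sketch of that idea (the helper name rotr90_mul is made up here): multiplying by the exchange (anti-identity) matrix J reverses the rows, and a transpose then completes the rotation.

using LinearAlgebra

# Hypothetical helper: rotr90 via one matmul plus a transpose.
# J * x reverses the rows of x, and permutedims transposes the result,
# so rotr90_mul(x) == rotr90(x). On the GPU, J would live in a CuArray too.
function rotr90_mul(x::AbstractMatrix)
    m = size(x, 1)
    J = reverse(Matrix{eltype(x)}(I, m, m), dims=1)  # exchange matrix
    permutedims(J * x)
end

rotr90_mul([51 2 69; 95 17 48]) == rotr90([51 2 69; 95 17 48])  # true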

It’s just a reverse & a transpose, so you can do something like this:

julia> using JLArrays; JLArrays.allowscalar(false)

julia> b = jl(mapslices(rotr90, a, dims=(1,2)));

julia> b == permutedims(jl(a)[end:-1:begin, :, :, :], (2,1,3,4))
true

julia> b == permutedims(reverse(jl(a); dims=1), (2,1,3,4))
ERROR: Scalar indexing is disallowed.  # unsure about CuArray

# that's 2 copies, lazier variants:

julia> b == @views permutedims(jl(a)[end:-1:begin, :, :, :], (2,1,3,4))
true

julia> b == PermutedDimsArray(jl(a)[end:-1:begin, :, :, :], (2,1,3,4))
true

julia> b == @views PermutedDimsArray(jl(a)[end:-1:begin, :, :, :], (2,1,3,4))  
ERROR: Scalar indexing is disallowed.  # two wrappers generally don't work
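
If it helps, the working variant wrapped up as a function (the name rotr90_slices is made up) might look like:

# Rotates each (1,2)-slice right by 90° without scalar indexing:
# the view makes the row reversal lazy, and permutedims makes one copy.
rotr90_slices(a::AbstractArray{<:Any,4}) =
    permutedims(@view(a[end:-1:1, :, :, :]), (2, 1, 3, 4))

# b == rotr90_slices(jl(a))  # should also be true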

# something else I tried:

julia> using TensorCast

julia> x = rand(1:99, 2, 3)
2×3 Matrix{Int64}:
 51   2  69
 95  17  48

julia> @cast y[i,j] := x[-j,i]
3×2 transpose(view(::Matrix{Int64}, 2:-1:1, :)) with eltype Int64:
 95  51
 17   2
 48  69

julia> y == rotr90(x)
true

julia> @cast y[i,j] := x[-j,i] lazy=false  # makes a Matrix
3×2 Matrix{Int64}:
 95  51
 17   2
 48  69

julia> @cast y[i,j] := jl(x)[-j,i] lazy=false  # not lazy enough!
ERROR: Scalar indexing is disallowed.

Thank you so much (both) for helping me with this!

First, excuse my ignorance, but what is the JLArrays package? I had a short search for it but found nothing on Julia Packages. I tried adding JLArrays to my project and got version v0.1.1; is this the one you were using, @mcabbott?

It’s on JuliaHub (v8.5.0), which is what I use to look up packages. I’m not sure juliapackages.com is maintained (where did you find it? If it’s confirmed no longer maintained, I think I should edit it out of the Julia wikibook).


Ah, I see now. Reading the source of JLArrays after installing it, I can see it’s a “reference implementation” of GPUArrays that runs on the CPU. So JLArrays is meant for development, for example for testing. Smart! Maybe I should start using it in my CI setup. There must be some dev notes somewhere about using JLArrays, but I haven’t come across them yet. If anyone finds them and would like to share, please do.
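
I imagine a test using it would look something like this (an untested sketch on my part):

using Test, JLArrays
JLArrays.allowscalar(false)

@testset "rotation stays GPU-friendly" begin
    a = randn(Float32, 4, 4, 3, 2)
    expected = jl(mapslices(rotr90, a, dims=(1, 2)))
    # The same lazy reverse + permutedims as above; this would error
    # if any code path fell back to scalar indexing.
    got = @views permutedims(jl(a)[end:-1:1, :, :, :], (2, 1, 3, 4))
    @test got == expected
end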

So, if I replace all the jl calls in your code with gpu (from the Flux package), and JLArrays.allowscalar(false) with CUDA.allowscalar(false), it should all work.
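
Something like this, I think (a sketch; I haven’t run it yet):

using CUDA, Flux
CUDA.allowscalar(false)

a = randn(Float32, 4, 4, 3, 2)
b = mapslices(rotr90, a, dims=(1, 2)) |> gpu

# same lazy reverse + permutedims as above, now on a CuArray:
b == @views permutedims(gpu(a)[end:-1:1, :, :, :], (2, 1, 3, 4))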

I’ll do a bit of benchmark testing on the ideas you sent. Thanks again!
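
In case it’s useful, I plan to time the candidates roughly like this (a sketch assuming BenchmarkTools; the CUDA.@sync makes sure the kernel finishes before the timer stops):

using BenchmarkTools, CUDA
CUDA.allowscalar(false)

a = CUDA.randn(Float32, 256, 256, 3, 16)
@btime CUDA.@sync permutedims(@view($a[end:-1:1, :, :, :]), (2, 1, 3, 4));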