Why can Flux not reduce this?

Hello

I am trying:

using CUDA, NNlib, StaticArrays
S   = rand(SVector{3,Float32}, 5)
DST = zeros(SVector{3,Float32}, 5)
I   = [3, 1, 2, 5, 4]

# Works great on CPU

@CUDA.time NNlib.scatter!(+, DST,S,I)
0.000004 seconds
5-element Vector{SVector{3, Float32}}:
[1.3667885, 1.081403, 1.1134366]
[1.6543723, 0.50424564, 1.1448298]
[0.8695842, 1.8623418, 1.398939]
[1.0094697, 0.0052466393, 0.09184897]
[1.9389011, 0.3291297, 1.2308507]


# Bugs out on GPU

 @CUDA.time NNlib.scatter!(+, CuArray(DST),CuArray(S),CuArray(I))
ERROR: InvalidIRError: compiling kernel #scatter_kernel!(typeof(+), CuDeviceVector{SVector{3, Float32}, 1}, CuDeviceVector{SVector{3, Float32}, 1}, CuDeviceVector{Int64, 1}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to atomic_cas!)

Does anyone know why?

This is just an example; I need it to work on the GPU for a more complex case.

Kind regards

That seems like a bug in NNlibCUDA.jl. Please file an issue there.

Thank you, I filed an issue 3 weeks ago:

I got an answer there explaining why it doesn’t work. Personally, I just thought “it should work”, since StaticArrays is such a core part of Julia, imo. One can get it to work with code like this:

    # dst, src, OP1, OP2, and SYSTEM are defined elsewhere (not shown)
    for i = 1:3  # length of the static array element, i.e. SVector{3,Float32}
        o      = i - 1
        # Strided view over component i of every element
        V_dst  = @view reinterpret(eltype(eltype(dst)), dst)[begin+o:3:end]

        V_src_ = @view reinterpret(eltype(eltype(src)), src)[begin+o:3:end]
        V_src  = @view V_src_[1:SYSTEM.MaxValidIndex[]]

        NNlib.scatter!(OP1, V_dst, V_src, @view(SYSTEM.I[1:SYSTEM.MaxValidIndex[]]))
        NNlib.scatter!(OP2, V_dst, V_src, @view(SYSTEM.J[1:SYSTEM.MaxValidIndex[]]))
    end

But I do not think this is the best fix, since I need to call NNlib.scatter! 6 times to do this and possibly lose out on some speed.

Kind regards

As I mentioned on that issue thread, you can turn this into 1 or 2 scatter! calls by using a 2D array instead.

Yes, but I would really like to keep using static arrays, since they also give me relatively fast CPU execution. Do you happen to know if there is a way to “view” a vector of static arrays as a 2D matrix?

I only know how to get a single column out, as shown above.

Kind regards

Got it to work, thank you
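
The thread does not show the final solution. As a minimal sketch of the 2D-array suggestion, and only an assumption about one way it could be done, `reinterpret(reshape, T, A)` from Base (Julia 1.6 or later) exposes a vector of `SVector{3,Float32}` as a 3×N `Float32` matrix that shares memory with the original vector, so a single `NNlib.scatter!` call covers all three components. The example below runs on the CPU with made-up deterministic data; whether the same reinterpretation works on a `CuArray` depends on the CUDA.jl version and has not been verified here.

```julia
using NNlib, StaticArrays

# Illustrative data (not from the original post): element k is (k, 10k, 100k)
S   = [SVector{3,Float32}(k, 10k, 100k) for k in 1:5]
DST = zeros(SVector{3,Float32}, 5)
I   = [3, 1, 2, 5, 4]

# View both vectors as 3×5 Float32 matrices without copying;
# column k of each matrix is element k of the underlying vector.
S_mat   = reinterpret(reshape, Float32, S)
DST_mat = reinterpret(reshape, Float32, DST)

# One call scatters whole columns: DST_mat[:, I[k]] += S_mat[:, k]
NNlib.scatter!(+, DST_mat, S_mat, I)

# DST itself was updated, because DST_mat shares its memory
println(DST)
```

Because the matrix is only a view, the rest of the code can keep working with the vector of static arrays on the CPU and use the matrix form only for the scatter call.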