Hi!
I’m trying to broadcast a two argument function over slices on the second argument like so:
```julia
# using CUDA
# todevice = cu      # uncomment these two lines to run on the GPU
todevice = identity  # CPU

x = rand(5) |> todevice
a = rand(2, 5) |> todevice

function kernel(x, a)
    a * x # dummy computation
end

kernel.(x, eachcol(a))
```
This kind of works on the CPU, but it produces an array of arrays instead of a 2D array. When I try the same thing on the GPU, I get the following error:
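For reference, on the CPU the array of arrays can be stacked into the intended 2×5 matrix with `reduce(hcat, ...)`. This only illustrates the target shape; it does not avoid the slices, so it does not help with the GPU error.

```julia
x = rand(5)
a = rand(2, 5)
kernel(x, a) = a * x          # same dummy computation as above

y = kernel.(x, eachcol(a))    # Vector of 5 length-2 Vectors
M = reduce(hcat, y)           # 2×5 Matrix; column i is a[:, i] * x[i]
```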
```
ERROR: LoadError: CuArray only supports element types that are stored inline
```
I assume this is related to the array-of-arrays issue.
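A quick plain-Julia check (no GPU needed) is consistent with that guess, assuming the CUDA.jl error corresponds to Julia's notion of "stored inline" (plain bits types): a scalar element type qualifies, while an element type that is itself an array does not.

```julia
# Scalar element types are plain bits and can be stored inline in an array
@show isbitstype(Float32)            # true
# A Vector element is a heap-allocated, mutable object, so it cannot
@show isbitstype(Vector{Float32})    # false
```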
Do you know if there is a way to make this work on the GPU without array mutation, since I also want to compute gradients using Zygote?
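One mutation-free direction, sketched under the assumption that the real kernel can be written as a scalar, elementwise function: reshape `x` into a 1×5 row so that broadcasting against the 2×5 matrix `a` pairs `x[i]` with every entry of column `i`. This is a swapped-in formulation, not a drop-in replacement for `eachcol`, and `kernel_scalar` is a hypothetical name.

```julia
# using CUDA; todevice = cu   # uncomment to try on the GPU (assumes CUDA.jl)
todevice = identity

x = rand(5) |> todevice
a = rand(2, 5) |> todevice

# Hypothetical scalar version of the dummy kernel: computes one output element
kernel_scalar(xi, aij) = aij * xi

# 1×5 row broadcast against the 2×5 matrix gives a 2×5 result directly,
# with no column slices and no array of arrays
xrow = reshape(x, 1, :)
out = kernel_scalar.(xrow, a)
```

Because `out` is produced by a single broadcast with no in-place writes, it is the kind of expression Zygote is generally able to differentiate, e.g. `Zygote.gradient(a -> sum(kernel_scalar.(reshape(x, 1, :), a)), a)`; whether that works unchanged on a `CuArray` is an assumption worth testing.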