I’m trying to broadcast a two-argument function over the columns of its second argument, like so:
    # using CUDA
    # todevice = cu
    todevice = identity

    x = rand(5) |> todevice
    a = rand(2, 5) |> todevice

    function kernel(x, a)
        a * x # dummy computation
    end

    kernel.(x, eachcol(a))
This more or less works on the CPU, but it produces an array of arrays (a 5-element vector of 2-element vectors) instead of a 2D array. When I try the same thing on the GPU, I get the following error:
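To make the shape I am after concrete, here is a small CPU-only sketch. The names `slices`, `stacked`, and `expected` are introduced just for this illustration, and the final broadcast expression matches only this particular dummy kernel; it is not a general replacement for a more complicated `kernel`.

```julia
x = rand(5)
a = rand(2, 5)

kernel(x, a) = a * x  # same dummy computation as in the snippet above

# Broadcasting over column slices gives a vector of vectors:
slices = kernel.(x, eachcol(a))   # 5 elements, each a 2-element vector

# The 2D result I actually want: one column per slice
stacked = reduce(hcat, slices)    # 2×5 matrix

# For this dummy kernel only, the same values come from a plain
# whole-array broadcast, with no slicing at all
expected = a .* transpose(x)      # 2×5 matrix
```

On the CPU, `stacked` and `expected` hold the same values. Whether the real computation can be rewritten in that whole-array form is the open part of the question.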
ERROR: LoadError: CuArray only supports element types that are stored inline
Which I assume is related to the array of arrays issue.
Do you know if there is a way to make this work on the GPU without array mutation, since I also want to compute gradients using Zygote?