Hi everyone, I am looking for the most performant way to create a CuArray where coefficients are 0 everywhere but 1 at specified indices.
An easy way to do that with regular arrays would be:

    a = randn(1000,1000)
    imin = argmin(a,dims=1) # coefficients where we want b[i] = 1
    b = zeros(size(a))
    b[imin] .= 1

But it gets trickier with a CuArray:

    using CUDA
    CUDA.allowscalar(false)
    a = CUDA.randn(1000,1000)
    imin = argmin(a,dims=1)

One cannot broadcast in a similar way as above, since scalar indexing has been disallowed for performance reasons. One could do:

    imin = argmin(a,dims=1) |> Array
    b = zeros(size(a))
    b[imin] .= 1
    b = b |> CuArray

but this involves a round trip between the GPU and the CPU, which is not ideal.
Any idea of a trick to get around this problem? Cheers!