Hi everyone, I am looking for the most performant way to create a `CuArray` whose coefficients are `0` everywhere except at specified indices, where they are `1`.

With regular arrays, an easy way to do this would be:

```
a = randn(1000,1000)
imin = argmin(a,dims=1) # indices where we want b[i] = 1
b = zeros(size(a))
b[imin] .= 1
```

But it gets trickier with `CuArray`s:

```
using CUDA
CUDA.allowscalar(false)
a = CUDA.randn(1000,1000)
imin = argmin(a,dims=1)
```

One cannot index and assign in the same way as above, since scalar indexing has been disallowed for performance reasons. One could do

```
imin = argmin(a,dims=1) |> Array
b = zeros(eltype(a), size(a)) # match the Float32 element type of `a`
b[imin] .= 1
b = b |> CuArray
```

but this involves copying data back and forth between the GPU and the CPU, which is not ideal.
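One candidate that might stay entirely on the GPU (a sketch, not benchmarked) is to build the mask by broadcasting the lazy `CartesianIndices(a)` against `imin`, so that no scalar indexing is involved. `argmin_mask` is a hypothetical helper name. The sketch assumes that broadcasting a `CartesianIndices` together with a `CuArray` compiles to a single GPU kernel, which I expect since `CartesianIndices` is a lightweight isbits lazy array. The check below runs on a plain `Array` because the function only uses generic broadcasting.

```
# Hypothetical helper (not from any package): 0/1 array with a 1 at each
# position returned by `argmin(a; dims=1)`, built only with broadcasting,
# so no scalar indexing should be needed when `a` is a CuArray.
function argmin_mask(a::AbstractMatrix{T}) where {T}
    imin = argmin(a; dims=1)  # 1×n array of CartesianIndex, one per column
    # CartesianIndices(a) is m×n, imin is 1×n: entry (i, j) of the result
    # is true exactly when CartesianIndex(i, j) == imin[1, j].
    mask = CartesianIndices(a) .== imin
    return T.(mask)  # Bool -> element type of `a` (one(T) / zero(T))
end

# CPU check; on the GPU the same call would take e.g. CUDA.randn(1000, 1000)
a = [3.0 1.0;
     2.0 5.0]
b = argmin_mask(a)
```

With `a = CUDA.randn(1000, 1000)`, `argmin_mask(a)` should return a `CuArray{Float32}`. Because the mask is derived from `imin` itself, ties are resolved exactly as `argmin` resolves them (one `1` per column). The shorter `a .== minimum(a; dims=1)` would instead put a `1` at every tied minimum.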

Any idea of a trick to get around this problem? Cheers!