Hello!
Probably a noob question, but I have to ask.
Suppose I have:
using CUDA
N = 10^6
AGPU = CuArray(rand(N))
BGPU = CuArray(rand(N))
ACPU = rand(N)
BCPU = rand(N)
# Why does GPU allocate so much more in-place?
@allocated AGPU .= BGPU   # 1408 (after first run)
@allocated ACPU .= BCPU   # 32 (after first run)
I am surprised that even the CPU version allocates a little, but I don't understand the GPU allocation at all. How do I copy values in place without using a kernel?
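To make the question concrete, here is a minimal sketch of the comparison I think I should be running. It rests on a few assumptions: that Base's `@allocated` only counts host-side (CPU) heap bytes, that `CUDA.@allocated` reports device memory instead, and that `copyto!` on two `CuArray`s does a device-to-device copy without a broadcast kernel. `copy_inplace!` is a hypothetical helper I made up so that the non-constant globals do not distort the measurement. I have not confirmed that any of this explains the 1408 bytes.

```julia
using CUDA

N = 10^6
AGPU = CuArray(rand(N))
BGPU = CuArray(rand(N))

# Hypothetical helper: wrapping the broadcast in a function keeps the
# untyped global variables out of the measured expression.
copy_inplace!(dst, src) = (dst .= src; nothing)

copy_inplace!(AGPU, BGPU)  # warm-up call so compilation is not measured

# Assumption: Base's @allocated counts host (CPU) heap bytes only.
host_bytes = @allocated copy_inplace!(AGPU, BGPU)

# Assumption: CUDA.@allocated counts bytes of device (GPU) memory.
gpu_bytes = CUDA.@allocated copy_inplace!(AGPU, BGPU)

# Possible alternative to broadcasting: copy the buffer contents directly.
copyto!(AGPU, BGPU)

println("host bytes: ", host_bytes, ", device bytes: ", gpu_bytes)
```

If the device-side number comes out as zero, then the bytes I am seeing would presumably be host-side bookkeeping for the launch rather than a new GPU array.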
Kind regards