GPU randn way slower than rand?

Hi Tim,

Thanks for the reply.
I understand that timing a single run in global scope is not accurate; it's just that the difference seemed too large to be normal.

Here are the new results, following your code:

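using CuArrays, Random, BenchmarkTools

ac = Array{Float64}(undef, 2^20)   # CPU array
ag = cu(ac)                        # GPU copy of ac
@btime randn!(ac)
@btime CuArrays.@sync randn!(ag)   # @sync waits for the GPU to finish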

6.788 ms (0 allocations: 0 bytes)
7.626 s (5242883 allocations: 288.00 MiB)

So the allocations are clearly the problem, but I can’t really figure out the reason.

Even with rand there are some extra allocations on the GPU side:

@btime rand!(ac)
@btime CuArrays.@sync rand!(ag)

957.006 μs (0 allocations: 0 bytes)
5.205 μs (38 allocations: 1.48 KiB)

Do you have any idea where the problem might be?

I’m using an 840M on my laptop with 2 GB of memory; however, I don’t think array size is the problem either.