GPU randn way slower than rand?

Zhiye_Xia · December 3, 2018, 6:07am

Hi Tim,

Thanks for the reply.
I understand that timing a single run in global scope is not accurate; it’s just that the difference is too large to be normal.

Here are the new results using your code:

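using Random, BenchmarkTools, CuArrays   # randn!/rand! come from the Random stdlib

ac = Array{Float64}(undef, 2^20)   # 2^20-element CPU array
ag = cu(ac)                        # GPU copy (cu converts Float64 to Float32)
@btime randn!(ac)
@btime CuArrays.@sync randn!(ag)   # @sync waits for the GPU work to finish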

6.788 ms (0 allocations: 0 bytes)
7.626 s (5242883 allocations: 288.00 MiB)

So the allocations are clearly the problem, but I can’t figure out where they come from.
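A quick back-of-the-envelope check on the GPU numbers above, taking $N = 2^{20}$ elements from the benchmark, shows that both the allocation count and the allocated memory scale with the array length:

$$
N = 2^{20} = 1\,048\,576, \qquad
5\,242\,883 = 5 \cdot 2^{20} + 3 \;\Rightarrow\; \approx 5 \text{ allocations per element}
$$

$$
\frac{288\ \text{MiB}}{N} = \frac{288 \cdot 2^{20}\ \text{B}}{2^{20}} = 288\ \text{B per element}, \qquad
\frac{7.626\ \text{s}}{N} \approx 7.27\ \mu\text{s per element}
$$

Since the count is exactly $5N + 3$, the cost appears to grow per element rather than per call.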

Even with rand! there are a few extra allocations on the GPU:

@btime rand!(ac)
@btime CuArrays.@sync rand!(ag)

957.006 μs (0 allocations: 0 bytes)
5.205 μs (38 allocations: 1.48 KiB)
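Putting the two sets of timings side by side makes the asymmetry explicit (all numbers are taken directly from the benchmarks above):

$$
\frac{t_{\text{GPU}}(\texttt{randn!})}{t_{\text{CPU}}(\texttt{randn!})} = \frac{7.626\ \text{s}}{6.788\ \text{ms}} \approx 1123,
\qquad
\frac{t_{\text{CPU}}(\texttt{rand!})}{t_{\text{GPU}}(\texttt{rand!})} = \frac{957.006\ \mu\text{s}}{5.205\ \mu\text{s}} \approx 184
$$

So on the GPU, rand! is about 184 times faster than on the CPU, while randn! is about 1100 times slower. rand! also makes only 38 allocations for the whole $2^{20}$-element array, far fewer than one per element.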

Do you have any idea where the problem might be?

I’m using a GeForce 840M with 2 GB of memory on my laptop. However, I don’t think array size is the problem either, since the array has only 2^20 elements (a few MiB).
