GPU backend-agnostic way to efficiently create random numbers on the GPU

I am trying to make my Flux code portable across CUDA and AMDGPU. Is there a GPU backend-agnostic way to efficiently create random numbers on the GPU? AMDGPU.randn is by far the fastest, but I would like to avoid an explicit dependency on AMDGPU.

This is what I have tested so far:

julia> @btime a = gpu(randn(Float32,128,128,1,64));
  3.609 ms (24 allocations: 4.00 MiB)

julia> @btime a = AMDGPU.randn(Float32,128,128,1,64);
  4.848 μs (27 allocations: 960 bytes)

julia> @btime a = Flux.randn32(128,128,1,64);
  3.418 ms (3 allocations: 4.00 MiB)

Allocate the array once, then fill it in place with rand! (or randn!).
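
For example, a minimal sketch (assuming Flux's gpu moves the array to whichever backend is active; both CUDA.jl and AMDGPU.jl overload Random.randn! for their array types, so the fill runs on the device):

julia> using Flux, Random

julia> a = gpu(zeros(Float32, 128, 128, 1, 64));  # pay the host-to-device transfer once

julia> Random.randn!(a);  # subsequent fills run entirely on the GPU

This way the expensive allocation and transfer happen once, and only the cheap on-device fill sits in the hot loop.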


Thanks a lot! This works great.

I am wondering if the following should also work on the GPU without scalar indexing:

julia> ts = zeros(Int16,100);

julia> rand!(ts,1:100);

julia> ts = CUDA.zeros(Int16,100);

julia> rand!(ts,1:100);
ERROR: Scalar indexing is disallowed.
Invocation of setindex! resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] errorscalar(op::String)
   @ GPUArraysCore ~/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:151
 [3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
   @ GPUArraysCore ~/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:124
 [4] assertscalar(op::String)
   @ GPUArraysCore ~/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:112
 [5] setindex!
   @ ~/.julia/packages/GPUArrays/uiVyU/src/host/indexing.jl:58 [inlined]
 [6] rand!(rng::TaskLocalRNG, A::CUDA.CuArray{Int16, 1, CUDA.DeviceMemory}, sp::Random.SamplerRangeNDL{UInt64, Int64})                                                  
   @ Random ~/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/Random/src/Random.jl:273
 [7] rand!
   @ ~/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/Random/src/Random.jl:268 [inlined]                                                                 
 [8] rand!(A::CUDA.CuArray{Int16, 1, CUDA.DeviceMemory}, X::UnitRange{Int64})
   @ Random ~/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/Random/src/Random.jl:265
 [9] top-level scope
   @ REPL[37]:1

But it is easy to work around this.
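
One possible workaround (a sketch, not a package API): draw uniform Float32 values on the device and map them to the integer range broadcastwise:

julia> using CUDA, Random

julia> ts = CUDA.zeros(Int16, 100);

julia> tmp = CUDA.rand(Float32, 100);  # uniform in [0, 1), generated on the GPU

julia> ts .= Int16.(clamp.(ceil.(Int, tmp .* 100), 1, 100));  # map to 1:100; clamp guards the rare tmp == 0

The fused broadcast compiles to a single GPU kernel, so no scalar indexing occurs.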

Random number generation by sampling is quite different from what we support right now, which is only the simple uniform or normal generators. I guess we could special-case a Base.OneTo distribution, which should (?) be easy enough to transform the generated numbers for, but for a more generic solution we wouldn’t be able to use CURAND, and I haven’t looked into what changes that would require to the native generator.
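
As a rough illustration of that transform idea (hypothetical code, not what CUDA.jl actually implements; the Random.Sampler machinery is glossed over):

using CUDA

# Hypothetical helper: sample a UnitRange by rescaling CURAND uniforms.
# Float32 carries only 24 mantissa bits, so this is only sound for
# narrow ranges -- one reason a fully generic solution would need the
# native generator rather than CURAND.
function rand_range!(A::CUDA.CuArray{T}, r::UnitRange{<:Integer}) where {T<:Integer}
    tmp = CUDA.rand(Float32, size(A))  # uniform in [0, 1) via CURAND
    A .= T.((first(r) - 1) .+ clamp.(ceil.(Int, tmp .* length(r)), 1, length(r)))
    return A
end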