I am trying to make my Flux code portable across CUDA and AMDGPU. Is there a GPU backend-agnostic way to efficiently generate random numbers on the GPU? AMDGPU.randn
is by far the fastest, but I would like to avoid an explicit dependency on AMDGPU.
This is what I have tested so far:
julia> @btime a = gpu(randn(Float32,128,128,1,64));
3.609 ms (24 allocations: 4.00 MiB)
julia> @btime a = AMDGPU.randn(Float32,128,128,1,64);
4.848 μs (27 allocations: 960 bytes)
julia> @btime a = Flux.randn32(128,128,1,64);
3.418 ms (3 allocations: 4.00 MiB)
Thanks a lot! This works great.
I am wondering if the following should also work on GPU without scalar indexing:
julia> ts = zeros(Int16,100);
julia> rand!(ts,1:100);
julia> ts = CUDA.zeros(Int16,100);
julia> rand!(ts,1:100);
ERROR: Scalar indexing is disallowed.
Invocation of setindex! resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.
If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] errorscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:151
[3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:124
[4] assertscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:112
[5] setindex!
@ ~/.julia/packages/GPUArrays/uiVyU/src/host/indexing.jl:58 [inlined]
[6] rand!(rng::TaskLocalRNG, A::CUDA.CuArray{Int16, 1, CUDA.DeviceMemory}, sp::Random.SamplerRangeNDL{UInt64, Int64})
@ Random ~/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/Random/src/Random.jl:273
[7] rand!
@ ~/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/Random/src/Random.jl:268 [inlined]
[8] rand!(A::CUDA.CuArray{Int16, 1, CUDA.DeviceMemory}, X::UnitRange{Int64})
@ Random ~/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/stdlib/v1.11/Random/src/Random.jl:265
[9] top-level scope
@ REPL[37]:1
But it is easy to work around this.
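One such workaround (a sketch, assuming `ts` is a device integer array as above): draw uniform `Float32` values with the native on-device generator and map them into `1:100` with a broadcast, which avoids scalar indexing entirely:

```julia
using Random

# `ts` is assumed to be a device array, e.g. CUDA.zeros(Int16, 100).
# rand! fills a same-shaped Float32 buffer on-device with uniform [0, 1)
# values, and the broadcast maps each value into 1:100.
u = rand!(similar(ts, Float32))
ts .= Int16.(floor.(Int, u .* 100) .+ 1)
```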
Random number generation by sampling is quite different from what we support right now, which is only the simple uniform and normal generators. I guess we could special-case a Base.OneTo
distribution, which should (?) be easy enough to transform the generated numbers for, but for a more generic solution we wouldn't be able to use CURAND, and I haven't looked into what changes that would require to the native generator.
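Such a special case might look roughly like the following (a hypothetical sketch, not actual GPUArrays/CUDA.jl internals; `rand_oneto!` is a made-up name): transform the output of the native uniform generator into draws from `1:n`.

```julia
using Random

# Hypothetical helper: fill A in place with samples from 1:n by
# transforming native uniform [0, 1) output. Unlike the rejection-based
# samplers in Random, this floor-based mapping carries a tiny bias for
# very large n, which is part of why a generic solution is harder.
function rand_oneto!(A::AbstractArray{T}, n::Integer) where {T<:Integer}
    u = rand!(similar(A, Float32))   # native generator, CPU or GPU
    A .= T.(floor.(Int, u .* n) .+ 1)
    return A
end
```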