How to initialize/fix the RNG seed on the GPU?

Hi All,

I have a question regarding initializing the random seed within CUDA.jl. I saw in this thread how to do it for the CURAND library; however, it seems CURAND is now deprecated in CUDA.jl.

I do see there’s CUDA.default_rng(), but perhaps I am missing the equivalent method to set its seed? Is there a specific way to initialize a global seed in CUDA.jl?

Edit: I assume this is acceptable as a way to force reproducible numbers in a CUDA kernel?

using CUDA

CUDA.seed!(42)
CUDA.rand(4)

CUDA.seed!(42) # to reset seed? 

I assume the CUDA.seed!() call globally sets the seed for CUDA?
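
For what it’s worth, here is a quick check of that assumption, using nothing beyond CUDA.seed! and CUDA.rand:

using CUDA

CUDA.seed!(42)
a = CUDA.rand(4)
CUDA.seed!(42)        # reseed the global GPU RNG
b = CUDA.rand(4)
Array(a) == Array(b)  # expected to be true if seeding resets the stream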


From appearances, CUDA.seed!(42) did reset some state, so the subsequent CUDA.rand(4) yields the same vector.

But I think the real interesting question is:

  • will it also be reproducible in a GPU parallel computing context?

Edit: see post #4

Looks like you simply cannot generate random numbers on the GPU while it’s executing? Not sure.

julia> N = 7;

julia> v = CUDA.zeros(N);

julia> function kernel(v)
           CUDA.rand(2)
           return
       end;

julia> @cuda threads=N kernel(v)
warning: linking module flags 'Dwarf Version': IDs have conflicting values ('i32 4' from globals with 'i32 2' from start)
ERROR: ReadOnlyMemoryError()

Okay, I managed to do a comparative test, which indicates that the GPU is trickier than the CPU. See the following comparative results:

using CUDA
const J = 99;
const N = 999;
function d!(v, j)
    a = CUDA.rand(N, N)  # draws from the shared global GPU RNG
    A = a'a
    b = CUDA.rand(N)
    x = A\b
    v[j] = hash(collect(x))
    nothing
end
function test()
    v = Vector{UInt64}(undef, J)
    foreach(wait, [Threads.@spawn(d!(v, j)) for j = 1:J])
    hash(v)
end
CUDA.seed!(42)
test() # 0x55085135480dbe5c
CUDA.seed!(42)
test() # 0x454b4373c153b992
CUDA.seed!(42)
test() # 0x1f7e0ab2f4ff15dd

import Random
function d!(v, j)
    a = rand(N, N)
    A = a'a
    b = rand(N)
    x = A\b
    v[j] = hash(collect(x))
    nothing
end
function test()
    v = Vector{UInt64}(undef, J)
    foreach(wait, [Threads.@spawn(d!(v, j)) for j = 1:J])
    hash(v)
end
Random.seed!(42)
test() # 0x128d80af6b06afed
Random.seed!(42)
test() # 0x128d80af6b06afed
Random.seed!(42)
test() # 0x128d80af6b06afed
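
I think the difference is that on the CPU each task gets its own deterministically split RNG (since Julia 1.7), whereas the GPU tasks all pull from the one global generator, so the scheduling order changes what each task draws. One possible workaround would be a per-task generator; this is an untested sketch assuming CUDA.jl’s native CUDA.RNG can be constructed from a seed and used with Random.rand! (check your CUDA.jl version):

using CUDA, Random

function d_own_rng!(v, j)
    rng = CUDA.RNG(UInt32(j))  # per-task generator, seed derived from j
    a = Random.rand!(rng, CuArray{Float32}(undef, N, N))
    A = a'a
    b = Random.rand!(rng, CuArray{Float32}(undef, N))
    x = A\b
    v[j] = hash(collect(x))
    nothing
end

With each task’s random inputs fixed by its own seed, the hashes should no longer depend on how the tasks interleave.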

The issue here is that CUDA.rand(2) tries to allocate a new array, and memory allocation is not allowed inside kernels. Just using the scalar rand() works fine.

using CUDA
using CUDA: i32

function rand_kernel!(x)
    # grid-stride loop: each thread covers every (threads*blocks)-th element
    i = threadIdx().x + (blockIdx().x - 1i32) * blockDim().x
    while i <= length(x)
        x[i] = rand()  # scalar device-side RNG call, no allocation
        i += blockDim().x * gridDim().x
    end
    return
end
julia> x = CuArray{Float32}(undef, 1000);

julia> @cuda threads=128 blocks=8 rand_kernel!(x); x[1:4]
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.76815176
 0.33985865
 0.6124729
 0.4942757

julia> @cuda threads=128 blocks=8 rand_kernel!(x); x[1:4]
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.79725224
 0.9650727
 0.32230678
 0.43226108

If you want the equivalent of CUDA.rand(2) inside of the kernel, you can use StaticArrays.jl:

using StaticArrays

function rand_kernel_2!(y)  # y is an n x 2 matrix
    i = threadIdx().x + (blockIdx().x - 1i32) * blockDim().x
    while i <= size(y, 1)
        y[i, :] .= rand(SVector{2, Float32})  # stack-allocated, safe in device code
        i += blockDim().x * gridDim().x
    end
    return
end
julia> y = CuArray{Float32}(undef, 1000, 2);

julia> @cuda threads=128 blocks=8 rand_kernel_2!(y); y[1:4, :]
4×2 CuArray{Float32, 2, CUDA.DeviceMemory}:
 0.347162   0.344527
 0.0153273  0.102394
 0.195802   0.12623
 0.0855174  0.838125

julia> @cuda threads=128 blocks=8 rand_kernel_2!(y); y[1:4, :]
4×2 CuArray{Float32, 2, CUDA.DeviceMemory}:
 0.354968  0.884167
 0.304241  0.496749
 0.872224  0.0735521
 0.807519  0.20668
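
The same pattern should extend to other widths; for four values per thread, a sketch along the same lines (rand_kernel_4! and y4 are just illustrative names):

function rand_kernel_4!(y)  # y is an n x 4 matrix
    i = threadIdx().x + (blockIdx().x - 1i32) * blockDim().x
    while i <= size(y, 1)
        y[i, :] .= rand(SVector{4, Float32})
        i += blockDim().x * gridDim().x
    end
    return
end

y4 = CuArray{Float32}(undef, 1000, 4)
@cuda threads=128 blocks=8 rand_kernel_4!(y4)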

CUDA.seed! does not help here:

julia> CUDA.seed!(42); @cuda threads=128 blocks=8 rand_kernel!(x); x[1:4]
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.6673033
 0.6596097
 0.872357
 0.2317881

julia> CUDA.seed!(42); @cuda threads=128 blocks=8 rand_kernel!(x); x[1:4]
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.75517017
 0.7023138
 0.2308588
 0.790027

but you can use Random.seed! on the host (apparently the kernel’s device-side RNG is seeded from the host RNG at launch):

julia> Random.seed!(42); @cuda threads=128 blocks=8 rand_kernel!(x); x[1:4]
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.10930445
 0.16218866
 0.6950794
 0.41934755

julia> Random.seed!(42); @cuda threads=128 blocks=8 rand_kernel!(x); x[1:4]
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.10930445
 0.16218866
 0.6950794
 0.4193475

or inside the kernel:

function rand_kernel_seed!(x)
    Random.seed!(42)  # device-side seeding: every launch starts from the same stream
    i = threadIdx().x + (blockIdx().x - 1i32) * blockDim().x
    while i <= length(x)
        x[i] = rand()
        i += blockDim().x * gridDim().x
    end
    return
end
julia> @cuda threads=128 blocks=8 rand_kernel_seed!(x); x[1:4]
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.4078151
 0.8522431
 0.5209861
 0.35377088

julia> @cuda threads=128 blocks=8 rand_kernel_seed!(x); x[1:4]
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.4078151
 0.8522431
 0.5209861
 0.35377088

This is also described in the documentation.
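
One more variant, in case you want the seed as a launch argument rather than a hard-coded constant. This is a sketch assuming the device-side Random.seed! accepts an integer seed, as in rand_kernel_seed! above; rand_kernel_seed_arg! is just an illustrative name:

function rand_kernel_seed_arg!(x, seed)
    Random.seed!(seed)  # seed the device-side RNG for this launch
    i = threadIdx().x + (blockIdx().x - 1i32) * blockDim().x
    while i <= length(x)
        x[i] = rand()
        i += blockDim().x * gridDim().x
    end
    return
end

@cuda threads=128 blocks=8 rand_kernel_seed_arg!(x, 42)  # same seed, same numbers
@cuda threads=128 blocks=8 rand_kernel_seed_arg!(x, 43)  # new seed, new stream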
