Generating Random Number from inside Kernel

drjoke · December 30, 2017, 9:19am

Is it possible to generate a random number from within a kernel?

My reading of CURAND.jl looks like curand() must be called from host.

I am implementing Genetic Algorithm. It is becoming impractical to have to pre-generate all random numbers beforehand from host and uploading them to GPU. I run out of memory very quickly.

vchuravy · December 30, 2017, 9:46am

Yes there are several possibilities. The easiest is probably to use the random number generator that is part of GPUArrays should work on the device. (It isn’t documented well, but it should get you started: https://github.com/JuliaGPU/GPUArrays.jl/blob/master/src/random.jl)

@maleadt wrote this a while back and I don’t think it is part of any repository. It is a xorshift implementation.

function xorshift(x::UInt32)::UInt32
    x = xor(x, x << 13)
    x = xor(x, x >> 17)
    x = xor(x, x << 5)
    return x
end

function gpurand(state::UInt32)::Tuple{Float32,UInt32}
    state = xorshift(state)
    res = (state >> 9) | reinterpret(UInt32, 1f0)
    val = reinterpret(Float32, res) - 1f0
    return val, state
end

function gpu_walk_kernel(f, out, N, x, a, b)
    idx = (blockIdx().x-UInt32(1)) * blockDim().x + threadIdx().x

    # random number generation
    _, state = gpurand(idx)
    function rand()
        let state=state # JuliaLang/julia#15276
            val, state = gpurand(state)
            return val
        end
    end

    ...
end

The last idea is to actually use the device version of CURAND.jl and I attempted that in GitHub - JuliaGPU/CUDAnativelib.jl

vchuravy · December 30, 2017, 9:49am

I guess Tim’s version is based on https://github.com/JuliaGPU/CUDAnative.jl/issues/40#issuecomment-277597760

drjoke · December 30, 2017, 12:52pm

@vchuravy thank you very much for your help.

I tried to implement the xor shift version.

using CUDAdrv, CUDAnative

@inline function xorshift(x::UInt32)::UInt32
    x = xor(x, x << 13)
    x = xor(x, x >> 17)
    x = xor(x, x << 5)
    return x
end

@inline function gpurand(state::UInt32)::Tuple{Float32,UInt32}
    state = xorshift(state)
    res = (state >> 9) | reinterpret(UInt32, 1f0)
    val = reinterpret(Float32, res) - 1f0
    return val, state
end

function cuda_random!(r)
    i = threadIdx().x
    idx = (blockIdx().x-UInt32(1)) * blockDim().x + threadIdx().x

    # random number generation
    val, state = gpurand(idx)
    r[i] = val
    return nothing
end

r = Array{Float32,1}(100)
d_r = CuArray(r)
@cuda (1, 100) cuda_random!(d_r)
r = Array(d_r)
println(r)

However getting this.

Float32[6.29425f-5, 0.000125885, 0.000188828, 0.00025177, 0.000314713, 0.000377655, 0.000440598, 0.00050354, 0.000566483, 0.000629425, 0.000692368, 0.00075531, 0.000818253, 0.000881195, 0.000944138, 0.0010072, 0.00107014, 0.00113308, 0.00119603, 0.00125897, 0.00132191, 0.00138485, 0.0014478, 0.00151074, 0.00157368, 0.00163662, 0.00169957, 0.00176251, 0.00182545, 0.00188839, 0.00195134, 0.0020144, 0.00195527, 0.00214028, 0.00208116, 0.00226617, 0.00220704, 0.00239205, 0.00233293, 0.00251794, 0.00245881, 0.00264382, 0.0025847, 0.00276971, 0.00271058, 0.00289559, 0.00283647, 0.0030216, 0.00296247, 0.00314748, 0.00308836, 0.00327337, 0.00321424, 0.00339925, 0.00334013, 0.00352514, 0.00346601, 0.00365102, 0.0035919, 0.00377691, 0.00371778, 0.00390279, 0.00384367, 0.0040288, 0.00409174, 0.00391054, 0.00397348, 0.00428057, 0.00434351, 0.00416231, 0.00422525, 0.00453234, 0.00459528, 0.00441408, 0.00447702, 0.00478411, 0.00484705, 0.00466585, 0.00472879, 0.005036, 0.00509894, 0.00491774, 0.00498068, 0.00528777, 0.00535071, 0.00516951, 0.00523245, 0.00553954, 0.00560248, 0.00542128, 0.00548422, 0.00579131, 0.00585425, 0.00567305, 0.00573599, 0.0060432, 0.00598407, 0.00592494, 0.00586581, 0.00629497]

The number appears quite random. However, not uniformly random between 0 and 1.

It will be too technical for me to implement/fix a random number generator.

sdanisch · December 30, 2017, 1:26pm

Just use the one in GPUArrays, it should work perfectly fine with CLArrays && CuArrays!

Here is an example of how you can use it:

I should have a look if we can integrate xorshift as well… Looks a bit simpler then what I ported!

sdanisch · December 30, 2017, 1:29pm

Here are the important lines:

function kernel_rand!(state, randstate)
    random_number = GPUArrays.gpu_rand(Float32, state, randstate)
    return
end
A = .... # your gpu array, CuArray / CLArray
randstate = GPUArrays.cached_state(A)
gpu_call(kernel_rand, A, (randstate,))

drjoke · December 30, 2017, 3:47pm

@vchuravy and @sdanisch I may have resolved the xorshift version.

@inline function rand_float32(state::UInt32)::Tuple{Float32,UInt32}
    state = xorshift(state)
    res =  state  | reinterpret(UInt32, 1f0)
    val = reinterpret(Float32, res) - 1f0
    return val, state
end

Yields

Summary Stats:
Mean:           0.503629
Minimum:        0.001107
1st Quartile:   0.253072
Median:         0.505304
3rd Quartile:   0.755910
Maximum:        0.999141

For 1,000 numbers. I would not know how else to test for randomness or uniformity. It seems “ok” to me now.

drjoke · December 30, 2017, 3:54pm

I guess the next step now is to figure out how to store the state so that it gets used beyond a kernel call. Otherwise, I would just keep getting same sequence of numbers.

I probably would need to create an array that store a state for each idx value.

sdanisch · December 30, 2017, 8:10pm

Well, you can look at GPUArrays, where i do exactly that…

drjoke · December 31, 2017, 5:55am

My bad,

res =  (state >> 9) | reinterpret(UInt32, 1f0)

Is the correct version. You really need state >> 9 in there.

@sdanisch Thank you for your suggestion. The reason I have not tried GPUArrays yet is because I got started with CUDA on Julia following this guide: Technical preview: Native GPU programming with CUDAnative.jl which rather mentions CUArray.

I will try to explore GPUArrays in the future, but currently it seems to have some learning curve on top of the CUDANative. A good tutorial on how CUDANative and GPUArrays work together would be very helpful.

sdanisch · December 31, 2017, 11:53am

CuArrays inherits from GPUArrays, so you’re basically using it already. For how to use it, you really just need the lines i posted…

sdanisch · December 31, 2017, 12:48pm

here some more documentation of gpu_call: Home · GPUArrays.jl.
Note, that with CuArrays gpu_call will directly dispatch to @cuda… Just with the added advantage, that it will also work with CLArrays.
There is also my blog post explaining some more things: Writing extendable and hardware agnostic GPU libraries | by Simon Danisch | techburst

drjoke · January 2, 2018, 5:13am

@sdanisch thank you very much for your help. I will look into it. The support and enthusiasm of the Julia community is very promising. I already got enough help to get my work done with Julia on the GPU to solve my problem. From now on, it is just refinement and improving efficiency.

Topic		Replies	Views
How to generate a random number in CUDA kernel function GPU question	4	3573	November 20, 2020
Gpu_rand inside @cuda GPU	2	371	November 18, 2020
Random numbers in CUDA GPU	3	3230	March 23, 2019
Kernel random numbers generation entropy / randomness issues GPU question , kernel	25	1231	November 3, 2023
Random numbers in KernelAbstractions GPU question , kernelabstractions	4	1078	February 15, 2022

Generating Random Number from inside Kernel

Related topics