I’m trying to understand the use of random numbers inside CUDA kernels in Julia. Below I have a simple script to estimate pi, but I have a few questions regarding its implementation.
Is my use of random numbers inside mc_pi_kernel! even correct? The code runs, but if I instead call x = CUDA.rand(Float32, 1) inside the kernel, it crashes with an unsupported call error and I’m not sure why.
I’ve tried using the Random123.jl library for a counter-based RNG, but this also fails with an unsupported dynamic function invocation error.
What am I doing wrong here?
Here’s a minimal reproducible example:
using CUDA
using Random123 # CUDA counter-based PRNG?

function mc_pi_kernel!(results::CuDeviceVector{Int32}, N::Int)
    tid = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if tid < 1 || tid > N
        return
    end
    x = rand(Float32) # Does this call CPU rand host-side?
    y = rand(Float32)
    results[tid] = (x*x + y*y <= 1.0f0) ? 1 : 0
    return
end
function estimate_pi(N::Int=10^6)
    d_results = CuArray(zeros(Int32, N))
    # Get launch configuration (threads/blocks)
    kernel = @cuda launch=false mc_pi_kernel!(d_results, N)
    config = launch_configuration(kernel.fun)
    threads = min(N, config.threads)
    blocks = cld(N, threads)
    println("Estimating π with $N samples ($blocks blocks, $threads threads)")
    @cuda always_inline=true threads=threads blocks=blocks mc_pi_kernel!(d_results, N)
    inside = sum(Array(d_results))
    return 4 * inside / N
end
N = 10^9
pi_est = estimate_pi(N)
println("Estimated π with $N samples = $pi_est")
Yes, this looks fine. It might be slightly better to use 32-bit literals (1i32, after using CUDA: i32), and ifelse might potentially be faster than the ternary operator. But I can’t measure any difference, so it probably doesn’t really matter here.
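For concreteness, a sketch of the kernel with those two tweaks (i32 here is the Int32 literal suffix that CUDA.jl exports):

using CUDA: i32

function mc_pi_kernel!(results::CuDeviceVector{Int32}, N::Int)
    tid = threadIdx().x + (blockIdx().x - 1i32) * blockDim().x
    if tid < 1i32 || tid > N
        return
    end
    x = rand(Float32)
    y = rand(Float32)
    # ifelse evaluates both arguments eagerly, which lets the compiler avoid a branch
    results[tid] = ifelse(x*x + y*y <= 1.0f0, 1i32, 0i32)
    return
end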
The reason is that CUDA.rand(Float32, 1) would create a CuVector, i.e. allocate memory, which is not allowed inside kernels (at least not in this manner). While these error messages are often hard to interpret, here it does explicitly mention allocating memory:
ERROR: InvalidIRError: (...)
Reason: unsupported call to an unknown function (call to jl_alloc_genericmemory)
Stacktrace:
[1] GenericMemory
In contrast, rand(Float32) returns a simple scalar. Inside a kernel, this will automatically reside on the device.
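As an aside, host-side CUDA.rand plus broadcasting can solve this particular problem without a hand-written kernel at all. A minimal sketch (estimate_pi_broadcast is just an illustrative name):

using CUDA

function estimate_pi_broadcast(N)
    x = CUDA.rand(Float32, N)  # allocating is fine here: we are on the host
    y = CUDA.rand(Float32, N)
    # the comparison broadcasts on the GPU; sum of the Bool array reduces there too
    return 4 * sum(x .* x .+ y .* y .<= 1f0) / N
end

Note that this allocates two N-element arrays, so for N = 10^9 (8 GB of Float32) you would want to process the samples in batches.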
I’m not familiar with this library, but presumably it allocates, is type-unstable, or uses non-isbits structs which are not adapted for the GPU.
By the way, in

inside = sum(Array(d_results))

you can just use sum(d_results), which will then perform the summation on the GPU and return the scalar result to the CPU.
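That is, the change is just:

inside = sum(d_results)  # reduces on the GPU; only the scalar travels to the host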
I think the main issue is that something like Philox2x is a mutable struct (hence allocates when you create it).
You could probably rewrite the Random123.jl code to make it immutable, replacing all mutating functions on the path of rand by versions that (also) return a new Philox2x. In particular, you would then need to write r = Philox2x(); x, r = rand(r). You should also make sure that every thread uses a different seed. But that all sounds like more work than it’s worth.
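To illustrate the functional style I mean (this is not Random123.jl’s actual API; ImmutablePhilox and next are made-up names, and the mixing function is a splitmix64-style stand-in, not a real Philox round):

struct ImmutablePhilox   # isbits, so safe to use inside a kernel
    key::UInt64
    ctr::UInt64
end

@inline function next(rng::ImmutablePhilox)
    # splitmix64-style mixer as a placeholder for the real Philox rounds
    z = (rng.ctr + rng.key) * 0x9e3779b97f4a7c15
    z = (z ⊻ (z >> 31)) * 0xbf58476d1ce4e5b9
    x = Float32(z >> 40) * Float32(2.0^-24)          # uniform in [0, 1)
    return x, ImmutablePhilox(rng.key, rng.ctr + 1)  # new state instead of mutation
end

# In the kernel, seed each thread with its own key, e.g.:
# rng = ImmutablePhilox(UInt64(tid), UInt64(0))
# x, rng = next(rng)
# y, rng = next(rng)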