During GPU compilation, we use overlay method tables to replace Random.default_rng() with a custom, GPU-friendly RNG: https://github.com/JuliaGPU/CUDA.jl/blob/2ae53761a6a254b98a6689ed0d39781176b245cf/src/device/random.jl#L97. As a result, calling rand() in a kernel just works and uses the correct RNG.
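To illustrate what this looks like from the user's side, here is a minimal sketch. It assumes a working CUDA.jl installation and a CUDA-capable GPU; the names fill_rand! and random_vector are made up for this example and are not part of CUDA.jl.

```julia
using CUDA

# Hypothetical kernel: each thread writes one random Float32.
# No RNG object is passed in; rand() resolves to the device-side RNG.
function fill_rand!(out)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(out)
        @inbounds out[i] = rand(Float32)
    end
    return nothing
end

# Hypothetical host-side helper: launch the kernel and copy the result back.
function random_vector(n::Integer; threads::Integer = 256)
    out = CUDA.zeros(Float32, n)
    @cuda threads=threads blocks=cld(n, threads) fill_rand!(out)
    return Array(out)
end
```

Calling random_vector(1000) should return 1000 values in [0, 1), with the kernel body looking the same as equivalent CPU code.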
Specifically, we use Philox2x32, a counter-based PRNG (see "Switch to Philox2x32 for device-side RNG", Pull Request #882 in JuliaGPU/CUDA.jl). The seed is passed from the host, and the counters are maintained per-warp and initialized at the start of each kernel that uses the RNG (see "rand: seed kernels from the host", Pull Request #2035 in JuliaGPU/CUDA.jl). The implementation isn’t fully generic (for example, you can’t have multiple RNG objects), but it’s pretty close to how Random.jl works.
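To make "counter-based" concrete, here is a rough CPU-side sketch of the Philox2x32 block function in plain Julia. This is not the CUDA.jl implementation; it follows the published Philox2x32 round structure, with the multiplier and key increment taken from the Random123 reference code. The function names are made up for this example, and how CUDA.jl maps the host seed and per-warp counters onto the key and counter words is not shown here.

```julia
# Constants as used by the Random123 reference implementation of Philox2x32.
const PHILOX_M = 0xd256d193  # 32-bit multiplier
const PHILOX_W = 0x9e3779b9  # Weyl constant added to the key each round

# One Philox round: multiply the first counter word, then mix in the key
# and the second counter word.
function philox2x32_round(c0::UInt32, c1::UInt32, key::UInt32)
    prod = UInt64(PHILOX_M) * UInt64(c0)
    hi = (prod >> 32) % UInt32
    lo = prod % UInt32
    return hi ⊻ key ⊻ c1, lo
end

# Full block function: a pure function of (counter, key), no hidden state.
function philox2x32(c0::UInt32, c1::UInt32, key::UInt32; rounds::Int = 10)
    for _ in 1:rounds
        c0, c1 = philox2x32_round(c0, c1, key)
        key += PHILOX_W  # UInt32 arithmetic wraps around
    end
    return c0, c1
end

# A "stream" is just the block function evaluated at successive counters.
seed = 0x0000002a  # arbitrary example key
stream = [philox2x32(UInt32(i), 0x00000000, seed) for i in 0:3]
```

Because the output depends only on the key and the counter, the only state to keep around is a counter, which is what makes this kind of generator attractive on a GPU.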