[ANN] PhiloxRNG.jl: Generate random numbers on CPU and GPU using the Philox4x32 counter-based RNG

PhiloxRNG.jl is a package for generating random numbers on both the CPU and GPU. It generates similar numbers on all devices; results are not bit-identical when sampling floating-point distributions because of fast-math differences.

The underlying algorithm is a 10-round Philox4x32 combined with a fast Box-Muller transform for sampling the normal distribution.
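For readers unfamiliar with the algorithm, here is a minimal sketch of a 10-round Philox4x32 followed by a textbook Box-Muller step. This is not PhiloxRNG.jl's source: the function names (`mulhilo`, `philox_round`, `philox4x32_10`, `u01`, `boxmuller`) are made up for illustration, the constants are the ones published with the Philox4x32 reference implementation, and the Box-Muller step uses the standard `log`/`sincospi` formulation rather than whatever fast variant the package actually uses.

```julia
# Multiplier and key-increment (Weyl) constants from the Philox4x32 reference.
const M0 = 0xD2511F53
const M1 = 0xCD9E8D57
const W0 = 0x9E3779B9
const W1 = 0xBB67AE85

# Full 32x32 -> 64-bit product, split into (high, low) 32-bit halves.
@inline function mulhilo(a::UInt32, b::UInt32)
    p = UInt64(a) * UInt64(b)
    return (p >> 32) % UInt32, p % UInt32
end

# One Philox4x32 round on a 4-word counter with a 2-word key.
@inline function philox_round(c::NTuple{4,UInt32}, k::NTuple{2,UInt32})
    hi0, lo0 = mulhilo(M0, c[1])
    hi1, lo1 = mulhilo(M1, c[3])
    return (hi1 ⊻ c[2] ⊻ k[1], lo1, hi0 ⊻ c[4] ⊻ k[2], lo0)
end

# Ten rounds; the key is bumped between rounds (nine bumps in total).
function philox4x32_10(c::NTuple{4,UInt32}, k::NTuple{2,UInt32})
    c = philox_round(c, k)
    for _ in 1:9
        k = (k[1] + W0, k[2] + W1)   # UInt32 addition wraps modulo 2^32
        c = philox_round(c, k)
    end
    return c
end

# Map the top 24 bits of a UInt32 to a Float32 in (0, 1], so log(u) stays finite.
u01(x::UInt32) = ldexp(Float32(x >> 8) + 1f0, -24)

# Textbook Box-Muller: two uniform words in, two standard normal samples out.
function boxmuller(x1::UInt32, x2::UInt32)
    r = sqrt(-2f0 * log(u01(x1)))
    s, c = sincospi(2f0 * u01(x2))
    return r * c, r * s
end

# Usage: the output depends only on (counter, key), with no hidden state.
key = (0x00000000, 0x0000002a)
ctr = (0x00000000, 0x00000000, 0x00000000, 0x00000000)
w = philox4x32_10(ctr, key)
z1, z2 = boxmuller(w[1], w[2])
```

Each counter value yields four 32-bit words, so one call is enough for two pairs of normal samples. Because `log` and `sincospi` may be evaluated with different precision on different hardware, the integer words match exactly across devices while the floating-point samples may differ in the last bits.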

Performance is currently significantly better than randn! for Float32 (on both CPU and GPU), roughly equal for Float32 rand on the GPU, and slower in the other cases benchmarked.

Benchmarks

Julia 1.12.5, CUDA 5.11.0, AMD Ryzen 7 9800X3D, NVIDIA GeForce RTX 3080.

CPU (ns/value, N = 100,000,000)

| Function  | PhiloxRNG.jl | Random.jl |
|-----------|--------------|-----------|
| rand F32  | 0.791        | 0.522     |
| rand F64  | 1.997        | 1.052     |
| randn F32 | 1.009        | 2.114     |
| randn F64 | 3.098        | 1.795     |

GPU (ns/value, N = 100,000,000)

| Function  | PhiloxRNG.jl | CUDA.jl |
|-----------|--------------|---------|
| rand F32  | 0.006        | 0.006   |
| randn F32 | 0.007        | 0.032   |

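Random123.jl also implements the Philox family of RNGs. The main difference is the API: Random123.jl exposes mutable structs that implement the AbstractRNG interface, while PhiloxRNG.jl uses pure functions. This can make PhiloxRNG.jl easier to use in some situations, for example when each array element or GPU thread should derive its value from its own index without sharing generator state.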
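To make the API distinction concrete, here is a small sketch contrasting the two calling styles. It does not use either package: `mix` and `uniform` are hypothetical stand-ins built on the SplitMix64 finalizer (not Philox), and the stateful side uses `Xoshiro` from the Random standard library purely as an example of a mutable `AbstractRNG`.

```julia
using Random

# Stand-in counter-based generator: the SplitMix64 finalizer applied to key + counter.
# This is NOT Philox and NOT PhiloxRNG.jl's API; it only illustrates the calling style.
function mix(key::UInt64, i::UInt64)
    z = key + i * 0x9e3779b97f4a7c15          # UInt64 arithmetic wraps modulo 2^64
    z = (z ⊻ (z >> 30)) * 0xbf58476d1ce4e5b9
    z = (z ⊻ (z >> 27)) * 0x94d049bb133111eb
    return z ⊻ (z >> 31)
end

# Pure function: the i-th value depends only on (key, i), mapped to [0, 1).
uniform(key::UInt64, i::Integer) = Float64(mix(key, UInt64(i)) >> 11) * 2.0^-53

key = 0x0000000000000001                       # 16 hex digits give a UInt64 literal

# Evaluation order does not matter: filling forwards or backwards gives the same array.
forward  = [uniform(key, i) for i in 1:8]
backward = reverse([uniform(key, i) for i in 8:-1:1])
same_values = forward == backward

# Stateful interface: every draw mutates `rng`, so results depend on call order.
rng = Xoshiro(1)
a = rand(rng)
b = rand(rng)                                  # a different value: the state advanced
```

With the pure-function style, a GPU kernel or a threaded loop can compute element `i` directly from `(key, i)`, so nothing needs to be synchronised and the result is reproducible regardless of how the work is split up.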
