Random seeds in distributed computing

renatobellotti · October 18, 2022, 8:26am

Hi,

I’m currently trying to seed my RNG for a distributed code. I have tried this snippet, but it does not run reproducibly:

using Distributed
using Random

# Distributed seeding:
# We seed an RNG on the master, then generate seeds for all the workers.
Random.seed!(784327)
seeds = rand(UInt, nprocs());

for (i, seed) in enumerate(seeds)
    println(i)
    println(seed)
    @sync @spawnat i Random.seed!(seed)
end
@everywhere @show rand()

How can I properly seed my RNG in a distributed code?

sylvaticus · October 18, 2022, 12:25pm

The code below doesn’t work on distributed processes but on multiple threads, but perhaps it can help you.
The idea is to have something reproducible independently of the number of threads (or in your case processes) used. This is achieved by giving each thread its own independent RNG, but then the seeding is done on each loop, whatever is the thread that is processing it.

julia> using Test, Random, StableRNGs

julia> RNG = StableRNG(123)
StableRNGs.LehmerRNG(state=0x000000000000000000000000000000f7)

julia> println("** Testing reproducible parallel computation...")
** Testing reproducible parallel computation...

julia> x = rand(copy(RNG),100);

julia> function innerFunction(bootstrappedx; rng=Random.GLOBAL_RNG)
            sum(bootstrappedx .* rand(rng) ./ 0.5)
       end
innerFunction (generic function with 1 method)

julia> function outerFunction(x;rng = Random.GLOBAL_RNG)
           masterSeed = rand(rng,100:typemax(Int64)) # important: with some RNG it is important to do this before the deepcopy of the rngs to guarantee independance from number of threads
           rngs       = [deepcopy(rng) for i in 1:Threads.nthreads()]  # make new copy instances
           results    = Array{Float64,1}(undef,30)
           Threads.@threads for i in 1:30
               tsrng = rngs[Threads.threadid()]    # Thread safe random number generator: one RNG per thread
               Random.seed!(tsrng,masterSeed+i*10) # The seeding depends on the i of the loop not the thread: we get same results indipendently of the number of threads
               toSample = rand(tsrng, 1:100,100)
               bootstrappedx = x[toSample]
               innerResult = innerFunction(bootstrappedx, rng=tsrng)
               results[i] = innerResult
           end
           overallResult = mean(results)
           return overallResult
       end
outerFunction (generic function with 1 method)

julia> # Different sequences..
       @test outerFunction(x) != outerFunction(x)
Test Passed

julia> # Different values, but same sequence
       mainRng = copy(RNG)
StableRNGs.LehmerRNG(state=0x000000000000000000000000000000f7)

julia> a = outerFunction(x, rng=mainRng)
53.29363651190486

julia> b = outerFunction(x, rng=mainRng)
60.75585953344582

julia> mainRng = copy(RNG)
StableRNGs.LehmerRNG(state=0x000000000000000000000000000000f7)

julia> A = outerFunction(x, rng=mainRng)
53.29363651190486

julia> B = outerFunction(x, rng=mainRng)
60.75585953344582

julia> @test a != b && a == A && b == B
Test Passed

julia> # Same value at each call
       a = outerFunction(x,rng=copy(RNG))
53.29363651190486

julia> b = outerFunction(x,rng=copy(RNG))
53.29363651190486

julia> @test a == b
Test Passed

Palli · October 18, 2022, 12:43pm

julia> @everywhere (using Random; Random.seed!(784327); @show rand()) # StableRNG might be even better, i.e. same on 1.6 and 1.7/1.8 and forever
rand() = 0.9779508788735521
      From worker 4:	rand() = 0.9779508788735521
      From worker 2:	rand() = 0.9779508788735521
      From worker 3:	rand() = 0.9779508788735521

julia> @everywhere (using Random; Random.seed!(myid()); @show rand())  # and something like this for different, or use PIDs?

sylvaticus · October 18, 2022, 12:54pm

yes there is a bit of difference here between threads and processes… in Julia thread numbers are not decided in the code but in the way you start Julia, so my concern was to guarantee that the result is the same (i.e. reproducible) whatever the number of threads are; for processes instead you can set them in the code, so you can limit your reproducible scope to a certain number of fixed processes…

renatobellotti · November 17, 2022, 11:14am

Thanks for your answer. I think the second solution will do the trick, though that kind of restricts the kind of seeds you can have. But it should work, so I’m marking this as the answer.

Topic		Replies	Views
Getting Reproducible Results in Distributed Computing with Randomness General Usage question , distributed , random	2	360	September 7, 2023
Random seeds in parallel computing General Usage parallel	5	1871	September 20, 2020
Random seed in multi-core parallel running General Usage question	3	574	June 9, 2020
Random number flow independent from the number of threads General Usage question , multithreading , random	3	491	March 24, 2021
Reproducible multithreaded monte-carlo / Task local random Internals & Design proposal , multithreading , reproducibility	8	1761	March 25, 2020

Random seeds in distributed computing

Related topics