# How to run multiple threads, all using the same sequence of pseudorandom numbers, in Julia?

Hey!
I would like to run a function multiple times in parallel using the same sequence of random numbers for each call. The only thing that should change between each call is some given parameters. Here is a MWE using `Random.seed!` for seeding that fails at doing what I would like to do (this MWE omits the given parameters to each calls, meaning we do exactly the same thing each time):

``````using Random

# Job to be parallelized, each thread should use the same random numbers sequence
function job()
Random.seed!(0)
for i in 1:2
println("Random number nÂ° \$i in the sequence = \$(rand())")
end
end

job()
end
end
``````

By running this script with two threads, I get the following output:

``````Random number nÂ° 1 in the sequence = 0.8236475079774124
Random number nÂ° 2 in the sequence = 0.9103565379264364
Random number nÂ° 1 in the sequence = 0.8236475079774124
Random number nÂ° 2 in the sequence = 0.9103565379264364
Random number nÂ° 1 in the sequence = 0.8236475079774124
Random number nÂ° 2 in the sequence = 0.16456579813368521  # Should be 0.9103565379264364
``````

The 3rd run provides a different 2nd random number of the sequence from the two first runs. I guess this is related to the facts that I require more calls to `job()` than the available number of threads and that I am only setting the seed of the global RNG. How to ensure that each run gets the same sequence of random numbers ?

I found a solution by creating a local RNG at the beginning of each job, rather than setting the seed of the global RNG:

``````using Random

# Job to be parallelized, each thread should use the same random numbers sequence
function job()
mt = MersenneTwister(0)
for i in 1:2
println("Random number nÂ° \$i in the sequence = \$(rand(mt))")
end
end

job()
end
end
``````

However, I have concerns regarding this solution:

• Isnât it heavy in terms of memory usage to create separate RNGs?
• It implies explicitly providing the RNG as argument to all `rand` calls:
• This may be cumbersome in terms of code writing
• How can we handle this when calling external libraries that do not permit this explicit call?

It shouldnât be heavy compared to your job. If the multiplication of RNGs is too heavy, so should be use of multiple threads, as they have overhead.

RNG sizes in bytes
``````julia> using Random

julia> Base.summarysize(MersenneTwister(0))
19380

julia> Base.summarysize(Xoshiro(0))
32
``````

Which version are you using? In 1.7, the default RNG will change from MersenneTwister to xoshiro, which has 1) smaller state and 2) can iterate much faster. Additionally, there will be a per-thread RNG. See HISTORY.md for more (as 1.7 is not yet released, the docs donât have a segment of release notes for it right now).

Also, the stream of random numbers is permitted to change between minor versions, so relying on them to be exact is a bad idea (not to mention changes in numerical algorithms leading to differences due to reordering of floating point operations etcâŚ `isapprox`/`â` (`\approx`) is best for floating point comparisons, donât use `==` unless you really do want exact binary representation).

``````julia> println(rand(Xoshiro(0)), ' ', rand(MersenneTwister(0)))
0.20591944549416796 0.8236475079774124
``````

If your code is sensitive to random inputs and you want to account for that, yes, thatâs what you should do. Making implicit state (the RNG object) explicit is a good idea.

External how? If youâre calling into e.g. C or Fortran libraries that donât permit setting a seed or giving a specific RNG object, youâre out of luck with trying to control them in either case, no matter whether you can âfixâ the julia side or not.

Iâm not sure why you want to fix the RNG between jobs in the first place though. Relying on specific random numbers instead of the properties of the underlying distribution of random numbers is usually a pretty bad idea, because it can hide algorithmic and numerical errors and lead to overspecialization for certain algorithms.

1 Like

I am using 1.6. I think per-thread RNG is a good thing, at least for my application. I would be interested in knowing how to set the seeds so that each thread relies on the same stream of random numbers.

I mean external Julia library, but then I guess I could just fork a project and modify for instance a function I would like to use to be able to take an RNG as argument.

Overspecialization in the sense you mean is exactly what I am looking for. My parallelized job takes parameters as argument and I must be able to reproduce the exact same results when I run it twice with the same input parameters. Further, I need different parameters input to be run on the same stream on random numbers for comparison needs. This is why the stream of random numbers used in the job should remain constant.

More details: the issue I described is part of an iterative function. Say we have a pool of `n` different combinations of input parameters. First, we run the job `n` times in parallel and we get `n` âresultsâ. At the next iteration, the pool is increased with `N < n` new combinations of parameters. Here we run the job with those `N` new combinations and we re-run the job with some of the previous parameters so that we again have `n` jobs running in parallel and get `n` results. At that point, I could practically avoid to re-run the job on the already tested parameters, but for the different input parameters to be compared I need the random streams to be the same. Re-running with the same parameters is more like a sanity check for my program.

There is the unexported `Random.default_rng()`, which will return the task local RNG object. If you `seed!` that, youâll seed the RNG for that task (not thread!). The thread itself doesnât have an RNG, but the task that runs on a thread does. That only works in 1.7 though - 1.6 still has one global RNG, so passing them around is a necessity there. In either case, it will be better and easier to maintain later on (e.g. with a new julia versionâŚ) when you pass RNG objects around explicitly. Itâs not expensive to do so.

To me, if a library thatâs relying on RNG doesnât allow me to pass in an RNG of my own and instead relies on the global RNG, thatâs a badly designed library. It prevents me from using the library in all sorts of parallel fashions that require putting in different RNGs.

Sounds like you really want to give an explicitly initialized RNG to each of your jobs instead of relying on the task-local RNG to not change. You donât necessarily know how the task local RNGs are affected without controlling for that. If you spawn subtasks (e.g. by using the wonderful Tullio.jl and its internal threading, which spawns tasks), that may interfere with the task local RNG in ways you didnât expect. Passing the RNG you rely on explicitly guards you from that.

What do you really want to compare, what are you testing? Usually people want to (or rather, should imo) test whether an algorithm is indifferent to changes in the stream of random numbers. What youâre describing sounds like youâre trying to find out which changes in parameters (donât) influence how many times random numbers are needed, which to me seems like an odd thing to measure.

There are a number of threads about the stability of the random number stream, but Iâd really recommend rethinking your approach if you rely on the number stream to be 100% the same. Do you have other properties of your job output you could check? Properties that should hold no matter which exact numbers are produced by the random number stream? If so, I recommend checking those instead of trying to control randomness (which kind of defeats the purpose of randomness).

1 Like

I am running an evolution strategy, which could be described with this pseudo-code:

``````Initialize:
pop = population of parameters (genomes) of size N # A list of N genomes, one genome is an array of parameters
K = number of generations
U = number of elite genomes (those achieving best scores) kept between generations

for i in 1:K
# Run N jobs in parallel to associate each genome to a score (fitness)
evaluate(pop)

# Select the U genomes with the best scores
elite = select_best(pop)

# Create a new population composed with the U elites and N-U genome children obtained by mutation
pop = mutate(elites)
end
``````

In my case, the `evaluate` function (the one that is parallelized) features randomness. As you said before, it is desirable not to reproduce the exact same stream of random numbers to obtain a solution of the evolution strategy method that is robust to this randomness. However, practically, the application is complex and I need to tune some parameters on the easier problem where the stream of random numbers is kept constant. I really need it to be the same on the `N * K` evaluations.

see the blog

``````const x = zeros(10,Threads.nthreads())

end
x
``````

If I understood well, this first thread use creates an independent thread RNG, here seeded at 20. How can you reset the seed in order to repeat the random numbers stream in further use of the same thread? Would it be with a second call to `Random.seed!(20)`? I got different random sequences when using this in the first MWE of this post.

sorry, my bad, just delete that block.

my original thought was, since each thread has its own local rng, command each thread to seed at 20 should just sync rngs across thread, and further rand on each thread should generate same sequence since effectively they have the same rng. While that does not work out.

so the work around is create rngs for each thread on the heap, each has the same seed.
then for each thread, pick its own rng based on threadid, then it just works.

Iâm familiar with genetic algorithms, but this is where I donât follow. Is this some sort of pre-training? Why does this pretraining not work when the stream of numbers is changing? Do you have (at least some) control over the code youâre actually running evolution on?

You will have to reset them for each invocation of your `job()`, yes. This will be made easier by passing in RNGs explicitly (conceptually the same as the vector of RNGs proposed by @L_Adam, but itâs again better to be explicit about passing them around).

If the code youâre running is oblivious to RNGs being passed in, Iâm afraid youâre out of luck trying to control them (as mentioned above, if they spawn their own tasks they will have their own RNG in them in 1.7. You wonât have access to that, even if you seed your own task local RNG).

It can be seen as pretraining. I would like to set some hyper-parameters in an easier setting. This easier setting is when the evaluation is âdeterministicâ, meaning that the code behind the `evaluate` function will always use the same stream of random numbers.

It does work with a changing stream of numbers. Just that I wand to keep it constant to ease the hyper-parameter problem setting. I have a good control over the code, it is mostly Julia. More in details, the individuals I am training are agents playing Atari games and I am using the ArcadeLearningEnvironment.jl Julia wrapper to the ALE C++ library for evaluation. This evaluation is done by interfacing an individual with an Atari game and the fitness is its score.