I need to generate random data across many (e.g., 10,000) simulations. Each simulation should be independent of the others, and I must be able to recreate a particular simulation s without first running the previous s-1 simulations.
Ordinarily this tends to get discussed as parallel random number generation, but my application creates all data sequentially in the main thread.
My goal in this post is to get advice about how to do it.
I can think of several alternatives, none of which seem entirely satisfactory.
Simple-minded seeding
For each simulation s do seed!(myseed+s).
This is easy and, I think, fairly conventional. But reportedly it’s not entirely reliable if the goal is real independence of the resulting streams.
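For concreteness, here is a minimal sketch of the simple-minded approach. The `simulate` helper and its `n` argument are hypothetical stand-ins for whatever one simulation actually draws, and I use a fresh `Xoshiro` per simulation (Julia 1.7 or later) rather than reseeding the global RNG, which is an assumption about how you would want to structure it.

```julia
using Random

# Hypothetical helper: draw the data for simulation `s` from a fresh RNG
# seeded with myseed + s (Xoshiro requires Julia >= 1.7; on older versions
# MersenneTwister(myseed + s) plays the same role).
function simulate(myseed::Integer, s::Integer; n::Integer = 5)
    rng = Xoshiro(myseed + s)
    return rand(rng, n)
end

myseed = 1234
all_runs = [simulate(myseed, s) for s in 1:10]

# Recreate simulation 7 on its own, without running simulations 1 to 6.
rerun = simulate(myseed, 7)
@assert rerun == all_runs[7]
```

This gives reproducibility of any single simulation, but it says nothing about how independent the streams for neighbouring seeds are, which is the concern raised above.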
This “counter”-based approach was recommended in several places as a good fit for this kind of problem. But its documentation is spotty: some links are dead, and some things, like the actual arguments to the generators, are either not documented at all or only mentioned in examples. And I don’t think it’s a drop-in replacement for other Julia RNGs, since it only produces blocks of bits (at least, that is all that’s documented).
Use abc123 to set the seed for the main RNG. I have no idea if this would behave any better than the simple-minded seeding.
I’ve seen some examples that use jump to move the random number stream to a new spot. This is not in the documented interface that I can see, but if I jump by much more than the number of random numbers each simulation draws, that might work. Or maybe use Future.randjump.
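A rough sketch of the `Future.randjump` variant, assuming a `MersenneTwister` base generator (the only type `randjump` is documented for). The helper name `rng_for_simulation` is made up for illustration, and the jump size `big(10)^20` is chosen because the docs say the jump polynomial for that value is precomputed. Note that reaching simulation s this way still costs s-1 jumps, though no random numbers are actually generated along the way.

```julia
using Random, Future

# Hypothetical helper: RNG for simulation `s`, obtained by jumping s-1 times
# (each jump is 10^20 steps) from a MersenneTwister seeded with `myseed`.
function rng_for_simulation(myseed::Integer, s::Integer)
    rng = MersenneTwister(myseed)
    for _ in 2:s                        # empty loop when s == 1
        rng = Future.randjump(rng, big(10)^20)
    end
    return rng
end

# Simulation 3 can be recreated directly from the base seed.
x = rand(rng_for_simulation(2024, 3), 4)
@assert x == rand(rng_for_simulation(2024, 3), 4)
```

As long as each simulation consumes far fewer than 10^20 steps, the streams do not overlap.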
Maybe do the same thing but use a cryptographic hash to ensure that consecutive values of s lead to very different seeds? e.g.
using SHA, Random
# like Random.seed! but use an SHA256 cryptographic hash of seed
cryptoseed!(rng::AbstractRNG, seed::Base.BitInteger) =
    Random.seed!(rng, Vector(reinterpret(UInt32, sha256(reinterpret(UInt8, [seed])))))
cryptoseed!(seed::Base.BitInteger) = cryptoseed!(Random.GLOBAL_RNG, seed)
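A possible variation on the same idea, as a self-contained sketch: hash the pair (base seed, simulation index) rather than their sum, so that, say, base seed 1 with simulation 2 and base seed 2 with simulation 1 do not end up with the same stream. The name `hashed_rng` is hypothetical, and returning a fresh `MersenneTwister` instead of reseeding an existing RNG is just one way to arrange it.

```julia
using SHA, Random

# Hypothetical helper: build an RNG for simulation `s` whose seed is the
# SHA-256 digest of the pair (myseed, s).
function hashed_rng(myseed::Integer, s::Integer)
    bytes  = collect(reinterpret(UInt8, Int64[myseed, s]))  # 16 input bytes
    digest = sha256(bytes)                                  # 32-byte digest
    words  = collect(reinterpret(UInt32, digest))           # 8 UInt32 words
    return MersenneTwister(words)
end

# Any simulation can be recreated directly from (myseed, s).
a = rand(hashed_rng(1, 2), 3)
@assert a == rand(hashed_rng(1, 2), 3)
```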
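I don’t see why you’d do anything other than the simple/naive thing. In fact, if not for the requirement that you be able to recreate a particular simulation s without first running the previous s-1 simulations, I would tell you just to set the seed once at the beginning of the loop.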
State-of-the-art RNGs, like the one Julia uses, are good. Unless you are specifically working on some sort of cybersecurity project (it sounds like you are not), I would treat these random number streams as pure randomness.
At my company, we would like reproducibility of the random streams in heterogeneous computing environments for a given seed and the same inputs. That is, we want the same results on a computer with 2 cores as on one with 40+ cores. We strive for this, although there are other issues at hand.
Our attempt at this (which has worked so far): 1) each simulation is really fast, so we group the simulations into buckets of N, where N is set once we get a good feel for what is close to optimal and is left alone once in production; 2) we loop over the buckets and spawn one task per bucket. The random seed is set once at the very beginning of the program, before looping over the buckets.
The number of buckets is the same no matter how many cores the computer has, so the same number of tasks is spawned.
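Roughly, the bucket scheme looks like the sketch below. This is a simplified illustration rather than our production code: `simulate_one` and `run_buckets` are made-up names, and it assumes Julia 1.7 or later, where the default RNG is task-local and each spawned task's RNG state is derived from its parent's at spawn time, so the streams depend on the order in which tasks are created and not on how many threads run them.

```julia
using Random

# Hypothetical stand-in for one fast simulation, using the task-local default RNG.
simulate_one() = sum(rand(10))

function run_buckets(seed::Integer, nsims::Integer, bucket_size::Integer)
    Random.seed!(seed)                       # seed once, before any task is spawned
    # The bucket layout depends only on nsims and bucket_size, never on core count.
    buckets = collect(Iterators.partition(1:nsims, bucket_size))
    tasks = map(buckets) do bucket
        Threads.@spawn [simulate_one() for _ in bucket]
    end
    return reduce(vcat, fetch.(tasks))       # results in bucket order
end

r1 = run_buckets(1, 100, 10)
r2 = run_buckets(1, 100, 10)
@assert r1 == r2
```

Note that recreating a single simulation in this scheme means rerunning its whole bucket from the same program state, so it does not by itself meet the original "recreate simulation s alone" requirement.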