is the default random generation of the Distributions package safe for parallel use? I.e., are multiple streams of random numbers generated on different workers guaranteed to be independent? If not, what would be the best option for parallel random number generation?
I can’t comment specifically on the Distributions implementation, I don’t have time to check out what’s being used under the hood. But I did a lot of work on that issue during my phd and parallel generation of random numbers is a massive problem and is vastly understudied. My short answer solution is to suggest that you use a library called Random123. It’s very good and lends itself to safe parallelisation by design.
PS. I have not axe to grind in this field, just a lot of experience of dealing with the problem in a scientific context.
I believe julia uses MersenneTwister for random number generation
I think so too. But I wasn’t sure. If it’s Mersenne Twister then it’s not officially parallel safe. It depends a little on what you want to use it for (ie the degree of independence between threads) but it’s definitely better not to use it.
Thanks for the link. That’s much more useful than the one I shared!
Thanks! Agreed this is a totally underappreciated problem
I suspect you’re confusing thread-safety with more general parallelizability. Mersenne Twister is thread-safe only if you use different RNG objects (one per thread), which is possible in Julia. However, if you are using “different workers”, I assume you are using Julia’s built-in
addprocs parallelization, and that occurs in different processes, so thread safety is irrelevant.
Different worker processes will indeed generate distinct streams of pseudorandom numbers as long as each process uses a distinct seed. And the default seed is chosen in such a way that it should (with probability essentially 1) be distinct for each process, so you should be okay.
I was being sloppy in my language. I was talking from a mathematical point-of-view. You can of course have parallel instances of Mersenne Twister, and they can be initiated with different seeds. But this does not mean that you will get independent streams of random deviates.
In an ideal world, of course, it would. But the independence of streams of random numbers from the Mersenne Twister algorithm is tightly dependent on what the individual seeds are. For many applications you will get away with it. But for many others you will get unacceptable correlations across streams. MT has internally a very high dimensionality of its state space, but it is understudied what are the conditions which will lead to correlations across instances (apart from the obvious identical seeding situation).
In my own application, I use 10,000 parallel streams with 10^9 calls to each. For 10,000 individual initial conditions (seeds) and such long sequences of calls it’s a huge problem to ensure that all streams are completely uncorrelated. That’s why I went for Random123 instead.
Is this a practical concern or something that we are worried about because there are no analytical results to reassure us? I am asking because what @stevengj suggested above is quite common in science for simulations.
it was to me
I know it’s typically irrelevant (or at least ignored). It just depends on what your stochastic requirements are. And I guess that @kkmann has similar requirements to me based on his reply above.
I would say that if you have a stochastic system in which you only need independence across streams on a given program step then you basically have no issue once you initialise with different seeds. But if you have an independence requirement across instances also in a program stepping direction (eg time) then these issues can be extremely well hidden but lead to massive overall correlations in an interdependent system. I have to admit this issue is completely underappreciated even in my own field (computational neuroscience) but when the numbers get big enough you end up with problems.
This is my favourite topic, I can be a complete bore on it
Hm, I guess your requirements are a bit stricter than what I had in mind. It would just be great to have sb who knows the internals discuss this issue in the parallel section of the Julia documentation as I feel that it is pretty central to many applied projects and Julia claims to be “parallel ready”.
I might propose a mini-project on this for our Julia Meet-up in Berlin in April. If I do I’ll make a pull request with suggestions for the documentation. But right now I’m working on something else.