Random and threads

Given the following snippet (assuming more than 1 thread):

const NC = 3000;
const z_vec = zeros(NC);
Threads.@threads for i = 1:NC
    z_vec[i] = randn()
end

What statistical properties we should expect in 1.6.2 for z_vec?
Are the statistical properties expected to be as good as a single call z_vec = randn(NC)?
Would the answer be the same for 1.7?

Basically trying to understand if it’s safe to add Threads.@threads without too much thinking for a quick speedup or I need to get serious about selecting seeds etc.

Well, I can always allocate also preallocate the random sequence

const NC = 3000;
const z_vec = zeros(NC);
const rr = randn(NC);
Threads.@threads for i = 1:NC
    z_vec[i] = rr[i]
end

I would like to know the answer, though.
It is not always practical to pre-draw all random variables. I sometimes simulate a number of agents’ histories. Each agent draws quite a few random variables.
My approach is to pre-draw the seeds for the random number generators that are passed to each agent. I am curious whether this introduces any issues (e.g., draws being correlated across agents where they should be independent).

My usual take on this is that if the generation of the seeds is high quality most likely the correlations are irrelevant. For this kind of thing it can be useful to utilize OS cryptorng facilities. Like /dev/urandom

You can try running experiments, by seeding two sequences and comparing the outputs with stats tests over and over. For example use the distance correlation from EnergyStatistics.jl

On 1.6, this should basically be the same as for the single threaded version since there’s only one global RNG.

randn works by calling rand internally, it’s just faster because it can do batch processing and doesn’t have to do work by calling rand for each index (it’s a little trickier than that in reality, but that’s the gist of it). They share the underlying normal distribution. You can of course call randn() for each thread individually, but if you need the full random matrix anyway, why not call randn(NC) in the first place and skip the threaded filling?

The upcoming 1.7 has an independent RNG per task, but that doesn’t change anything about the distribution characteristics, which is still normally distributed.

2 Likes

@Sukera , I’m not sure if you are reading the question too literally. As @hendri54 mentioned is not always practical to pre-draw all random sequences you need, it can easily get out of hand.

The whole point is “user-friendliness” and reducing cognitive load: one less thing I have to worry about to focus on what I really care about.

you can rephrase the original question as “would Julia do that (or something similar) automatically, with some guarantees, or not?”

If 1.7 has independent RNGs per task, the question seems particularly relevant for 1.7 then. If all task were to start with the same seed, it would be a big issue. I’m assuming they will not, but it would be nice to know something more about the mechanism and the “guarantees”.

I think you should always run Random.seed!() it sounds like in 1.7 this will set the seed of your task-local RNG. The question is how you should do this? I don’t think you should rely on Julia to do it for you. You can however use Random.RandomDevice to get OS level crypto entropy to help you.

sd = rand(Random.RandomDevice(),UInt)
Random.seed!(sd)

should do it

That’s not necessary. The task local RNG is seeded from the parent tasks’ RNG automatically on creation of the task.

You can check jl_new_task, rng_split and jl_tasklocal_genrandom in src/task.c for how this is done, but from what I can tell this is secure and preserves all properties of the regular RNG since it’s the same kind of RNG anyway. It also works the exact same way seed! does.

3 Likes

Interesting. Seems likely you can ignore this issue for general purposes, but if I cared about independence in a strong way, I’d still be seeding from crypto source. At the very least you can disclaim “there’s nothing about my seeding method that is under my control”. For stuff that I sometimes do, this is potentially important (ie. Forensics)

Random.seed! (I’ve added the link above) already does this, it gets a RandomDevice() as the parent RNG since that’s an RNG we can “trust” on some level, so you don’t have to get it yourself and draw from it manually. (I guess that’s what you were asking about with drawing from /dev/urandom, @gianmariomanca?)

Seeding from a crypto source is not really necessary when writing code that does not require it. In those cases, it’s more important to ensure different properties (such as uniformity or performance) from the regular RNG, which Xoshiro in the upcoming 1.7 should definitely satisfy.

2 Likes

Perfect. that’s great. So if you want crypto random seed it looks like you’d just do

Random.seed!()

Thanks!

I haven’t looked into how the first task is spawned in 1.7, but I’d guess that’s how the initial RNG of the first task that runs when you start julia is started is seeded already since the first call to rand() is always different :slight_smile:

Watch out for the birthday paradox, however.

Yes, however it looks like seed! grabs 4 UInt64s from /dev/urandom, there’s 256 bits of random seed then, and the birthday paradox would start to kick in when you’re spawning on the order of 2^128 threads :wink: so basically no issues there.

1 Like