Random and threads

gianmariomanca · August 4, 2021, 1:35am

Given the following snippet (assuming more than 1 thread):

const NC = 3000;
const z_vec = zeros(NC);
Threads.@threads for i = 1:NC
    z_vec[i] = randn()
end

What statistical properties we should expect in 1.6.2 for z_vec?
Are the statistical properties expected to be as good as a single call z_vec = randn(NC)?
Would the answer be the same for 1.7?

Basically trying to understand if it’s safe to add Threads.@threads without too much thinking for a quick speedup or I need to get serious about selecting seeds etc.

gianmariomanca · August 4, 2021, 3:09pm

Well, I can always allocate also preallocate the random sequence

const NC = 3000;
const z_vec = zeros(NC);
const rr = randn(NC);
Threads.@threads for i = 1:NC
    z_vec[i] = rr[i]
end

hendri54 · August 4, 2021, 5:04pm

I would like to know the answer, though.
It is not always practical to pre-draw all random variables. I sometimes simulate a number of agents’ histories. Each agent draws quite a few random variables.
My approach is to pre-draw the seeds for the random number generators that are passed to each agent. I am curious whether this introduces any issues (e.g., draws being correlated across agents where they should be independent).

dlakelan · August 4, 2021, 6:32pm

My usual take on this is that if the generation of the seeds is high quality most likely the correlations are irrelevant. For this kind of thing it can be useful to utilize OS cryptorng facilities. Like /dev/urandom

You can try running experiments, by seeding two sequences and comparing the outputs with stats tests over and over. For example use the distance correlation from EnergyStatistics.jl

Sukera · August 4, 2021, 6:50pm

On 1.6, this should basically be the same as for the single threaded version since there’s only one global RNG.

randn works by calling rand internally, it’s just faster because it can do batch processing and doesn’t have to do work by calling rand for each index (it’s a little trickier than that in reality, but that’s the gist of it). They share the underlying normal distribution. You can of course call randn() for each thread individually, but if you need the full random matrix anyway, why not call randn(NC) in the first place and skip the threaded filling?

The upcoming 1.7 has an independent RNG per task, but that doesn’t change anything about the distribution characteristics, which is still normally distributed.

gianmariomanca · August 4, 2021, 7:21pm

@Sukera , I’m not sure if you are reading the question too literally. As @hendri54 mentioned is not always practical to pre-draw all random sequences you need, it can easily get out of hand.

The whole point is “user-friendliness” and reducing cognitive load: one less thing I have to worry about to focus on what I really care about.

you can rephrase the original question as “would Julia do that (or something similar) automatically, with some guarantees, or not?”

If 1.7 has independent RNGs per task, the question seems particularly relevant for 1.7 then. If all task were to start with the same seed, it would be a big issue. I’m assuming they will not, but it would be nice to know something more about the mechanism and the “guarantees”.

dlakelan · August 4, 2021, 7:36pm

I think you should always run Random.seed!() it sounds like in 1.7 this will set the seed of your task-local RNG. The question is how you should do this? I don’t think you should rely on Julia to do it for you. You can however use Random.RandomDevice to get OS level crypto entropy to help you.

sd = rand(Random.RandomDevice(),UInt)
Random.seed!(sd)

should do it

Sukera · August 4, 2021, 7:47pm

That’s not necessary. The task local RNG is seeded from the parent tasks’ RNG automatically on creation of the task.

You can check jl_new_task, rng_split and jl_tasklocal_genrandom in src/task.c for how this is done, but from what I can tell this is secure and preserves all properties of the regular RNG since it’s the same kind of RNG anyway. It also works the exact same way seed! does.

dlakelan · August 4, 2021, 7:58pm

Interesting. Seems likely you can ignore this issue for general purposes, but if I cared about independence in a strong way, I’d still be seeding from crypto source. At the very least you can disclaim “there’s nothing about my seeding method that is under my control”. For stuff that I sometimes do, this is potentially important (ie. Forensics)

Sukera · August 4, 2021, 8:06pm

Random.seed! (I’ve added the link above) already does this, it gets a RandomDevice() as the parent RNG since that’s an RNG we can “trust” on some level, so you don’t have to get it yourself and draw from it manually. (I guess that’s what you were asking about with drawing from /dev/urandom, @gianmariomanca?)

Seeding from a crypto source is not really necessary when writing code that does not require it. In those cases, it’s more important to ensure different properties (such as uniformity or performance) from the regular RNG, which Xoshiro in the upcoming 1.7 should definitely satisfy.

dlakelan · August 4, 2021, 8:10pm

Perfect. that’s great. So if you want crypto random seed it looks like you’d just do

Random.seed!()

Thanks!

Sukera · August 4, 2021, 8:11pm

I haven’t looked into how the first task is spawned in 1.7, but I’d guess that’s how the initial RNG of the first task that runs when you start julia is started is seeded already since the first call to rand() is always different

blackeneth · August 5, 2021, 5:22am

Watch out for the birthday paradox, however.

dlakelan · August 5, 2021, 4:24pm

Yes, however it looks like seed! grabs 4 UInt64s from /dev/urandom, there’s 256 bits of random seed then, and the birthday paradox would start to kick in when you’re spawning on the order of 2^128 threads so basically no issues there.

Topic		Replies	Views
Multithreading and Monte Carlo in 1.7 General Usage multithreading , random	12	906	December 31, 2021
How do I deal with Random Number generation when multithreading Statistics parallel	6	4494	August 31, 2017
Best practices for using random number generators with threads General Usage question	0	424	December 11, 2019
Threads and stdlib Random General Usage	4	492	March 19, 2021
Random numbers and threads General Usage question , multithreading , random	30	3839	March 29, 2022

Random and threads

Related topics