Cool, I didn’t see that! Quite a sweet addition to my initial unprincipled PR.
However, I fear that your construction is somewhat broken.
The problem is that you can easily construct task trees that imply linear relations between xoshiro states. That would not be a problem if you advanced the internal xoshiro state in between.
However, xoshiro is designed such that extracting a single number does almost no output mixing; it basically dumps one of its internal states, and then updates the internal state.
This is very good for superscalar CPUs: it is expected that the `rand` output will be required before the next `rand` output, so the CPU can schedule such that the output is available immediately for further computations, and it can update the internal state in the background, in parallel to whatever the user wants to do with the `rand` output.
So, first, to demonstrate (on 1.9) that linear relations are bad for the first output: Suppose we have 4 xoshiro instances with the linear relation A+B=C+D. Then we get:
```julia
julia> xoro0 = Random.Xoshiro(); xoro1 = Random.Xoshiro(); xoro2 = Random.Xoshiro();

julia> xoro3 = Random.Xoshiro(xoro0.s0 + xoro1.s0 - xoro2.s0, xoro0.s1 + xoro1.s1 - xoro2.s1,
                              xoro0.s2 + xoro1.s2 - xoro2.s2, xoro0.s3 + xoro1.s3 - xoro2.s3);

julia> r0 = rand(xoro0, UInt64); r1 = rand(xoro1, UInt64); r2 = rand(xoro2, UInt64); r3 = rand(xoro3, UInt64); r0 + r1 - r2 - r3
0xffffffffff800000

julia> r0 = rand(xoro0, UInt64); r1 = rand(xoro1, UInt64); r2 = rand(xoro2, UInt64); r3 = rand(xoro3, UInt64); r0 + r1 - r2 - r3
0x5464e464eb865237

julia> r0 = rand(xoro0, UInt64); r1 = rand(xoro1, UInt64); r2 = rand(xoro2, UInt64); r3 = rand(xoro3, UInt64); r0 + r1 - r2 - r3
0xc9ec2b54e02cb66d
```
We can repeat that:
```julia
julia> xoro0 = Random.Xoshiro(); xoro1 = Random.Xoshiro(); xoro2 = Random.Xoshiro();

julia> xoro3 = Random.Xoshiro(xoro0.s0 + xoro1.s0 - xoro2.s0, xoro0.s1 + xoro1.s1 - xoro2.s1,
                              xoro0.s2 + xoro1.s2 - xoro2.s2, xoro0.s3 + xoro1.s3 - xoro2.s3);

julia> r0 = rand(xoro0, UInt64); r1 = rand(xoro1, UInt64); r2 = rand(xoro2, UInt64); r3 = rand(xoro3, UInt64); r0 + r1 - r2 - r3
0x0000000000000000

julia> r0 = rand(xoro0, UInt64); r1 = rand(xoro1, UInt64); r2 = rand(xoro2, UInt64); r3 = rand(xoro3, UInt64); r0 + r1 - r2 - r3
0xf7d6406723374553

julia> r0 = rand(xoro0, UInt64); r1 = rand(xoro1, UInt64); r2 = rand(xoro2, UInt64); r3 = rand(xoro3, UInt64); r0 + r1 - r2 - r3
0xc4b51c8b2237bb0e
```
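If one dislikes the randomized instances, the same effect can be shown deterministically with hand-picked states (the values below are chosen only so that each state word satisfies `s(0) + s(1) == s(2) + s(3)`; for such small words the rotation and additions in the output produce no wraparound, so the first outputs inherit the relation exactly):

```julia
using Random

# Four states with the word-wise linear relation A + B == C + D:
x0 = Random.Xoshiro(1, 2, 3, 4)
x1 = Random.Xoshiro(5, 6, 7, 8)
x2 = Random.Xoshiro(2, 3, 4, 5)
x3 = Random.Xoshiro(4, 5, 6, 7)

# The first outputs satisfy the same relation exactly:
rand(x0, UInt64) + rand(x1, UInt64) - rand(x2, UInt64) - rand(x3, UInt64)
# == 0x0000000000000000
```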
Ok, how do we build a task tree that has linear relations? Quite easily, thanks to your excellent documentation – we simply need to use the vectors `(Base + (0,0)) + (Base + (1,1)) = (Base + (1,0)) + (Base + (0,1))`.
So, now running on 1.10 (updated specifically for this demo):
```julia
julia> function badXoro()
           fut = Threads.@spawn begin
               f11 = Threads.@spawn rand(UInt64)
               _r10 = rand(UInt64)
               (fetch(f11), _r10)
           end
           r11, r10 = fetch(fut)
           f01 = Threads.@spawn rand(UInt64)
           r01 = fetch(f01)
           r00 = rand(UInt64)
           r01 + r10 - r00 - r11
       end
badXoro (generic function with 1 method)

julia> badXoro()
0x0000000000000000
```
Do we want to split this off into a new thread? Or move to GitHub?
PS. What I would conceptually do is:
- When forking a task, increment rngState[4] of the parent.
- When forking a task, copy its parent’s rngState[0:4] and then apply a random permutation or random function on rngState[0:4], interpreted as a 320 bit unsigned integer.
It should be reasonably easy to prove this good in the random oracle model.
Now, we have a mere Turing machine, no random oracle available. So we approximate the random oracle by e.g. a cryptographic function, maybe a reduced-round ChaCha, SHA, or AES variant. This only gets called on task forking, so we can spare some cycles (AES is super fast if we use the hardware support that almost all modern CPUs have; but it is a royal pain on build systems to do that for all targets).
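As a sketch of that idea (hypothetical names; `SHA` from the stdlib is only a stand-in for whatever reduced-round primitive one would actually pick, and the state layout is simplified to a plain 5-word vector):

```julia
using SHA  # stdlib; stand-in for a cheaper reduced-round primitive

# Hypothetical sketch: on task fork, bump the counter word in the parent,
# then derive the child state by applying a cryptographic function to the
# parent's 320-bit state, approximating a random function on the state space.
function fork_rng_state!(parent::Vector{UInt64})  # length 5: rngState[0:4]
    parent[5] += 1                                 # increment rngState[4] of the parent
    digest = sha2_512(collect(reinterpret(UInt8, parent)))  # 64 bytes
    return collect(reinterpret(UInt64, digest))[1:5]        # keep 320 bits
end
```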
If you want a minimally invasive unprincipled quickfix, then keep everything, but advance the child RNG. Once should (?) be enough, but my intuition says to advance 4 times.
Btw, good choice to add the PCG output instead of `xor`ing it. Looks much better with respect to interactions with xoshiro.
PPS. I think we should also take a look at our testing infra. This kind of relation should have been caught by tests – it is a very obvious linear relation in the output. The fact that the tests didn’t catch this implies that they are lacking somewhere.
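A minimal regression test for exactly this failure mode might look like the following (a sketch, not existing test infra; it assumes spawned tasks fork task-local RNG state as on 1.10, and accepts a 2^-64 false-positive probability on a correct implementation):

```julia
using Test

# Reproduce the 4-way task tree from badXoro above and require that the
# linear combination of the four streams' outputs is not identically zero.
function four_way_combo()
    fut = Threads.@spawn begin
        f11 = Threads.@spawn rand(UInt64)
        r10 = rand(UInt64)
        (fetch(f11), r10)
    end
    r11, r10 = fetch(fut)
    f01 = Threads.@spawn rand(UInt64)
    r01 = fetch(f01)
    r00 = rand(UInt64)
    return r01 + r10 - r00 - r11
end

@test four_way_combo() != 0  # fails on 1.10 before a fix
```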
PPPS. I took a look at the papers you linked. Unfortunately this fish stinks from the head, so to say.
The standard definition of a “reasonable PRNG” is a map from finite words in `SEED(u) EXTRACT+` to e.g. `UInt64`. That is, given a seed `u` and N>0 extractions, we get a number (the last number we extracted). This is a good PRNG if it is practically indistinguishable from a random function.
For the original model where parent state is mutated on forking, we had such a map from `SEED(u) (PARENT | CHILD | EXTRACT)* EXTRACT`.
You added the requirement that forking a task does not change the random stream; so we instead have a map from `SEED(u) ((PARENT | CHILD | EXTRACT)* CHILD | (CHILD | EXTRACT)*) EXTRACT+`, i.e. we exclude words that end in `PARENT EXTRACT+`, because for these the last `PARENT` call should not influence the random stream.
These maps should now be indistinguishable from a random oracle.
This property breaks with my above example, because of the linear relation `rand(S E) + rand(S C C E) == rand(S P C E) + rand(S C E)` (abbreviating SEED, PARENT, CHILD, EXTRACT by their initial letters).
The authors of the paper you linked in the PR, “Deterministic Parallel Random-Number Generation for Dynamic-Multithreading Platforms”, only consider pairwise correlations between random streams.
Their proof may or may not work, I don’t care – the problem is a catastrophic 4-way correlation. I must admit that I am quite shocked that they did not even mention the possibility of higher order correlations, especially since they acknowledge input from Rivest & Shor, and these two should know better than to stop at 2-way correlations (or should at least know that this is a super important limitation to mention!).