Brew a Parallel RNG?

RNGs cannot be parallelized without modification, as the effectiveness of the randomness representation ceases if the sequences are related.
Previous discussions have been found from

But can we use some alternative solutions?

Consider some ancient algorithms that are probably not in use nowadays(most written from the book Numerical Analysis by T. Sauer)

# rng

#random number generators
#Computers are not capable of generating true random numbers
#but can generate sequences with statistical randomness


# 1 Julia's implemented solution
rand(100)

# LCP psuedo solutions

function rng_Park_Miller1998(n;x1=3)
    x=zeros(n)
    x[1] = x1
    u = zeros(n)
    for i in 2:n
        x[i] =mod(16807*x[i-1],2147483647)#7^5 \ 2^31-1 31th mason prime
    end
    u = x./2147483647
    return u
end
# implemented in Matlab 4 1990

rng_Park_Miller1998(100;x1=3)

function rng_randu(n;x1=3)
    x = zeros(n)
    x[1]=x1
    for i in 2:n
        x[i] = mod(65539*x[i-1],2147483648)
    end
    u = x./2147483648
    return u
end

rng_randu(100)


# iterative psudo solutions

function rng_LMap(n;x1=.4)
    x = zeros(n)
    x[1] = x1
    for i in 2:n
        x[i]=1-2*x[i-1]^2
    end
    return((x./2) .+.5)
end

rng_LMap(100)

# self - avoiding solutio

function rng_Halton(n;x1 = 3)
    b = zeros(Int(ceil(log(n)/log(x1))))
    u = zeros(n)
    for j in 1:n
        i = 1
        b[1] = b[1]+1
        while b[i]>(x1-1+eps())
            b[i]=0
            i=i+1
            b[i]=b[i]+1
        end

        u[j]=0
        for k=1:length(b)
            u[j] = u[j]+b[k]*Float64(x1)^(-k);
        end
    end
    return u
end

rng_Halton(100)

It seems not hard if a random seq can be generated with a “root”, so that if we pre-calculate some of the terms as starting values for each thread, then they may end up independent. Is this mistaken?

BTW the code is from my repository

I may be missing something, but I was under the impression that if you either

  1. initialize all the parallel RNGs with a truly random seed (eg from hardware entropy), or

  2. use randjump with a conservatively large step,

these issues are mitigated. See eg

1 Like

Thank you! That’s a very neat solution.

Not sure why I didn’t think of this before, but there’s also SPRNG, which is designed specifically for this use case.

1 Like

I fond what Guy Steele mentioned in the last half of this talk interesting:

IIRC there is a type of SRNG that can be split in such a way that the distribution of resulting streams combined is equivalent to the original stream (or something like that). It would be nice to look into (probably some follow up studies of):

http://supertech.lcs.mit.edu/papers/dprng.pdf
https://ieeexplore.ieee.org/abstract/document/6114424

which were mentioned in the talk.

My understanding is that this would guarantee that parallel computation result with SRNG to be deterministic even when the scheduler is dynamic (as in Julia), if you construct the computation as some combination of map/filter/reduce/etc. (edit: hmm… wait, does it work for any computation?)

2 Likes

also in the coming up 1.3

All operations that affect the random number state ( rand , randn , seed! , etc.) will then operate on only the current thread’s RNG state. This way, multiple independent code sequences that seed and then use random numbers will individually work as expected.

1 Like

Using different multipliers in a linear congruential generator (or a PCG, which uses an LCG to update state) will create unique, non-overlapping streams. You can take advantage of this for SIMD as well.
This is what I have done with VectorizedRNG, and ProbabilityModels, which initializes one per thread.

You can use RNGTest for validating an RNG or LCG multiplier (note that an LCG itself will fail a lot of tests, a PCG is much better in this regard).

2 Likes

My understanding is that making RNG deterministic and parallel is hard when the task scheduler is dynamic as in Julia >= 1.3 (hence my last comment). Is there a packaged solution for Julia? Or maybe people actually don’t need it?

Yes. I am thinking the same. We have to specify scenarios.
If the n on each core is not known then jumpers would not work…

Or should we just simply calculate the length of random sequences applied in each threads and apply the jumpers afterwards :joy:
Well…:sweat_smile: that’s a sacrifice of the performance

Thanks!
So does this mean the ‘jumper’ is already implemented or does it use somewhat different techniques to ensure the between-threads independence? So that
If I
Draw same length of sequences from each thread and then join them together
Or if I
Draw sequences from random length from each sequence and then join them together.
Will the results be a good representation of random sequences?
I guess the previous one is easier to solve. Plz point out if I am wrong

If you just want thread-independent random number (or, sequence), 1.3 should have solved the problem for you (i.e this would be the default behavior). So you can just generate random sequences in different threads and combine afterwards if that’s what you want.

btw, I don’t quite understand what this means.

I am not sure I understand why one would want to do that.

I think it’s highly useful that sum(rand(MersenneTwister(0), 10)) returns the same value as long as I’m using same julia version and setting.

I understand what it implies, but now how that is it so useful.

For example, I have seen people rely on this for testing, but I don’t think it is the right approach (as it is quite fragile to details of the algorithm).

Because getting this is quite difficult, before we focus on this as a desirable property of an RNG in a parallel setting, it would be great to understand the rationale.

the demand(?) maybe more common than you think, for example, I think most CMS(physics) papers would use seed=123456 for quite a few Monte Carole/Inference tools to make the result ‘reproducible’(for debugging and peer review purpose, I only could imagine)

1 Like

Agree

How about debugging some input-dependent bugs? If you have a deterministic program, you can reproduce it quite easily and re-do same execution as many as you want. If you have parallel code that depends on the exact scheduling of the tasks, I can imagine that it is extremely hard (impossible?) to reproduce the same execution reliably.

I do agree that relying on the exact output of RNG-depending code is not the right approach in the testing. That’s because getting same output from code using RNG across library versions is hard. But I don’t think it is directly relevant for getting the same result of the same program with the same input if it only does pure computation. Also, there seems to be a way to do it already. Implementing it probably is a very hard work but I don’t think it is as difficult as improving libraries using RNG without changing the exact output.

That means if I don’t know the exact lengths of each random sequences that is called within each thread.
For example thread one has a process requiring 100 random numbers, thread 2 requiring 120. Both of them are good representations of randomness for themselves as separated event lists -stochastic processes. But if I combine these they might not practically be good representations of true random as a whole.

Is this property algorithm dependent?Maybe I should read about pcg or some other algorithms?

And I think at some point if the jumpers enable us to code and run RNG on general graphic cards… well that’s a different story… maybe forget about this:joy: