The rand function in the Distributions module is very useful for generating a sequence / vector / list of random numbers from a given random number generator and a statistical distribution. What I need is to generate a long sequence of random numbers (say, several million) and process them one at a time. I can simply generate them all with a single rand call and save the resulting sequence in a vector, but this does not seem efficient. Is there another way to handle this? Many thanks.
You could use a generator:

n = 1_000_000
rg = (rand() for _ in 1:n)
for r in rg
    # do stuff with random number `r`
end
Or just a loop?

for i = 1:n
    r = rand()
    do_something(r)
end
There’s nothing wrong with loops in Julia, unlike other languages where you have to avoid them in critical code.
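As a minimal sketch of the plain-loop approach, the snippet below consumes each random number as soon as it is drawn, so no vector of samples is ever stored. The function name `streaming_mean` is hypothetical and a running mean merely stands in for whatever processing is actually needed. The loop is wrapped in a function because that is how performance-sensitive Julia code is normally written.

```julia
# Hypothetical helper: averages n uniform draws without storing them.
function streaming_mean(n)
    total = 0.0
    for _ in 1:n
        total += rand()   # each draw is used immediately and then discarded
    end
    return total / n
end

m = streaming_mean(1_000_000)
println(m)   # close to 0.5 for uniform draws on [0, 1)
```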
Thank you for your kind reply. My situation is this: the rand() function in the Distributions module accepts an optional first argument that specifies the random number generator. If I call rand with a generator such as MersenneTwister(4), it gives me the “same” number each time, which is not what I need. Moreover, I need more than one rand call like this inside the same loop. Do you know how to accomplish this? In the end I generate all the random numbers I need, save them in a vector, and then process them one by one inside the for loop. I wonder if there is a better way.
I’m not sure I understand, but that will only happen if you put the MersenneTwister(4) call inside the loop. Just assign the RNG to a variable before the loop:

using Random

rng = MersenneTwister(4) # or rng = MersenneTwister() for a random seed
for i = 1:n
    r = rand(rng) # different value at each iteration
    do_something(r)
end
Thank you for your reply. The thing is, I need more than one rand call inside the for loop and, to make it more challenging, each rand call needs to use a different random number generator…
I don’t really understand where the problem is… What about creating a second RNG outside the loop and making the rand calls as needed inside the loop?
julia> using Random

julia> rng = MersenneTwister(4)

julia> using Distributions

julia> for i = 1:10
           r = rand(rng, Categorical([0.4,0.6]))
           println(r)
       end
2
2
2
2
2
2
1
2
2
2
This is not what I expected.
What do you expect?
Oh, you’re right. I thought the output above was all 2’s and did not notice the 1 hidden in there.
I’d guess that you might be better off creating that Categorical distribution object outside of the loop, because it’s the same every time and it’s pretty hard for the compiler to prove that it doesn’t need to create a new array, and an object referring to that array, on every iteration.
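A minimal sketch of that suggestion, assuming the same seed and probabilities as in the example above: the Categorical object is built once before the loop and reused for every draw. The `counts` tally is only there for illustration.

```julia
using Random
using Distributions

rng = MersenneTwister(4)
d = Categorical([0.4, 0.6])   # constructed once, outside the loop

counts = zeros(Int, 2)        # how often each category is drawn
for _ in 1:1_000
    r = rand(rng, d)          # reuses the same distribution object
    counts[r] += 1
end
println(counts)
```

Over many draws the second category should come up roughly 60% of the time.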
julia> using Random

julia> using Distributions

julia> rng1 = MersenneTwister(1)

julia> rng2 = MersenneTwister(2)

julia> for i = 1:10
           r = rand(rng1, Categorical([0.4,0.6]))
           println("r: ", r)
           t = rand(rng2, Exponential(1/200))
           println("t: ", t)
       end
r: 1
t: 0.0030177027481547076
r: 1
t: 0.009348155469112253
r: 1
t: 0.004486075863176537
r: 1
t: 0.007756887782850566
r: 2
t: 0.007259221922800534
r: 1
t: 0.0038109894089538372
r: 2
t: 0.0015894111269976982
r: 2
t: 0.004725440286443651
r: 1
t: 0.006614692669167275
r: 2
t: 0.0026035963682239554

julia> for i = 1:10
           r = rand(MersenneTwister(1), Categorical([0.4,0.6]))
           t = rand(MersenneTwister(4), Exponential(1/200))
           println("r: ", r)
           println("t: ", t)
       end
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
The comparison above shows what was confusing me. I understand how it works now. I guess calling MersenneTwister() with a seed is like resetting the global random number generator. Or can anyone explain it more clearly?
Calling MersenneTwister(4) each time “is like” re-seeding the global RNG: every call builds a new generator that starts from the same state, so doing that in the loop will invariably produce the same output. Moreover, it’s very wasteful to create so many new RNGs. By the way, do you really need to create two RNGs for the two distributions? In your example above, at least, it’s unnecessary.
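The re-seeding behaviour can be checked directly with a small sketch that uses only the Random standard library. The variable names are illustrative only.

```julia
using Random

# Two freshly constructed generators with the same seed start in the same state,
# so their first draws coincide.
a = rand(MersenneTwister(4))
b = rand(MersenneTwister(4))
println(a == b)

# A generator that is kept around advances its state with every draw.
rng = MersenneTwister(4)
x = rand(rng)   # first draw, matches `a`
y = rand(rng)   # second draw, a different value
println(x == a)
println(x == y)
```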
To make sure the two streams are strictly unrelated…
Are you sure that makes them more independent? I’m no expert on this, but it sounds to me like the type of ‘clever’ overcomplication that could end up making them less independent. You should definitely double check your assumption.
Also, why are you redefining the distributions (to the same values!) on each loop iteration?
As a rule of thumb, create independent RNGs for multithreaded code, otherwise use the same one sequentially.
The most important exception to this is CI, but @testset Does the Right Thing™ so you don’t need to worry about that either.
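For the sequential case, a minimal sketch of that rule of thumb might look like the following: a single RNG created once and shared by both distributions from the earlier example. The names `category_dist`, `waiting_dist`, `rs` and `ts` are made up for illustration, and the seed is arbitrary.

```julia
using Random
using Distributions

rng = MersenneTwister(1)                  # one generator for everything
category_dist = Categorical([0.4, 0.6])   # both distributions built once
waiting_dist = Exponential(1 / 200)

rs = Int[]
ts = Float64[]
for _ in 1:10
    push!(rs, rand(rng, category_dist))   # both draws advance the same RNG
    push!(ts, rand(rng, waiting_dist))
end
println(rs)
println(ts)
```

The values are collected here only so they can be printed at the end; in a streaming setting each draw would be used directly inside the loop.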