Hi, the rand function in the Distributions module is very useful for generating a sequence / vector / list of random numbers from a given random number generator and a statistical distribution. What I need is to generate a long sequence of random numbers (say, several million) and process them one by one. I can simply generate them with the rand function and save the whole sequence into a vector, but this does not seem efficient. I was wondering, is there any other way to handle this? Many thanks.
You could use a generator:
n = 1_000_000
rg = (rand() for _ in 1:n)
for r in rg
    # do stuff with random number `r`
end
Or just a loop?
for i = 1:n
    r = rand()
    do_something(r)
end
There's nothing wrong with loops in Julia, unlike other languages where you have to avoid them in critical code.
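To make the memory point concrete, here is a minimal sketch of consuming each draw as soon as it is produced, so that no million-element vector is ever stored. The function name count_below, the threshold, and the seed 42 are made up for illustration; only rand and MersenneTwister come from the Random standard library.

```julia
using Random

# Hypothetical helper: counts how many of n uniform draws fall below `threshold`.
# Each draw is used immediately and then discarded; nothing is collected.
function count_below(rng, n, threshold)
    count = 0
    for _ in 1:n
        r = rand(rng)             # one Float64 in [0, 1)
        count += (r < threshold)  # Bool is promoted to Int
    end
    return count
end

rng = MersenneTwister(42)         # seed chosen arbitrarily
n = 1_000_000
c = count_below(rng, n, 0.5)
println(c / n)                    # fraction of draws below 0.5
```

Putting the loop in a function also avoids working with non-constant globals, which usually matters more for speed than the choice between a loop and a generator.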
Thank you for your kind reply. The situation I'm facing is that the rand() function in the Distributions module can optionally be given a first argument that specifies the random number generator. If I call rand with a generator such as MersenneTwister(4), it gives me the "same" number each time, which is not what I need. Moreover, I need more than one rand call like this inside the same loop. Do you know how to accomplish this? For now I generate the random numbers I need, save them in a vector, and then process them one by one inside the for loop. I wonder if there is a better way to do this.
I'm not sure I understand, but that will be true only if you put the MersenneTwister(4) call within the loop. Just assign the RNG to a variable before the loop:
using Random
rng = MersenneTwister(4) # or rng = MersenneTwister() for a random seed
for i = 1:n
    r = rand(rng) # different value at each iteration
    do_something(r)
end
Thank you for your reply. The thing is, I need more than one rand call inside the for loop, and to make it more challenging, each rand call needs to use a different random number generator...
I don't really understand where your problem is... What about creating a second RNG outside of the loop and making the rand calls as needed within the loop?
julia> using Random
julia> rng = MersenneTwister(4)
julia> using Distributions
julia> for i = 1:10
           r = rand(rng, Categorical([0.4,0.6]))
           println(r)
       end
2
2
2
2
2
2
1
2
2
2
It's not what is expected.
What do you expect?
Oh. You're right. I thought the output above was all 2's, and did not notice that there is a 1 hidden inside.
I'd guess that you might be better off creating that Categorical distribution object outside of the loop.
Why?
Because it's the same every time, and it's pretty hard for the compiler to prove that it doesn't need to create a new array and an object referring to that array on every iteration.
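A minimal sketch of that suggestion, assuming Distributions.jl is installed: the Categorical object (and the probability vector it wraps) is built once before the loop and reused for every draw. The names label_dist and draw_labels are hypothetical, and the results are stored in a small vector only so they are easy to inspect.

```julia
using Random
using Distributions  # assumes the Distributions.jl package is installed

# Hypothetical helper: draws n labels from an already constructed distribution.
function draw_labels(rng, dist, n)
    labels = Vector{Int}(undef, n)
    for i in 1:n
        labels[i] = rand(rng, dist)  # no new array or distribution per iteration
    end
    return labels
end

rng = MersenneTwister(4)
label_dist = Categorical([0.4, 0.6])  # constructed once, outside the loop
labels = draw_labels(rng, label_dist, 10)
println(labels)
```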
julia> using Random
julia> using Distributions
julia> rng1 = MersenneTwister(1)
julia> rng2 = MersenneTwister(2)
julia> for i = 1:10
           r = rand(rng1, Categorical([0.4,0.6]))
           println("r: ", r)
           t = rand(rng2, Exponential(1/200))
           println("t: ", t)
       end
r: 1
t: 0.0030177027481547076
r: 1
t: 0.009348155469112253
r: 1
t: 0.004486075863176537
r: 1
t: 0.007756887782850566
r: 2
t: 0.007259221922800534
r: 1
t: 0.0038109894089538372
r: 2
t: 0.0015894111269976982
r: 2
t: 0.004725440286443651
r: 1
t: 0.006614692669167275
r: 2
t: 0.0026035963682239554
julia> for i = 1:10
           r = rand(MersenneTwister(1), Categorical([0.4,0.6]))
           t = rand(MersenneTwister(4), Exponential(1/200))
           println("r: ", r)
           println("t: ", t)
       end
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
r: 1
t: 0.011176055504085384
The above comparison explains my point of confusion. I know how it works now. I guess calling the MersenneTwister() function is like resetting the global random number generator. Or can anyone explain it more clearly?
Each call to MersenneTwister(4) creates a brand-new RNG seeded with 4, which "is like" re-seeding the global RNG on every iteration, so doing that in the loop will invariably produce the same output. Moreover, it's very wasteful to create so many new RNGs. By the way, do you really need to create two RNGs for the two distributions? In your example above at least it's unnecessary.
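A small sketch of what is going on, using only the Random standard library (the seed 123 for the global RNG is arbitrary). As far as I can tell, constructing a seeded MersenneTwister does not touch the global RNG at all; it just builds a separate generator that starts at the beginning of the stream for that seed.

```julia
using Random

# Two fresh generators built from the same seed start the same stream,
# so their first draws are identical.
a = rand(MersenneTwister(4))
b = rand(MersenneTwister(4))
println(a == b)

# A single generator that is reused advances its state on every call.
rng = MersenneTwister(4)
x = rand(rng)   # same as `a`: same seed, first draw
y = rand(rng)   # second draw from the same stream
println(x == a, " ", x != y)

# The global RNG is not affected by constructing a seeded MersenneTwister.
Random.seed!(123)
u1 = rand()
Random.seed!(123)
MersenneTwister(4)   # built and thrown away
u2 = rand()
println(u1 == u2)
```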
To make sure they are strictly not related at all...
Are you sure that makes them more independent? I'm no expert on this, but it sounds to me like the type of "clever" overcomplication that could end up making them less independent. You should definitely double check your assumption.
Also, why are you redefining the distributions (to the same values!) on each loop iteration?
As a rule of thumb, create independent RNGs for multithreaded code, otherwise use the same one sequentially.
The most important exception to this is CI, but @testset Does the Right Thing™ so you don't need to worry about that either.