I am trying to generate random numbers using both randn() and rand(Normal(0, 1)). I set the same random seed, so why do they generate different results?
julia> using Distributions, Random
julia> Random.seed!(123)
TaskLocalRNG()
julia> a = rand(Normal(0, 1), 100);
julia> Random.seed!(123)
TaskLocalRNG()
julia> b = randn(100);
julia> a == b
false
It comes from the cutoff of 7 being hardcoded in this specialization in Random:
function $randfun!(rng::Union{Xoshiro, TaskLocalRNG}, A::Array{Float64})
    if length(A) < 7
        for i in eachindex(A)
            @inbounds A[i] = $randfun(rng, Float64)
        end
    else
        GC.@preserve A rand!(rng, UnsafeView{UInt64}(pointer(A), length(A)))
        for i in eachindex(A)
            @inbounds A[i] = $_randfun(rng, reinterpret(UInt64, A[i]) >>> 12)
        end
    end
    A
end
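You can see the cutoff in action directly. This is a sketch against the current internals: the value 7 (and the use of a block strategy at all) is an implementation detail and may change between Julia versions.

```julia
using Random

# Below the cutoff (length < 7) the batched method falls back to the scalar
# loop, so it consumes the RNG stream exactly like element-wise calls:
rng = Xoshiro(123)
small_batch = randn(rng, 6)
rng = Xoshiro(123)
small_loop = [randn(rng) for _ in 1:6]
println(small_batch == small_loop)  # same code path, same stream

# At or above the cutoff the block strategy takes over, and the stream is
# consumed differently, so the results diverge:
rng = Xoshiro(123)
big_batch = randn(rng, 100)
rng = Xoshiro(123)
big_loop = [randn(rng) for _ in 1:100]
println(big_batch == big_loop)
```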
though I’m not sure why randn(N) calls this while rand(Normal(0, 1), N) uses this fallback in Distributions instead:
function _rand!(rng::AbstractRNG, sampler::Sampleable{Univariate}, A::AbstractArray{<:Real})
    for i in eachindex(A)
        @inbounds A[i] = rand(rng, sampler)
    end
    return A
end
But here we see that for N < 7 the implementations are similar, with one element being generated at a time, and if you actually generate 100 numbers one at a time with the two strategies, they are the same. For N >= 7, randn generates numbers in blocks with a different strategy, which results in different numbers.
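The element-wise agreement is easy to check. This assumes (as is currently the case) that Distributions’ scalar rand for a Normal ultimately delegates to randn, so the two scalar paths consume the RNG stream identically:

```julia
using Distributions, Random

# Generate one number at a time with each strategy, same seed:
Random.seed!(123)
a = [randn() for _ in 1:100]               # scalar randn
Random.seed!(123)
b = [rand(Normal(0, 1)) for _ in 1:100]    # scalar Distributions sampler
println(a == b)  # element-wise, the two strategies agree
```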
I am not sure this is a “problem”. randn() and rand(Normal(0, 1)) are not guaranteed (indeed…) to use the same algorithm, so why should we assume they provide the same random streams? It would be a problem if either of the two produced a different random stream with the same seed…
But I think randn() and rand(Normal(0, 1)) should produce the same random numbers given the same seed.
In Julia, both randn() and rand(Normal(0, 1)) generate normally distributed random numbers with mean 0 and standard deviation 1.
I don’t feel the need to have the assumption that they do generate them in the same way (hence with the same random stream). After all, they are different functions.
It happens that for N < 7 they do produce the same numbers/consume the same random stream from the RNG, but nothing guarantees that this is a general rule (and indeed it isn’t).
We shouldn’t ever infer a general rule from a few observations, as in Bertrand Russell’s metaphor.
Think about it this way: randn(N) and rand(Normal(0, 1), N) are two different ways of sampling a standard normal variable; if you take enough samples (large N) from either, you see a normal distribution. They don’t promise to, or need to, do it in the same order or the same way to accomplish that.
Setting the random seed is intended for reproducing a pseudorandom sample. If you’re trying to reproduce any behavior, you should run the exact same code; never substitute any step, even if you’re sure the steps are equivalent.
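Running the exact same call is what the seed actually guarantees:

```julia
using Random

# Same seed + same code => identical stream, every time:
Random.seed!(123)
a = randn(100)
Random.seed!(123)
b = randn(100)
println(a == b)  # true
```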
By the same logic, you should be even more unhappy with
julia> using Random
julia> Random.seed!(123)
TaskLocalRNG()
julia> a = randn(7);
julia> Random.seed!(123)
TaskLocalRNG()
julia> b = randn(8)[1:7];
julia> a == b
false
But as others have said, you cannot reasonably expect the same random stream if you are not running the exact same code. The fact that a specialization exists for N >= 7 (or N > 13 for the Mersenne Twister) is a good sign: it means someone has given serious thought to maximizing performance, which, next to correctness, is the most important goal.
If any implementation is “at fault”, it’s Distributions’, because their sampler circumvents the specialization in Random.
In any case, I cannot imagine a scenario in which requiring randn == rand(Normal) is sensible.