Same random seed, but different random numbers?

Dear all,

I tried to generate random numbers using both randn() and rand(Normal(0, 1)). I set the same random seed, so why do they produce different results?

julia> using Distributions, Random

julia> Random.seed!(123)
TaskLocalRNG()

julia> a = rand(Normal(0, 1), 100);

julia> Random.seed!(123)
TaskLocalRNG()

julia> b = randn(100);

julia> a == b
false


It works for vectors up to 7 elements long for some reason.

julia> begin
         N = 7
         Random.seed!(123)
         a = rand(Normal(0, 1), N)
         Random.seed!(123)
         b = randn(N)
         a == b
       end
true

Yes, it’s strange. When N > 7, the results are different.

It comes from 7 being hard-coded as a specialization threshold here:

function $randfun!(rng::Union{Xoshiro, TaskLocalRNG}, A::Array{Float64})
    if length(A) < 7
        for i in eachindex(A)
            @inbounds A[i] = $randfun(rng, Float64)
        end
    else
        GC.@preserve A rand!(rng, UnsafeView{UInt64}(pointer(A), length(A)))

        for i in eachindex(A)
            @inbounds A[i] = $_randfun(rng, reinterpret(UInt64, A[i]) >>> 12)
        end
    end
    A
end

though I’m not sure why randn(N) calls this while rand(Normal(0, 1), N) instead goes through this generic fallback:

function _rand!(rng::AbstractRNG, sampler::Sampleable{Univariate}, A::AbstractArray{<:Real})
    for i in eachindex(A)
        @inbounds A[i] = rand(rng, sampler)
    end
    return A
end

But here we see that for N < 7 the implementations seem similar, with one element being generated at a time. If you actually generate 100 numbers one at a time with the two strategies, they are the same, while for N > 7 randn fills the array in blocks with a different strategy that results in different numbers.
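This is easy to check directly. A small sketch (using only the thread’s setup) that draws 100 numbers one scalar at a time through both APIs, so neither ever hits the batched path:

```julia
using Distributions, Random

# Scalar draws never reach the length >= 7 batched specialization,
# so both APIs consume the RNG stream one value at a time.
Random.seed!(123)
a = [rand(Normal(0, 1)) for _ in 1:100]  # one scalar draw per iteration

Random.seed!(123)
b = [randn() for _ in 1:100]             # likewise, scalar draws only

@show a == b  # true, matching the scalar-draw equality shown below
```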


I hope this problem can be solved in the Julia 1.8 release.

I am not sure this is a “problem”.
randn() and rand(Normal(0, 1)) are not guaranteed (indeed…) to use the same algorithm, so why should we assume they provide the same random streams? It would be a problem if either of them produced a different random stream across runs with the same seed…


But I think randn() and rand(Normal(0, 1)) should produce the same random numbers with the same seed.
In Julia, both randn() and rand(Normal(0, 1)) generate normally distributed random numbers with mean 0 and standard deviation 1.

When we generate a single random number with the same seed, we find that they always return the same value.

julia> Random.seed!(123)
TaskLocalRNG()

julia> randn()
-0.6457306721039767

julia> Random.seed!(123)
TaskLocalRNG()

julia> rand(Normal(0, 1))
-0.6457306721039767

So I think they should also get the same random numbers when N > 7.

I don’t feel the need to assume that they generate them in the same way (hence with the same random stream). After all, they are different functions.
It happens that for N <= 7 they do produce the same numbers/consume the same random stream from the RNG, but nothing guarantees that this is a general rule (and indeed it isn’t).
We should never infer a general rule from a few observations, as in Bertrand Russell’s metaphor.


Think about it this way. randn(N) and rand(Normal(0, 1), N) are two different ways of sampling a standard normal variable; if you take enough samples (large N) from either, you see a normal distribution. They don’t promise to, or need to, do it in the same order or the same way to accomplish that.

Setting the random seed is intended for reproducing the pseudorandom sampling. If you’re trying to reproduce any behavior, you should run the exact same code. Never substitute any step, even if you’re sure they are equivalent.
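To make that concrete, a minimal sketch of what seeding does guarantee: re-running the exact same code under the same seed reproduces the stream bit-for-bit.

```julia
using Random

Random.seed!(123)
x1 = randn(100)    # first run

Random.seed!(123)
x2 = randn(100)    # identical code, identical seed

@assert x1 == x2   # same code + same seed => same stream, guaranteed
```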


By the same logic, you should be even more discontented with

julia> using Random

julia> Random.seed!(123)
TaskLocalRNG()

julia> a = randn(7);

julia> Random.seed!(123)
TaskLocalRNG()

julia> b = randn(8)[1:7];

julia> a == b
false

But as others have said, you cannot reasonably expect the same random stream if you are not running the exact same code. The fact that a specialization exists for N > 7 (or N > 13 for MersenneTwister) is a good sign: it means someone has given serious thought to maximizing performance, which is, next to correctness, the most important goal.

If any implementation is “at fault”, it’s Distributions’, because their sampler circumvents the specialization in Random.
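To illustrate what that would mean, here is a hypothetical sketch (not Distributions’ actual code; the name my_rand! is made up for this example) of a method that forwards standard-normal array sampling to Base’s randn!, recovering both the batched fast path and stream equality with randn(N):

```julia
using Distributions, Random

# Hypothetical specialization: fill A via Base's batched randn!, then
# apply the affine transform for a general Normal(mu, sigma).
function my_rand!(rng::AbstractRNG, d::Normal{Float64}, A::Array{Float64})
    randn!(rng, A)             # standard normals via the specialized path
    if d.μ != 0 || d.σ != 1
        @. A = d.μ + d.σ * A   # shift/scale only when needed
    end
    return A
end

Random.seed!(123)
a = my_rand!(Random.default_rng(), Normal(0.0, 1.0),
             Vector{Float64}(undef, 100))
Random.seed!(123)
b = randn(100)
@assert a == b  # same stream, because both use randn!'s batched path
```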

In any case, I cannot imagine a scenario in which requiring randn == rand(Normal) is sensible.


Thanks for all of your answers. They helped me a lot.
