rand(1:10) vs Int(round(10*rand())

Why should I use rand(1:N) for an integer N, instead of Int.(round.(rand()*N?)) On my computer, it seems that Int.(round.(rand()*N?)) is about 3 times faster than rand(1:10).

1 Like
  • It’s more succinct
  • It’s more obvious in terms of what it’s doing
  • I trust Julia to make sure that rand(1:N) is bias-free. If your shortcut is indeed “safe,” then I’d imagine the RNG folks would be happy to implement that as the implementation for rand(1:N), so that performance difference might disappear in the future.

To that point, it turns out that Int(round(rand() * N)) is definitely not bias-free:


Well, 1 + floor(Int, rand()*N)) should be bias free and indeed appears to be significantly faster than rand(1:N). But the fact that OP seriously messed up the expression is a decent anecdotal argument for why people should usually use the simpler formulation anyway.


Beautiful demonstration of the underlying fair uniformly distributed random number generator AND rounding intervals. :laughing:

Also see https://github.com/JuliaLang/julia/issues/20582

Yeah, Int(ceil(N*rand())) is seems unbias and is much faster. My code is performance critical, so I need all the speed boosts I can get.

As long as N is not too big then the bias is very small. Biases become visible for big N, for example:

julia> N=10^16; M=10^6; mean(iseven(Int(ceil(N*rand()))) for i in 1:M)

julia> N=10^16; M=10^6; mean(iseven(rand(1:N)) for i in 1:M)

[@bkamins has corrected benchmarks]


You might also have a look at this https://juliasnippets.blogspot.com/2017/11/basics-of-generating-random-numbers-in.html (it is Julia 0.6 based but the reasons for the bias have not changed).

Having said that for very small N in very tight loops I use the approximate formula as the bias is small and it is significantly faster (see the corrected benchmark below - also you have to use 1+floor not ceil as ceil can produce 0):

julia> f1(N=5,M=10^6) = mean(iseven(1+floor(Int, N*rand())) for i in 1:M)
f1 (generic function with 3 methods)

julia> f2(N=5,M=10^6) = mean(iseven(rand(1:N)) for i in 1:M)
f2 (generic function with 3 methods)

julia> @btime f1()
  9.020 ms (0 allocations: 0 bytes)

julia> @btime f2()
  17.681 ms (0 allocations: 0 bytes)

The algorithm here: https://arxiv.org/abs/1805.10941
is faster and is claimed to be unbiased.

EDIT: (Skip this. Corrected and improved code is in my following post)

mytrunc(x) = UInt64((x << 64) >> 64)
function nearlydivisionless(s)
    x = rand(UInt64)
    m = convert(UInt128, x) * s
    l = mytrunc(m)
    if (l < s)
        t = -s % s
        while l < t
            x = rand(UInt64)
            m = convert(UInt128, x) * s
            l = mytrunc(m)
    return Int(m >> 64)

function biased(s)
    return floor(Int, rand() * s)


julia> @btime nearlydivisionless(10);
  7.042 ns (0 allocations: 0 bytes)

julia> @btime biased(10);
  5.188 ns (0 allocations: 0 bytes)

julia> @btime rand(0:9);
  12.698 ns (0 allocations: 0 bytes)

This is indeed very good. Additionally we can specialize the function for small s (fitting in UInt32) to gain even more speed in this case.

1 Like

Yes. In this case the new algorithm appears to be significantly faster than the biased algorithm:

@inline function drawl(s::Integer, maxval, T, T2)
    x = rand(T)
    m = convert(T2, x) * s
    l = m % maxval
    return (l, m)

@inline function _nearlydivisionless(s::Int32)
    nbits = 32
    __nearlydivisionless(s, UInt32, UInt64, UInt64(1) << nbits, nbits)

@inline function _nearlydivisionless(s::Int64)
    nbits = 64
    __nearlydivisionless(s, UInt64, UInt128, UInt128(1) << nbits, nbits)

function nearlydivisionless(s::Integer)
    if s < typemax(UInt32)
        s1 = Int32(s)
        s1 = Int64(s)

@inline function __nearlydivisionless(s::Integer, T, T2, maxval, nbits)
    l, m = drawl(s, maxval, T, T2)
    if (l < s)
        t = maxval % s
        while l < t
            l, m = drawl(s, maxval, T, T2)
    return Int(m >> nbits)

biased(s) = floor(Int, rand() * s)

Here are the benchmarks:

The builtin function:

julia> @btime rand(0:99);
  9.358 ns (0 allocations: 0 bytes)

The biased algorithm:

julia> @btime biased(100);
  4.893 ns (0 allocations: 0 bytes)

The following function checks that 100 < typemax(UInt32) and chooses the routine accordingly. In addtion, it’s a bit faster if the argument is Int32(100) rather than 100.

julia> @btime nearlydivisionless(Int32(100));
  5.209 ns (0 allocations: 0 bytes)

Bypassing the check mentioned above is much faster. Much faster than the biased algorithm as well.

julia> @btime _nearlydivisionless(Int32(100));
  3.698 ns (0 allocations: 0 bytes)

These functions appear to work correctly now. The implementation above could no doubt be improved significantly.

If you read the paper, you’ll find that for n = 100, a single division is performed 2e-6 percent of the time. The other times, no division is performed.

I think choosing the routine based on whether n < typemax(UInt32) is not optimal. As n approaches the maximum value of the type, a single division is performed more often. This may kill the advantage of working with UInt64 rather than UInt128.


What is conclusion of the debate? Can be stated in few sentance?

There is general interest in exploring the method from https://arxiv.org/abs/1805.10941 for use in Julia. The following issue keeps track, though not so much has happened for now: https://github.com/JuliaLang/julia/issues/29004


Thank you, that all I need to know.

*10^16 might be stretching the float 64 precision levels a bit, 70% integer numbers returned

julia> M= 10^6;
julia> x=0; for i in 1:M; global x;
  n = 10^16;
  x=x+ ((n%1)==0.0);
  end; print(x);
julia> 10^16 == Float64(10^16)

The problem is that rand(Float64) effectively generates random fixed-point numbers and then converts them to floats. See https://discourse.julialang.org/t/output-distribution-of-rand/12192.

Here’s another funny example

julia> 10^16 +1.0 == 10^16 -1.0