Why should I use rand(1:N) for an integer N, instead of Int.(round.(rand()*N))? On my computer, Int.(round.(rand()*N)) seems to be about 3 times faster than rand(1:N).
- It’s more succinct
- It’s more obvious in terms of what it’s doing
- I trust Julia to make sure that rand(1:N) is bias-free. If your shortcut is indeed “safe,” then I’d imagine the RNG folks would be happy to implement that as the implementation for rand(1:N), so that performance difference might disappear in the future.
Well, 1 + floor(Int, rand()*N) should be bias-free and indeed appears to be significantly faster than rand(1:N). But the fact that the OP seriously messed up the expression is a decent anecdotal argument for why people should usually use the simpler formulation anyway.
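To make the bias concrete, here is a small check (my addition, not from the thread): it replaces rand() with a fine, evenly spaced grid over [0, 1), so the exact weight each output receives becomes visible. The value N = 4 and grid size K are arbitrary choices of mine.

```julia
# Evaluate both formulas on an even grid standing in for rand()'s output.
N = 4
K = 4_000                         # grid resolution; u ranges over [0, 1)
u = (0:K-1) ./ K

counts_round = zeros(Int, N + 1)  # Int(round(x*N)) can yield 0..N
for x in u
    counts_round[Int(round(x * N)) + 1] += 1
end

counts_floor = zeros(Int, N)      # 1 + floor(Int, x*N) yields 1..N
for x in u
    counts_floor[1 + floor(Int, x * N)] += 1
end

println(counts_round)  # 0 and N get roughly half the weight of 1..N-1
println(counts_floor)  # each of 1..N gets exactly K/N hits
```

The round-based formula both produces the out-of-range value 0 and gives the two endpoints only half the probability of the interior values; the floor-based formula is exactly uniform on 1:N.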
Beautiful demonstration of the underlying fair uniformly distributed random number generator AND rounding intervals.
Yeah, Int(ceil(N*rand())) seems unbiased and is much faster. My code is performance critical, so I need all the speed boosts I can get.
As long as N is not too big, the bias is very small. Biases become visible for big N, for example:
julia> using Statistics

julia> N=10^16; M=10^6; mean(iseven(Int(ceil(N*rand()))) for i in 1:M)
0.549885
julia> N=10^16; M=10^6; mean(iseven(rand(1:N)) for i in 1:M)
0.500379
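A contributing factor (my note, not from the thread): at this magnitude the Float64 grid is coarser than the integers, so N*rand() cannot even reach every value in 1:N. Since rand() returns multiples of 2^-53, the products land on a grid with spacing above 1:

```julia
# Float64 spacing near 10^16, and the step between consecutive reachable
# products N*rand() when N = 10^16 (rand() returns multiples of 2^-53).
println(eps(1.0e16))         # adjacent Float64s near 10^16 are 2.0 apart
println(1.0e16 * 2.0^-53)    # grid step of the products: about 1.11
```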
[@bkamins has corrected benchmarks]
You might also have a look at this Julia Snippets post: Basics of generating random numbers in Julia (it is Julia 0.6 based, but the reasons for the bias have not changed).
Having said that, for very small N in very tight loops I use the approximate formula, as the bias is small and it is significantly faster (see the corrected benchmark below; note that you have to use 1 + floor, not ceil, since ceil can produce 0):
julia> f1(N=5,M=10^6) = mean(iseven(1+floor(Int, N*rand())) for i in 1:M)
f1 (generic function with 3 methods)
julia> f2(N=5,M=10^6) = mean(iseven(rand(1:N)) for i in 1:M)
f2 (generic function with 3 methods)
julia> using BenchmarkTools

julia> @btime f1()
9.020 ms (0 allocations: 0 bytes)
0.400145
julia> @btime f2()
17.681 ms (0 allocations: 0 bytes)
0.399574
The algorithm at https://arxiv.org/abs/1805.10941 is faster and is claimed to be unbiased.
EDIT: (Skip this. Corrected and improved code is in my following post)
mytrunc(x) = UInt64((x << 64) >> 64)  # low 64 bits of a UInt128

function nearlydivisionless(s)
    x = rand(UInt64)
    m = convert(UInt128, x) * s
    l = mytrunc(m)
    if l < s
        t = (-UInt64(s)) % UInt64(s)  # (2^64 - s) % s in unsigned arithmetic
        while l < t
            x = rand(UInt64)
            m = convert(UInt128, x) * s
            l = mytrunc(m)
        end
    end
    return Int(m >> 64)
end

function biased(s)
    return floor(Int, rand() * s)
end
Benchmarks:
julia> @btime nearlydivisionless(10);
7.042 ns (0 allocations: 0 bytes)
julia> @btime biased(10);
5.188 ns (0 allocations: 0 bytes)
julia> @btime rand(0:9);
12.698 ns (0 allocations: 0 bytes)
This is indeed very good. Additionally, we can specialize the function for small s (fitting in UInt32) to gain even more speed in this case.
Yes. In this case the new algorithm appears to be significantly faster than the biased algorithm:
@inline function drawl(s::Integer, maxval, T, T2)
    x = rand(T)
    m = convert(T2, x) * s
    l = m % maxval
    return (l, m)
end

@inline function _nearlydivisionless(s::Int32)
    nbits = 32
    __nearlydivisionless(s, UInt32, UInt64, UInt64(1) << nbits, nbits)
end

@inline function _nearlydivisionless(s::Int64)
    nbits = 64
    __nearlydivisionless(s, UInt64, UInt128, UInt128(1) << nbits, nbits)
end

function nearlydivisionless(s::Integer)
    if s <= typemax(Int32)  # s must fit in an Int32 for the narrow method
        s1 = Int32(s)
    else
        s1 = Int64(s)
    end
    _nearlydivisionless(s1)
end

@inline function __nearlydivisionless(s::Integer, T, T2, maxval, nbits)
    l, m = drawl(s, maxval, T, T2)
    if l < s
        t = maxval % s
        while l < t
            l, m = drawl(s, maxval, T, T2)
        end
    end
    return Int(m >> nbits)
end
biased(s) = floor(Int, rand() * s)
Here are the benchmarks:
The builtin function:
julia> @btime rand(0:99);
9.358 ns (0 allocations: 0 bytes)
The biased algorithm:
julia> @btime biased(100);
4.893 ns (0 allocations: 0 bytes)
The following function checks that 100 fits in the 32-bit routine and chooses accordingly. In addition, it’s a bit faster if the argument is Int32(100) rather than 100.
julia> @btime nearlydivisionless(Int32(100));
5.209 ns (0 allocations: 0 bytes)
Bypassing the check mentioned above is much faster. Much faster than the biased algorithm as well.
julia> @btime _nearlydivisionless(Int32(100));
3.698 ns (0 allocations: 0 bytes)
These functions appear to work correctly now. The implementation above could no doubt be improved significantly.
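One way to gain confidence in the method (my addition, not from the thread) is to shrink the word size to 8 bits, where all 2^8 possible random words can be enumerated and the output counts checked exhaustively. The names lemire8 and check are mine:

```julia
# Lemire's method with an 8-bit word: out = (x*s) >> 8, rejecting words
# whose low bits fall below 2^8 mod s. Returns nothing for a rejected word.
function lemire8(x::UInt8, s::UInt8)
    m = UInt16(x) * UInt16(s)
    l = m % UInt8                 # low 8 bits of the product
    t = UInt16(256) % s           # 2^8 mod s
    return l < t ? nothing : Int(m >> 8)
end

# Enumerate every possible word and tally the outputs.
function check(s::UInt8)
    counts = zeros(Int, s)
    rejected = 0
    for x in 0x00:0xff
        r = lemire8(x, s)
        r === nothing ? (rejected += 1) : (counts[r + 1] += 1)
    end
    return counts, rejected
end

counts, rejected = check(UInt8(10))
println(counts)    # each output 0..9 is produced by the same number of words
println(rejected)  # 2^8 mod 10 = 6 words are rejected
```

With s = 10, each output value is hit by exactly floor(256/10) = 25 words and 6 words are rejected, so the accepted outputs are exactly uniform.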
If you read the paper, you’ll find that for n = 100, a single division is performed about 2e-6 percent of the time. The other times, no division is performed.
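For a 32-bit word that figure checks out (my arithmetic, not from the post): the division 2^32 % s is only computed when the low half l falls below s, which happens with probability s/2^32:

```julia
# Probability of hitting the slow (division) path for s = 100, 32-bit word.
s = 100
p = s / 2.0^32
println(100 * p)   # as a percentage: about 2.3e-6
```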
I think choosing the routine based on whether n < typemax(UInt32) is not optimal. As n approaches the maximum value of the type, a single division is performed more often. This may kill the advantage of working with UInt64 rather than UInt128.
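That concern can be quantified (my sketch, not from the post): a fresh word is rejected with probability (2^32 mod s)/2^32, which is negligible for small s but approaches 1/2 as s passes 2^31:

```julia
# Rejection probability for a 32-bit word as a function of s.
rejprob(s) = (UInt64(2)^32 % UInt64(s)) / 2.0^32

println(rejprob(100))        # ~2.2e-8: rejection is essentially free
println(rejprob(2^31 + 1))   # ~0.5: half of all draws get rejected
```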
What is the conclusion of the debate? Can it be stated in a few sentences?
There is general interest in exploring the method from https://arxiv.org/abs/1805.10941 for use in Julia. The following issue keeps track of it, though not much has happened so far: improve performance of `rand(n:m)` · Issue #29004 · JuliaLang/julia · GitHub
Thank you, that’s all I needed to know.
10^16 might be stretching Float64 precision levels a bit: about 70% of the returned numbers are integers.
julia> M= 10^6;
julia> x = 0; for i in 1:M; global x
           n = 10^16
           n = n * rand()
           x = x + ((n % 1) == 0.0)
       end; print(x)
700058
julia> 10^16 == Float64(10^16)
true
The problem is that rand(Float64) effectively generates random fixed-point numbers and then converts them to floats. See https://discourse.julialang.org/t/output-distribution-of-rand/12192.
Here’s another funny example:
julia> 10^16 +1.0 == 10^16 -1.0
true
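A quick check of why this happens (my addition): near 10^16, adjacent Float64 values are 2.0 apart, so adding or subtracting 1.0 rounds back to the same number:

```julia
# Both 10^16 + 1.0 and 10^16 - 1.0 round to the same Float64, because the
# spacing between representable values at this magnitude is 2.0.
println(eps(1.0e16))                  # 2.0
println(10^16 + 1.0 == 10^16 - 1.0)   # true
```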