Why should I use rand(1:N) for an integer N, instead of Int.(round.(rand()*N))? On my computer, it seems that Int.(round.(rand()*N)) is about 3 times faster than rand(1:N).
- It’s more succinct.
- It’s more obvious in terms of what it’s doing.
- I trust Julia to make sure that rand(1:N) is bias-free. If your shortcut is indeed “safe,” then I’d imagine the RNG folks would be happy to adopt it as the implementation for rand(1:N), so that performance difference might disappear in the future.
Well, 1 + floor(Int, rand()*N) should be bias-free and indeed appears to be significantly faster than rand(1:N). But the fact that the OP seriously messed up the expression is a decent anecdotal argument for why people should usually use the simpler formulation anyway.
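For what it’s worth, a quick sanity check (a sketch, not a proof of unbiasedness; the helper name `draw` is just for illustration) that 1 + floor(Int, rand()*N) always stays inside 1:N:

```julia
# The expression from above, wrapped in a hypothetical helper:
draw(N) = 1 + floor(Int, rand() * N)

N = 10
samples = [draw(N) for _ in 1:10^6]
# Since 0.0 <= rand() < 1.0, floor(rand()*N) lies in 0:N-1,
# so every draw lands in 1:N:
all(in(1:N), samples)   # true
```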
Beautiful demonstration of the underlying fair uniformly distributed random number generator AND rounding intervals.
Also see random sampling from an (abstract)array is slow · Issue #20582 · JuliaLang/julia · GitHub
Yeah, Int(ceil(N*rand())) seems unbiased and is much faster. My code is performance-critical, so I need all the speed boosts I can get.
As long as N is not too big, the bias is very small. Biases become visible for big N, for example:
julia> N=10^16; M=10^6; mean(iseven(Int(ceil(N*rand()))) for i in 1:M)
0.549885
julia> N=10^16; M=10^6; mean(iseven(rand(1:N)) for i in 1:M)
0.500379
[@bkamins has corrected benchmarks]
You might also have a look at this Julia snippets: Basics of generating random numbers in Julia (it is Julia 0.6 based but the reasons for the bias have not changed).
Having said that, for very small N in very tight loops I use the approximate formula, as the bias is small and it is significantly faster (see the corrected benchmark below; also, you have to use 1 + floor, not ceil, as ceil can produce 0):
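To make the ceil pitfall concrete (a sketch; rand() returning exactly 0.0 is rare but possible):

```julia
# rand() can return exactly 0.0; substituting that worst case by hand:
N = 5
Int(ceil(N * 0.0))       # gives 0, which is outside 1:N
1 + floor(Int, N * 0.0)  # gives 1, still inside 1:N
```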
julia> f1(N=5,M=10^6) = mean(iseven(1+floor(Int, N*rand())) for i in 1:M)
f1 (generic function with 3 methods)
julia> f2(N=5,M=10^6) = mean(iseven(rand(1:N)) for i in 1:M)
f2 (generic function with 3 methods)
julia> @btime f1()
9.020 ms (0 allocations: 0 bytes)
0.400145
julia> @btime f2()
17.681 ms (0 allocations: 0 bytes)
0.399574
The algorithm in https://arxiv.org/abs/1805.10941 is faster and is claimed to be unbiased.
EDIT: (Skip this. Corrected and improved code is in my following post)
mytrunc(x) = UInt64((x << 64) >> 64)

function nearlydivisionless(s)
    x = rand(UInt64)
    m = convert(UInt128, x) * s
    l = mytrunc(m)
    if l < s
        # (2^64 - s) mod s; must be computed in unsigned arithmetic,
        # otherwise -s % s is just 0 and the rejection loop never runs
        t = (-UInt64(s)) % UInt64(s)
        while l < t
            x = rand(UInt64)
            m = convert(UInt128, x) * s
            l = mytrunc(m)
        end
    end
    return Int(m >> 64)
end

function biased(s)
    return floor(Int, rand() * s)
end
Benchmarks:
julia> @btime nearlydivisionless(10);
7.042 ns (0 allocations: 0 bytes)
julia> @btime biased(10);
5.188 ns (0 allocations: 0 bytes)
julia> @btime rand(0:9);
12.698 ns (0 allocations: 0 bytes)
This is indeed very good. Additionally, we can specialize the function for small s (fitting in UInt32) to gain even more speed in this case.
Yes. In this case the new algorithm appears to be significantly faster than the biased algorithm:
@inline function drawl(s::Integer, maxval, T, T2)
    x = rand(T)
    m = convert(T2, x) * s
    l = m % maxval
    return (l, m)
end

@inline function _nearlydivisionless(s::Int32)
    nbits = 32
    __nearlydivisionless(s, UInt32, UInt64, UInt64(1) << nbits, nbits)
end

@inline function _nearlydivisionless(s::Int64)
    nbits = 64
    __nearlydivisionless(s, UInt64, UInt128, UInt128(1) << nbits, nbits)
end

function nearlydivisionless(s::Integer)
    if s <= typemax(Int32)  # Int32(s) would overflow for larger s
        s1 = Int32(s)
    else
        s1 = Int64(s)
    end
    _nearlydivisionless(s1)
end

@inline function __nearlydivisionless(s::Integer, T, T2, maxval, nbits)
    l, m = drawl(s, maxval, T, T2)
    if l < s
        t = maxval % s
        while l < t
            l, m = drawl(s, maxval, T, T2)
        end
    end
    return Int(m >> nbits)
end

biased(s) = floor(Int, rand() * s)
Here are the benchmarks:
The builtin function:
julia> @btime rand(0:99);
9.358 ns (0 allocations: 0 bytes)
The biased algorithm:
julia> @btime biased(100);
4.893 ns (0 allocations: 0 bytes)
The following function checks that 100 fits in Int32 and chooses the routine accordingly. In addition, it’s a bit faster if the argument is Int32(100) rather than 100.
julia> @btime nearlydivisionless(Int32(100));
5.209 ns (0 allocations: 0 bytes)
Bypassing the check mentioned above is much faster, and faster than the biased algorithm as well.
julia> @btime _nearlydivisionless(Int32(100));
3.698 ns (0 allocations: 0 bytes)
These functions appear to work correctly now. The implementation above could no doubt be improved significantly.
If you read the paper, you’ll find that for n = 100, a single division is performed 2e-6 percent of the time. The rest of the time, no division is performed.
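That figure can be recovered with a one-line estimate (a sketch, assuming the 32-bit path: the rejection test l < s fires only when the low word is below s, which happens with probability s/2^32):

```julia
# Probability that the slow (division) path triggers for s = 100 on the
# 32-bit path: the low word l is uniform on 0:2^32-1, so P(l < s) = s/2^32.
s = 100
p = s / 2^32   # ≈ 2.33e-8
100 * p        # ≈ 2.33e-6, i.e. about 2e-6 percent
```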
I think choosing the routine based on whether n fits in Int32 is not optimal. As n approaches the maximum value of the type, a single division is performed more often. This may kill the advantage of working with UInt64 rather than UInt128.
What is the conclusion of the debate? Can it be stated in a few sentences?
There is general interest in exploring the method from https://arxiv.org/abs/1805.10941 for use in Julia. The following issue keeps track, though not so much has happened for now: improve performance of `rand(n:m)` · Issue #29004 · JuliaLang/julia · GitHub
Thank you, that’s all I needed to know.
10^16 might be stretching Float64 precision levels a bit: about 70% of the returned numbers are integers.
julia> M = 10^6;

julia> x = 0; for i in 1:M; global x
           n = 10^16
           n = n * rand()
           x = x + ((n % 1) == 0.0)
       end; print(x)
700058
julia> 10^16 == Float64(10^16)
true
The problem is that rand(Float64) effectively generates random fixed-point numbers and then converts them to floats. See https://discourse.julialang.org/t/output-distribution-of-rand/12192.
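One way to see the 70% figure (a sketch): near 10^16 the spacing between adjacent Float64 values is greater than 1, so any product landing there is forced onto an integer.

```julia
# Spacing of Float64 values at various magnitudes:
eps(4.0e15)   # 0.5: sub-integer resolution still available
eps(1.0e16)   # 2.0: only even integers are representable here
# Consequently, every Float64 with magnitude >= 2^53 is an integer:
x = 2.0^53
x + 1 == x    # true: 2^53 + 1 is not representable
```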
Here’s another funny example:
julia> 10^16 + 1.0 == 10^16 - 1.0
true
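Same mechanism: the spacing at 10^16 is 2, and both halfway cases round to nearest-even, landing on 1.0e16 from both sides.

```julia
a = 10^16 + 1.0   # exact value 10^16 + 1 is halfway between representable
                  # neighbors and rounds to even, i.e. to 1.0e16
b = 10^16 - 1.0   # likewise rounds up to 1.0e16
(a, b, a == b)    # (1.0e16, 1.0e16, true)
```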