rand
documentation says it draws iid uniformly distributed values in the half-open unit interval [0,1) which includes zero but… Is rand
ever throwing a zero? I have tried several times with minimum(rand(1_000_000_000))
and it is always strictly greater than zero.
Think about how rare this would be from a probabalistic perspective. All decimal combinations are possible down to machine precision? Without looking at the code I would say this is an extremely rare thing to look for!
That is exactly why I think it is pointless to say that rand
draws iid numbers in the half-open unit interval [0,1) including zero, I was expecting just the open interval (0,1) as usual, like in some other programming languages.
Specifically, there are 2^53 floating point numbers in [0,1)
, so you would expect to draw ~10^15 before you expect a 0. That said, since 0 is a possible output, it matters a lot, since you should for example not write 1/rand()
, since that could occasionally get a divide by zero error.
Being that the case, that is what surprises me: why including zero? And so, why not including 1?
See this discussion if you want the 1 instead:
As for the reason: various conventions exist, this is a common one. It is somewhat convenient: the actual value is generated in [1, 2)
, then adjusted. You may be interested in the source, eg
If my math is correct, and I didn’t introduce float point imprecision, I think the probability of drawing at least one zero even in 100 million draws is fairly small.
N = 1_000_000_000 #Number of random draws
logp = log(1-eps()) #Log probability not drawing zero on a given trial
t = logp*N #Log probability of not drawing zero in N trials
p = 1-exp(t) #The probability of drawing at least 1 zero in N trials
julia> p
2.2204458027808016e-7
Someone else who is better at this might chime in.
Well, this seems like a disadvantage for probabilistic simulation since it needs iid uniform values on the open unit interval (0,1), that is excluding both 0 and 1. What would be the fastest way to eliminate the possibility of a zero with rand?
Don’t know about fastest, but this is a quick workaround:
function rand_interior(rng, T)
while true
z = rand(rng, T)
iszero(z) || return z
end
end
Note that this of course depends on the kind of simulation you are doing: some algorithms are totally fine with 0
, some are not, some would want (0, 1] instead for theoretical correctness (may not matter in practice). It is impossible to design for all use cases, but as long as the specs are clear you should be able to obtain the desired kind of output.
Keep in mind that if you’re using Float64s, then the probability of rand() producing an exact zero is likely far less than other important bits getting flilpped in non-ECC memory…
I can assure you that zero is possible
for a decimal floating point system (as oppose to a binary floating point system)
function PDFP_randomPDFPbetweenZeroandOne()
DEFAULT_PRECISION[] > 0 ? newmantissa = rand(0:9,DEFAULT_PRECISION[]) : newmantissa = []
# if the first digit is zero
if DEFAULT_PRECISION[] > 0 && newmantissa[1]==0
if sum( map(Int64,newmantissa) ) == 0
# if all the digits are zero then
# return zero
return PDFP(0,0,newmantissa)
else
# Otherwise normalise the number
return PDFP_normalise( PDFP(0,-1,newmantissa) )
end
end
# Othersie return the number with the exponent set to -1
PDFP(0,-1,newmantissa)
end
Here is an example for a one digit mantissa decimal floating point system
julia> using PDFPs
julia> PDFP_setDefaultPrecision(1)
1
julia> a = [ PDFP_randomPDFPbetweenZeroandOne() for x in 1:20 ]
20-element Array{PDFP,1}:
PDFP(0, -1, [3])
PDFP(0, -1, [8])
PDFP(0, -1, [2])
PDFP(0, -1, [5])
PDFP(0, -1, [3])
PDFP(0, -1, [8])
PDFP(0, -1, [5])
PDFP(0, -1, [7])
PDFP(0, -1, [9])
PDFP(0, -1, [8])
PDFP(0, 0, [0])
PDFP(0, -1, [3])
PDFP(0, -1, [8])
PDFP(0, -1, [2])
PDFP(0, -1, [5])
PDFP(0, -1, [1])
PDFP(0, -1, [6])
PDFP(0, -1, [6])
PDFP(0, -1, [9])
PDFP(0, -1, [6])
julia> map(PDFP_toCommonString,a)
20-element Array{String,1}:
"0.3"
"0.8"
"0.2"
"0.5"
"0.3"
"0.8"
"0.5"
"0.7"
"0.9"
"0.8"
"0"
"0.3"
"0.8"
"0.2"
"0.5"
"0.1"
"0.6"
"0.6"
"0.9"
"0.6"
Once you get to multiple digits mantissa, it is less likely to happen
julia> using PDFPs
julia> PDFP_setDefaultPrecision(4)
4
julia> a = [ PDFP_randomPDFPbetweenZeroandOne() for x in 1:20 ]
20-element Array{PDFP,1}:
PDFP(0, -1, [2, 3, 3, 9])
PDFP(0, -1, [6, 0, 9, 4])
PDFP(0, -1, [7, 6, 3, 6])
PDFP(0, -1, [9, 0, 7, 5])
PDFP(0, -1, [1, 5, 9, 4])
PDFP(0, -1, [2, 7, 2, 7])
PDFP(0, -1, [7, 2, 0, 7])
PDFP(0, -1, [5, 5, 9, 8])
PDFP(0, -1, [2, 5, 3, 1])
PDFP(0, -1, [1, 9, 0, 6])
PDFP(0, -1, [1, 7, 6, 0])
PDFP(0, -1, [5, 8, 2, 7])
PDFP(0, -1, [1, 6, 6, 8])
PDFP(0, -1, [2, 3, 1, 3])
PDFP(0, -2, [9, 3, 7, 0])
PDFP(0, -1, [6, 7, 6, 4])
PDFP(0, -1, [3, 9, 6, 5])
PDFP(0, -1, [6, 6, 3, 5])
PDFP(0, -1, [3, 0, 2, 6])
PDFP(0, -1, [8, 5, 7, 8])
julia> map(PDFP_toCommonString,a)
20-element Array{String,1}:
"0.2339"
"0.6094"
"0.7636"
"0.9075"
"0.1594"
"0.2727"
"0.7207"
"0.5598"
"0.2531"
"0.1906"
"0.1760"
"0.5827"
"0.1668"
"0.2313"
"0.09370"
"0.6764"
"0.3965"
"0.6635"
"0.3026"
"0.8578"
Float64 binary floating point system has an equivalent of 15.8 decimal mantissa digits. You have more chances of winning powerball with me than you have of coming up with a zero.
Sampling from the range [0,1)
is pretty standard. What languages do something else?
This is true but these powerball tickets are very cheap. If you’re writing robust code you do need to account for the possibility of getting zero.
if you prefer no zeros and no ones
(and, for simplicity, only want Float64s)
function openrand()
result = rand()
while iszero(result)
result = rand()
end
return result
end
In R runif(n, 0, 1)
will not generate either of the extreme values 0 or 1 (according to R’s documentation), and from a probabilistic point of view, it is better this way for use with the quantile function in probabilistic simulation.
Most RNGs return numbers between*
julia> @inline mask(x, ::Type{Float64}) = reinterpret(Float64,(x & 0x000fffffffffffff) | 0x3ff0000000000000)
mask (generic function with 1 method)
julia> mask(typemin(UInt64),Float64) - 1.0
0.0
julia> mask(typemax(UInt64),Float64) - 1.0
0.9999999999999998
Instead of calculating u - 1
, where u
is uniformly distributed in [1.0,2.0), we could calculate
julia> muladd(mask(typemin(UInt64),Float64),nextfloat(-1.0),prevfloat(2.0))
0.9999999999999999
julia> muladd(mask(typemax(UInt64),Float64),nextfloat(-1.0),prevfloat(2.0))
2.2204460492503128e-16
?
Other than dropping zero, we’re not really losing anything from the bottom end of the distribution.
julia> mask(one(UInt64),Float64) - 1.0
2.220446049250313e-16
while the maximum value returned is now prevfloat(1.0)
instead of prevfloat(1.0,2)
.
*For clarification, this is because those RNGs generate sequences of bits, ie, rand(UInt64)
. So the following will work (after using Random
and defining the mask function):
myrng(rng = Random.GLOBAL_RNG) = muladd(mask(rand(rng,UInt64),Float64),nextfloat(-1.0),prevfloat(2.0))
From this PR:
https://github.com/JuliaLang/julia/pull/33251
All we actually need to do is use prevfloat(one(T))
instead of one(T)
.
julia> @inline mask(x, ::Type{Float64}) = reinterpret(Float64,(x & 0x000fffffffffffff) | 0x3ff0000000000000)
mask (generic function with 1 method)
julia> mask(typemin(UInt64), Float64) - prevfloat(one(Float64))
1.1102230246251565e-16
julia> mask(typemax(UInt64), Float64) - prevfloat(one(Float64))
0.9999999999999999
Now we would have samples from (0,1).
To answer my own question, Julia, C, C++, Fortran, Python, Perl and Ruby sample from [0,1)
:
- drand48(3) - Linux manual page
- RANDOM_NUMBER - The GNU Fortran Compiler
- random — Generate pseudo-random numbers — Python 3.10.6 documentation
- rand - Perldoc Browser
- Class: Random (Ruby 2.4.0)
Matlab and R sample from (0,1)
:
Mathematica samples from [0,1]
:
In any case, Julia is not in bad company here, and arguably sampling from [0,1)
is the closest to a standard behavior in this area.
Just to add to the list by @StefanKarpinski, Excel (and also other standard spreadsheets like Google Sheets) is also in Julia’s camp.
So if we judged by the number of people in the world that routinely use one of these definitions I would bet that [0,1)
beats alternatives as a “standard” by a huge margin.
(of course I do not try to impute here that popularity should be the major factor in deciding what is best from the scientific perspective)
Also, from a purely practical point of view, different applications just have different needs for the endpoints, and it is much easier to discard the 0 from [0,1) than to add it to (0,1) preserving uniformity.
By my estimate, a CPU that does nothing but generate random numbers will come up with an exact zero every 1000 hours or so. (Obviously depends on the number of cores and the complexity of the algorithm that generates the underlying pseudorandom Int64
s.)
So not that rare if you’re doing Monte Carlo simulations.
Edit: Just Googled the frequency of memory errors, and it seems to be the same order of magnitude (once per 1000 hours). Far more common than I would have guessed…