A recent hackernews post led me to necro https://discourse.julialang.org/t/output-distribution-of-rand/12192. If we want to reopen that can, I think a new thread is in order.
Summary
rand(), and much more acutely rand(Float32), nominally generate a uniformly distributed random floating point number in the unit interval. In actuality, these functions generate a uniformly distributed fixed-point number in the unit interval and convert it to floating point. This means that the output distribution close to zero is quite counter-intuitive, and very much not what one would naively expect from “random floating point number in the unit interval”.
This is a footgun. Do we want to do something about it? What should we do about it?
Fixed point vs floating point numbers
Fixed point numbers have the property that adjacent numbers have constant distance. Floating point numbers have the property that adjacent numbers have roughly constant quotients. In the unit interval, this means that representable floating point numbers are much denser close to zero. For floating point numbers, one typically measures errors in quotients, not differences (relative error, in order to be scale-free; if one cared about absolute errors only, there is conceptually no point in using floating point numbers in the first place!).
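To make the density difference concrete, here is a small illustration using the built-in eps, which reports the spacing between adjacent floats (the particular arguments are arbitrary):

eps(1.0)      # ≈ 2.2e-16, i.e. 2^-52: spacing of adjacent Float64s near 1
eps(2.0^-30)  # ≈ 2.1e-25: spacing near 1e-9 is about nine orders of magnitude finer
eps(0.0)      # ≈ 5.0e-324: spacing right at zero (subnormal territory)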
If one adds a small and a large floating point number, then the excess precision of the small number is lost. This is not a problem in itself – it creates neither large relative nor absolute errors.
If one subtracts two numbers of the same rough magnitude, then this is called “cancellation”, and it can be highly problematic: the result still has a small absolute error, but can have a very large relative error. If we happened to choose floating point numbers for any reason other than convenience or CPU support, then this is called catastrophic cancellation.
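A one-liner each to illustrate the difference (the constants are arbitrary, chosen only to make the effect visible):

1.0 + 1.0e-20 == 1.0    # true: the tiny addend is rounded away; relative and absolute error both stay tiny
(1.0 + 1.0e-13) - 1.0   # ≈ 9.992e-14: absolute error still ~1e-16, but relative error is ~0.1%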
How julia generates random floats
Julia generates random floats by first generating a random number r = rand12() in the interval [1, 2). In this interval, floating point numbers are the same as fixed point numbers (literally!). This is therefore very easy to do and fast – just fill the significand/mantissa with a random bitstring. Next we subtract 1:

rand() = rand12() - 1
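For concreteness, a rand12 along these lines could look like the following sketch for Float64 (rand12 and myrand are just the hypothetical names used in this post, not actual API):

# Keep the exponent bits of 1.0 and fill the 52 significand bits with random
# bits: the result is uniform on [1, 2), on a grid with spacing 2^-52.
rand12() = reinterpret(Float64, 0x3ff0000000000000 | (rand(UInt64) >>> 12))

# Subtracting 1 gives the usual [0, 1) output – and the cancellation problem discussed next.
myrand() = rand12() - 1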
At this point we should be very afraid: rand12() - 1 contains a catastrophic cancellation if our random number in [1, 2) happened to be close to 1. In other words, we see that our random numbers are very bad for results close to zero.
When does it matter
This does not matter at all if we add our random numbers to something nonzero of constant size.
It does matter if we e.g. compute 1/rand(). Or, to give a less contrived example: Suppose we want to sample from a distribution that gives numbers in [1, Inf); say from a power law, with probability density ~ 1/x^a with parameter a > 1. The way this is typically done is by considering the inverse cumulative distribution function, F(u) = x such that probability(sample() > x) = u, where u \in [0, 1]. For the power law, this means u ~ 1 / x^(a - 1), or F(u) ~ 1 / u^(1/(a-1)) = pow(u, -1/(a-1)). In other words, we will have code like the following, e.g. with a = 2:

sample() = constant / rand()
This is code that looks perfectly fine from a precision standpoint. We have a singularity at zero; but this is not an issue for floating point numbers. If we decided to instead write the real-arithmetic equivalent
sampleBad() = constant / (1 - rand())
i.e. decided to place the singularity of our mapping at 1 instead of 0, then alarm bells would start ringing – this is catastrophic cancellation. The statistics of extreme events will be completely wrong. But extreme events are one of the most important things when looking at fat-tailed distributions like power laws!
And, with the current julia random number generation, this is what we do:
sample() = constant / (rand12() - 1)
In other words, the output of rand() is by construction unusable for applications that care about precision close to zero.
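To see how badly this mangles the tail, here is a small experiment along the lines of the power-law example above, with a = 2 and constant = 1 (the sample count is an arbitrary choice):

# True distribution: P(sample > x) = 1/x, so the largest of 10^8 samples
# should typically be of order 10^8.
sample() = 1 / rand(Float32)

# With the current rand(Float32) the smallest nonzero output is roughly 2^-24,
# so no finite sample can exceed roughly 2^24 ≈ 1.7e7; everything beyond that
# collapses into exact Infs (from rand(Float32) == 0).
largest_finite = maximum(x for x in (sample() for _ in 1:10^8) if isfinite(x))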
An extreme example: Exact zeros in Float32
Consider: Take a uniform random real number u \in [0, 1] and ask: What is the probability that convert(Float32, u) == 0.0f0?
Well, a good guess would be nextfloat(0.0f0) == 1.0f-45, but depending on rounding mode intricacies something like floatmin(Float32) ≈ 1.2f-38 (the smallest non-subnormal positive float) is also a valid guess.
It does not matter. The age of the universe is of rough order 1.0f27 nanoseconds, i.e. the correct answer should be “never observed in all past or future computations done by mankind”.
Will you observe convert(Float32, rand(Float64)) === 0.0f0? Probably not today. Will you observe rand(Float32) === 0.0f0? Yeah, all the time:
julia> function func()
           N = 0
           while true
               N += 1
               rand(Float32) == 0 && return N
           end
       end
func (generic function with 1 method)
julia> func()
22098446
julia> func()
3274928
julia> func()
19735810
It is egregiously misleading to say that rand(Float32) generates a uniformly distributed random floating point number from the unit interval.
What to do about it: Nothing
We can choose to ignore the problem. We can say: This is well-known among experts and they know how to work around it; this is how everyone does it; almost nobody is complaining.
What to do about it: Document it
One way to approach this is to document this footgun and provide an alternative function that does the right thing. An example is log, which links to log1p for avoiding cancellation when evaluating logs of numbers close to 1. So we could have a rand0p that is linked in the docs, and it is up to users to know / read the docs to use the right random function, on penalty of incorrect statistics of extreme events close to zero.
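For concreteness, one possible shape of such a rand0p (both the name, taken from above, and the implementation are hypothetical; a real implementation would use bit tricks rather than a loop):

# Pick the binade first – each additional leading zero bit of the uniform
# real number halves the interval – then fill the significand within it.
function rand0p()
    e = 1
    while rand(Bool)        # P(result ∈ [2^-e, 2^(1-e))) = 2^-e
        e += 1
    end
    return exp2(-e) * (1.0 + rand())   # uniform within the chosen binade
end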
What to do about it: Fix it for real
The complementary approach to documenting this would be to provide an alternative fastrand function that produces the current counter-intuitive distribution, and change rand to produce the Right Thing.
Thus we would replace a correctness footgun with a performance footgun: it is up to users to know / read the docs to use the right random function, on penalty of mysterious performance losses.
This has large compat issues (is there code that relies on the counter-intuitive cancellation?) and performance issues, depending on how we implement that: how fast can we generate the Right Thing? Can we SIMD it? With which instruction sets (i.e. does it need avx512 or is there a avx2 / neon way of doing this right)? On GPU? Will this lead to problems when people implement/port algorithms from papers that assume random numbers to be fixed-point-distributed instead of floating-point-distributed?
This is a can of worms that requires lots of discussion.
What to do about it: Document it, partially fix it
We can split it in the middle, i.e. do something reasonably fast that generates a distribution that is less egregiously misleading, and provide, document and link two alternatives, fastrand and rand0p. An example would be: exclude zero from the rand(Float64) output but otherwise keep it, and implement rand(Float32) = convert(Float32, rand(Float64)).
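A sketch of that partial fix, just to make the suggestion concrete (the names here are mine, not a proposed API):

# Exclude exact zero from the Float64 output but otherwise keep it.
function rand_nonzero()
    x = rand(Float64)
    while x == 0.0
        x = rand(Float64)
    end
    return x
end

# Derive Float32 by rounding the (zero-free) Float64 draw: exact zeros are
# gone, and the smallest possible output drops from roughly 6.0f-8 to
# roughly 2.2f-16.
rand32_partialfix() = Float32(rand_nonzero())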
What do other languages do?
Java generates the same flawed distribution as julia. Numpy generates the same flawed Float64 distribution as julia, cf https://github.com/numpy/numpy/blob/85d956917903624289c451bd0fcfa4743ec86988/numpy/random/_common.pxd#L67 , I think they do something better for Float32, though?
Shockingly, zig does the right thing (or something close to it), cf. Zig. In fact it was this link in the related hackernews discussion that renewed my hope that something can be done at all!