If there were no consideration of performance, and this were pure math, I would do the following:
Sample a real number in `[0,1]`, uniformly according to the standard measure, then round it down to the next float. In other words: `P(rand() == x) = |[x, nextfloat(x)]|` for all floats `x` (i.e. `x == round(x)`), and `P(rand() < p) = p` for all floats `p` (i.e. `p == round(p)`).
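A minimal sketch of that ideal sampler, with one loud assumption: it truncates the real number at 128 random bits, so outcomes with probability below `ldexp(1.0, -128)` (including Float64's subnormals) are lost, whereas the true ideal would keep drawing bits until the round-down is determined. `ideal_rand` is a made-up name, and this is deliberately slow BigFloat code:

```julia
# Sketch: draw a uniform 128-bit integer u, interpret it as the real
# number u/2^128 (exact in 256-bit BigFloat), then round *down* to Float64.
function ideal_rand()
    u = rand(UInt128)                              # 128 uniform random bits
    return Float64(big(u) / big(2)^128, RoundDown) # round down, not to nearest
end
```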
We now have a distribution that includes zero and subnormals. However, for Float64, Float32 and BFloat16, the probability of hitting exact zero or a subnormal number is smaller than `ldexp(1.0, -120)`. This probability means "never in our finite universe". More precisely, it means "only if the random generator is badly seeded". On the other hand, exact zeros and subnormals are liable to trigger bugs that can mess up worse than bad statistics, e.g. security bugs.
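For concreteness: the probability of landing at zero or a subnormal is exactly the smallest positive normal float of the format, `floatmin`:

```julia
julia> floatmin(Float32)                      # P(ideal sample is zero or subnormal)
1.1754944f-38

julia> floatmin(Float32) < ldexp(1.0, -120)   # well under the finite-universe bound
true
```

For Float64 it is `floatmin(Float64) ≈ 2.2e-308`, smaller still.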
So my principled distribution-with-safety-belt has `P(rand() < p) = p` for all floats `p` with `p > minnormal`, where `minnormal` is the smallest positive non-subnormal float. Then we clamp so that `P(rand() < minnormal) = 0`. This safety-belt is statistically invisible in our finite universe, but saves the bacon if somebody is bad at seeding the RNG.
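As code, the safety-belt is a single `max` on top of the ideal sampler sketched above (`safe_rand` is again a made-up name):

```julia
# Anything that lands below floatmin(Float64) is pulled up to it, so
# P(safe_rand() < floatmin(Float64)) == 0: no zero or subnormal can escape.
safe_rand() = max(ideal_rand(), floatmin(Float64))
```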
======================
Ok, we now need to make compromises with performance, e.g. your proposed `randf`, @StefanKarpinski.

I am saying that we should retain the clamping close to zero from the ideal-with-safety-belt distribution. Clamp at something small, e.g. `ldexp(1.0, -60)`. This clamping is no more or less principled than the choice of how to distribute mass over the subset of floats where we have the desired equality `P(rand() < p) = p`.
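In code the compromise looks the same, just with the cheaper cutoff (a sketch, with plain `rand()` standing in for whatever fast generator, e.g. `randf`, is chosen):

```julia
# Clamping at 2^-60 leaves P(x < p) untouched for representable p > 2^-60
# (clamped values stay below such p) and forces P(x < p) = 0 for p <= 2^-60.
clamped_rand() = max(rand(), ldexp(1.0, -60))
```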
======================
All these clamping arguments do not apply to Float16. This is because `nextfloat(Float16(0)) == Float16(6.0e-8)`: that is a pretty large number, and one with a significant probability of being hit! Clamping it away introduces real bias that can mess up real statistics!
Hence, I have no opinion on Float16.
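The Float16 numbers, for reference:

```julia
julia> nextfloat(Float16(0))    # smallest positive (subnormal) Float16
Float16(6.0e-8)

julia> floatmin(Float16)        # P(ideal sample is zero or subnormal) = 2^-14
Float16(6.104e-5)
```

A clamp at `floatmin(Float16)` would bite roughly once every 16384 draws, which is very much visible in real statistics.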
======================
PS. I think rounding `Bernoulli(1.0e-100)` to `Bernoulli(1.0e-20)` is a much worse mistake to make than e.g. rounding `Bernoulli(1.0e-18)` to `Bernoulli(0.0)`.
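Back-of-envelope for why, with hypothetical numbers matching the PS (a fast sampler whose only output below `1.0e-20` is exact `0.0`, hit about once per `1.0e20` draws):

```julia
p_intended = 1.0e-100   # the Bernoulli we asked for via rand() < 1.0e-100
p_actual   = 1.0e-20    # but it only fires on exact 0.0, once per ~1e20 draws
p_actual / p_intended   # ≈ 1.0e80: the event fires 1e80 times too often
```

Rounding `Bernoulli(1.0e-18)` to `Bernoulli(0.0)` is instead off by at most `1.0e-18` per draw in absolute terms, and it errs towards "never fires" rather than "fires absurdly often".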