Output distribution of rand(Float32) and rand(Float64), thread 2

If there was no consideration of performance, and this was pure math, I would do the following:

Sample a real number in [0,1], uniformly according to standard measure. Then round it down to the next float.

In other words: P(rand() == x) = |[x, nextfloat(x))| for every exactly representable x (i.e. x == round(x)), and P(rand() < p) = p for every exactly representable p.

We now have a distribution that includes zero and subnormals. However, for Float64, Float32 and BFloat16, the probability of hitting exact zero or a subnormal number is smaller than ldexp(1.0, -120). This probability means ā€œnever in our finite universeā€.

More precisely, it means ā€œonly if the random generator is badly seededā€. On the other hand, exact zeros and subnormals are liable to trigger bugs that can mess up worse than bad statistics – e.g. security bugs.

So my principled distribution-with-safety-belt has P(rand() < p) = p for p: p == round(p) and p > minnormal, where minnormal is the smallest positive non-subnormal float. Then we clamp P(rand() < minnormal) = 0. This safety-belt is statistically invisible in our finite universe, but saves the bacon if somebody is bad at seeding the RNG.
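For concreteness, here is a rough (and slow) sketch of what sampling from this ideal-with-safety-belt distribution could look like; ideal_rand64 is a made-up name and the clamp depth is arbitrary, this is not a proposal for Base:

# Conceptual sketch: sample a real uniformly in [0,1) bit by bit and round it
# down to the next Float64, with the safety belt clamping away the
# astronomically deep tail near zero.
function ideal_rand64()
    e = 0                          # number of leading zero bits seen so far
    chunk = rand(UInt64)
    while chunk == 0
        e += 64
        # Safety belt: if the first 128 bits are all zero (probability 2^-128,
        # "humanly impossible"), clamp instead of descending further.
        e >= 128 && return floatmin(Float64)
        chunk = rand(UInt64)
    end
    e += leading_zeros(chunk)
    # The 53 significand bits start at the leading 1; rounding the real down
    # to the next Float64 keeps exactly these bits and truncates the rest.
    sig = (rand(UInt64) >>> 11) | (UInt64(1) << 52)   # uniform in [2^52, 2^53)
    return ldexp(Float64(sig), -(e + 53))             # lies in [2^-(e+1), 2^-e)
end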

======================
Ok, we now need to make compromises with performance. E.g. the randf you proposed, @StefanKarpinski.

I am saying that we should retain the clamping close to zero from the ideal-with-safety-belt distribution. Clamp at something small, e.g. ldexp(1.0, -60). This clamping is no more or less principled than the choice of how to distribute our subset of floats where we have the desired equality P(rand() < p) = p.
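As a minimal sketch of that safety belt on top of whatever fast generator is chosen (fast_rand is just a stand-in for the real implementation, not an actual function):

fast_rand() = rand()                                 # placeholder for the fast scheme
clamp_low(x, lo = ldexp(1.0, -60)) = x < lo ? lo : x
safe_rand() = clamp_low(fast_rand())                 # P(safe_rand() < 2^-60) == 0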

======================

All these clamping arguments do not apply to Float16. This is because nextfloat(Float16(0)) == Float16(6.0e-8) is a pretty large number with a probability that is significant! Clamping that introduces real bias that can mess up real statistics!

Hence, I have no opinion on Float16.

======================

PS. I think rounding Bernoulli(1.0e-100) to Bernoulli(1.0e-20) is a much worse mistake to make than e.g. rounding Bernoulli(1.0e-18) to Bernoulli(0.0).


I have seen you say this multiple times, but like… how?? Isn’t that the whole point of this thread, to come up with algorithms to sample uniformly from the unit interval?


He’s giving the thing we’re supposed to model (he says ā€œand this was pure mathā€) and then describes the suggested compromises…

I think this is the key. Yes, excluding 0 induces a bias, but the bias is ā€œessentially unobservableā€ and I’m with @foobar_lv2 and also with him about Float16. You really can’t expect much from Float16 when it comes to accurately modeling low probability events.

This is a fundamental disagreement between our world views. Floats are not approximate. They exactly identify a real value.

The current rand() < p is exactly correct for all ps that are multiples of eps(0.5).
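Spelled out (modulo implementation details of the current generator), the claim is that rand(Float64) is equivalent to drawing 53 uniform bits:

# Every output is k * 2^-53 with k uniform on 0:2^53-1 (note eps(0.5) == 2^-53).
bits_rand() = (rand(UInt64) >>> 11) * eps(0.5)
# For p = m * eps(0.5), exactly m of the 2^53 equally likely outcomes satisfy
# bits_rand() < p, so P(bits_rand() < p) == m / 2^53 == p, with no rounding error.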


That is neither uniform under the P(rand() < p) = p criterion, since P(rand() < min(S)) = 0 < min(S), nor under the P(rand() ≤ p) = p criterion, since P(rand() ≤ max(S)) = 1 > max(S). More generally, if _rand() is uniform in [0, 1) then your rand() will have probability too small at the smallest value in the support and too big everywhere else. This means that doing rand() < p to get an event with probability p is always wrong and not fixable in any obvious way.


Except that no computer RNG has that level of accuracy/uniformity in bit generation, or even close, and as @foobar_lv2 says, if you ask for a Bernoulli(1f-50) you’ll get a Bernoulli(0.0).

More to the point, what are the observable consequences of cutting off the interval [0.0, nextfloat(0.0)] or [0.0f0, nextfloat(0.0f0)]?

Those are of size:

julia> nextfloat(0.0)
5.0e-324

julia> nextfloat(0.0f0)
1.0f-45

The first one is unobservable in our finite universe as @foobar_lv2 says, the second one might be observable… lemme think, a cluster of 1 million cores each operating at 1 billion operations per second would take on average …

julia> 1.0/nextfloat(0.0f0)/(1e9*1e6*3600*24*365)
2.2628863722506974e22

So 10^22 years to see the first consequence.

Again, unobservable in our universe.

However, computers aren’t perfect; bugs in RNG implementations or seeding could bias results towards 0.0 dramatically more than they should, so we’re talking about protecting against a known special case which will only be hit if the RNG is already ā€œbrokenā€ in some sense. Which is the more important source of bias: the risk of broken software, or the risk that you’ll miss out on 1 event in 10^22 years of massive supercomputing?

This thread has multiple points. We at first need to reach consensus what ā€œsample uniformly from unit intervalā€ means, mathematically speaking, in the context of floating point numbers. I had hoped that this consensus was reached by now?

Hence I answered with a mathematical definition, not with an algorithm.

Then we need to come up with fast algorithms to do that, approximately (there will be trade-offs between perf and precision).

And we may want appropriate safety rails. An important point I tried to make is that users are incapable of dealing with precise zeros in the rand output. I brought empirical evidence to the table.

This strongly suggests that the following property of the ā€œrealā€ distribution should not be traded away for performance: Exact zeros don’t happen with the computational resources available to mankind (unless the RNG is ill-seeded).

I was a little hyperbolic on that – ldexp(1.0, -1000) is cosmologically impossible, but ldexp(1.0, -120) is merely humanly impossible.

Wait, so the stance is that we should exclude zero for some float types but not others? That’s so much worse.


IMHO we shouldn’t provide rand for Float16 by default, but instead provide a function that errors with a message along the lines of: please define rand(Float16) yourself, according to whichever type of bias you prefer; here’s a palette of several implementations you could choose from.

Are you trolling or serious?


No I’m serious, the tradeoffs should be explicit and only a small number of people should be using Float16 I think? Or maybe I am not aware of people using Float16 like crazy in some current applications?

Or let me put it this way: the probability of getting Inf when sampling Exponentials in Float16 is… 6e-8, so ā€œ10 times a secondā€.
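Presumably that figure assumes a naive inverse-transform sampler along these lines (not necessarily what Base’s randexp actually does, see below):

naive_randexp(T) = -log(rand(T))   # Inf16 exactly when rand(Float16) == 0
# rand(Float16) hits exact zero with probability on the order of 6e-8, hence
# roughly "10 times a second" at ~10^8 draws per second.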

You and @foobar_lv2 are continuing to conflate the number of available bit patterns with the values themselves. Ironically, those two things are only the same in the [1, 2) interval.

Once you have floating point numbers with varying epsilons, you must be more deliberate about what it means to draw one such number.
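For concreteness, the spacing of representable Float32 values varies hugely across the unit interval:

eps(1.0f0)         # 1.1920929f-7, i.e. 2^-23
eps(0.5f0)         # 5.9604645f-8, i.e. 2^-24
nextfloat(0.0f0)   # 1.0f-45, the gap just above zero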


No, the size of the region that rounds to 0 is | [0,6e-8] | = 6e-8 (or within a factor of 2 of that, depending on rounding mode I guess)

I can’t seem to reproduce that:

julia> using Dates, Random

julia> try
       println(now())
       while true
           isinf(randexp(Float16)) && println("gotcha")
       end
       finally
       println(now())
       end
2023-10-28T23:41:26.450
^C2023-10-28T23:41:42.422
ERROR: InterruptException:
Stacktrace:
 [1] _randexp(rng::TaskLocalRNG, ri::UInt64)
   @ Random ~/julia/usr/share/julia/stdlib/v1.11/Random/src/normal.jl:129
 [2] randexp
   @ ~/julia/usr/share/julia/stdlib/v1.11/Random/src/normal.jl:127 [inlined]
 [3] randexp
   @ ~/julia/usr/share/julia/stdlib/v1.11/Random/src/normal.jl:198 [inlined]
 [4] randexp
   @ ~/julia/usr/share/julia/stdlib/v1.11/Random/src/normal.jl:199 [inlined]
 [5] top-level scope
   @ ./REPL[7]:4

Interesting. On the other hand if you ask it to print gotcha when rand(Float16) gives Float16(0.0) you get Gotcha 10 times a second. So evidently there’s a difference in the implementation for Exponential?

Are we posting in the same thread? Am I that bad at communicating?

julia> function foo(p, N)
           smaller = 0
           smallerEqual = 0
           for i = 1:N
               r = rand(typeof(p))
               r <= p && (smallerEqual += 1)
               r < p && (smaller += 1)
           end
           @show N, p, smaller, smallerEqual, smaller/N, smallerEqual/N
       end
foo (generic function with 1 method)

julia> foo(ldexp(eps(0.5f0), -30), 1<<32)
(N, p, smaller, smallerEqual, smaller / N, smallerEqual / N) = (4294967296, 5.551115f-17, 276, 276, 6.426125764846802e-8, 6.426125764846802e-8)
(4294967296, 5.551115f-17, 276, 276, 6.426125764846802e-8, 6.426125764846802e-8)

This is what I’m complaining about. This result is wrong by 9 orders of magnitude.


This literally made me laugh out loud.

Yes, and as I pointed out in message #37 of this thread, this is expected, because the current generation scheme does not allow for the precision you’re asking for.

Could it be improved? Maybe, if the performance of such a scheme were good AND it followed the currently guaranteed distribution of possibly generating 0.0. So far, you’ve either generated a different distribution, generated a biased one, or had much worse performance.


My point above is that generating a biased distribution when the bias is at the level of ā€œglobal scale supercomputing can’t observe the bias in billions of times longer than the duration of the universeā€ is the right thing to do because errors in computer code are much much more of a problem than biases of this size. From that perspective, excluding 0.0 is the right thing to do for Float64 and Float32 because it’s unobservable bias. (and note, it’s what R does when generating rexp)
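For what it’s worth, a minimal sketch of one way to get that ā€œno exact zeroā€ behavior (just an illustration, not what Base or R actually does): resample on the astronomically rare zero, leaving the rest of the distribution untouched.

function rand_nonzero(T = Float64)
    x = rand(T)
    while iszero(x)        # probability ~2^-53 per draw for Float64
        x = rand(T)
    end
    return x
end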

Float16 is problematic as a model for probability.