Output distribution of rand(Float32) and rand(Float64), thread 2

If there was no consideration of performance, and this was pure math, I would do the following:

Sample a real number in [0, 1], uniformly according to the standard measure. Then round it down, i.e. to the largest float that does not exceed it.

In other words: P(rand() == x) = |[x, nextfloat(x))| for every float x (i.e. x == round(x)), and P(rand() < p) = p for every float p (i.e. p == round(p)).
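For concreteness, a minimal (and deliberately slow) sketch of that pure-math construction could look like the following; ideal_rand64 and the exact bit counts are just for illustration, not a proposal for the library:

using Random

# Draw enough random bits that rounding down to Float64 is exact even in the
# subnormal range (the smallest positive Float64 is 2^-1074), then round down.
function ideal_rand64(rng = Random.default_rng())
    bits = big(0)
    for _ in 1:18                          # 18 * 64 = 1152 random bits
        bits = (bits << 64) | rand(rng, UInt64)
    end
    return setprecision(BigFloat, 1200) do
        r = BigFloat(bits) / big(2)^1152   # uniform real in [0, 1) to 1152 bits
        Float64(r, RoundDown)              # round down to the largest Float64 <= r
    end
end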

We now have a distribution that includes zero and subnormals. However, for Float64, Float32 and BFloat16, the probability of hitting exact zero or a subnormal number is smaller than ldexp(1.0, -120). This probability means "never in our finite universe".

More precisely, it means "only if the random generator is badly seeded". On the other hand, exact zeros and subnormals are liable to trigger bugs that can mess things up worse than bad statistics – e.g. security bugs.

So my principled distribution-with-safety-belt has P(rand() < p) = p for p: p == round(p) and p > minnormal, where minnormal is the smallest positive non-subnormal float. Then we clamp P(rand() < minnormal) = 0. This safety-belt is statistically invisible in our finite universe, but saves the bacon if somebody is bad at seeding the RNG.
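For reference, minnormal is what Julia calls floatmin:

floatmin(Float64)   # 2.2250738585072014e-308
floatmin(Float32)   # 1.1754944f-38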

======================
Ok, we now need to make compromises for performance, e.g. your proposed randf, @StefanKarpinski.

I am saying that we should retain the clamping close to zero from the ideal-with-safety-belt distribution. Clamp at something small, e.g. ldexp(1.0, -60). This clamping is no more or less principled than the choice of how to distribute the subset of floats on which we have the desired equality P(rand() < p) = p.
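A minimal sketch of that safety belt for Float64 or Float32, just wrapping the existing rand (clamped_rand is a made-up name and 2^-60 is only the example threshold from above):

# For any p > 2^-60, P(clamped_rand(T) < p) equals P(rand(T) < p); values below
# 2^-60 (in particular exact zero and subnormals) are simply never produced.
clamped_rand(::Type{T} = Float64) where {T<:AbstractFloat} = max(rand(T), T(ldexp(1.0, -60)))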

======================

All these clamping arguments do not apply to Float16. This is because nextfloat(Float16(0)) == Float16(6.0e-8) is a pretty large number with a probability that is significant! Clamping that introduces real bias that can mess up real statistics!

Hence, I have no opinion on Float16.

======================

PS. I think rounding Bernoulli(1.0e-100) to Bernoulli(1.0e-20) is a much worse mistake to make than e.g. rounding Bernoulli(1.0e-18) to Bernoulli(0.0).

1 Like

I have seen you say this multiple times, but like… how?? Isn't that the whole point of this thread, to come up with algorithms to sample uniformly from the unit interval?

1 Like

He's giving the thing we're supposed to model (he says "and this was pure math") and then describes the suggested compromises…

I think this is the key. Yes, excluding 0 induces a bias, but the bias is "essentially unobservable" and I'm with @foobar_lv2 and also with him about Float16. You really can't expect much from Float16 when it comes to accurately modeling low probability events.

This is a fundamental disagreement between our world views. Floats are not approximate. They exactly identify a real value.

The current rand() < p is exactly correct for all p that are multiples of eps(0.5).
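Roughly, the current Float64 scheme looks like this (a sketch of the idea, not the literal library code):

# Every output is k * 2^-53 for an integer 0 <= k < 2^53, each equally likely.
# So for p a multiple of eps(0.5) = 2^-53, the event rand() < p contains exactly
# p * 2^53 of the 2^53 equally likely outcomes, i.e. P(rand() < p) = p exactly.
current_style_rand() = Float64(rand(UInt64) >>> 11) * ldexp(1.0, -53)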

3 Likes

That is neither uniform under the P(rand() < p) = p criterion, since P(rand() < min(S)) = 0 < min(S), nor under the P(rand() ≤ p) = p criterion, since P(rand() ≤ max(S)) = 1 > max(S). More generally, if _rand() is uniform in [0, 1) then your rand() will have probability too small at the smallest value in the support and too big everywhere else. This means that doing rand() < p to get an event with probability p is always wrong and not fixable in any obvious way.

1 Like

Except that no computer RNG has that level of accuracy/uniformity in bit generation, or even close, and as @foobar_lv2 says, if you ask for a Bernoulli(1f-50) you'll get a Bernoulli(0.0).

More to the point, what are the observable consequences of cutting off the interval [0.0, nextfloat(0.0)] or [0.0f0, nextfloat(0.0f0)]?

Those are of size:

julia> nextfloat(0.0)
5.0e-324

julia> nextfloat(0.0f0)
1.0f-45

The first one is unobservable in our finite universe as @foobar_lv2 says, the second one might be observable… lemme think, a cluster of 1 million cores each operating at 1 billion operations per second would take on average…

julia> 1.0/nextfloat(0.0f0)/(1e9*1e6*3600*24*365)
2.2628863722506974e22

So 10^22 years to see the first consequence.

Again, unobservable in our universe.

However, computers aren't perfect: bugs in RNG implementations or seeding could induce a bias towards 0.0 dramatically larger than it should be, so we're talking about protecting against a known special case which will only be hit if the RNG is already "broken" in some sense. Which is the more important source of bias: the risk of broken software, or the risk that you'll miss out on one event in 10^22 years of massive supercomputing?

This thread has multiple points. We first need to reach consensus on what "sample uniformly from the unit interval" means, mathematically speaking, in the context of floating-point numbers. I had hoped that this consensus had been reached by now?

Hence I answered with a mathematical definition, not with an algorithm.

Then we need to come up with fast algorithms to do that, approximately (there will be trade-offs between perf and precision).

And we may want appropriate safety rails. An important point I tried to make is that users are incapable of dealing with exact zeros in the rand output. I brought empirical evidence to the table.

This strongly suggests that the following property of the "real" distribution should not be traded away for performance: Exact zeros don't happen with the computational resources available to mankind (unless the RNG is ill-seeded).

I was a little hyperbolic on that – ldexp(1.0, -1000) is cosmologically impossible, but ldexp(1.0, -120) is merely humanly impossible.

Wait, so the stance is that we should exclude zero for some float types but not others? That's so much worse.

4 Likes

IMHO we shouldn't provide rand for Float16 by default, but instead provide a function that errors with a message along the lines of "please define rand(Float16) yourself, according to whichever kind of bias you prefer; here's a palette of several implementations you could choose from".

Are you trolling or serious?

1 Like

No, I'm serious: the tradeoffs should be explicit, and only a small number of people should be using Float16, I think? Or maybe I am not aware of people using Float16 like crazy in some current applications?

Or let me put it this way: the probability of getting Inf when sampling an Exponential in Float16 is… 6e-8, so "10 times a second".
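(For concreteness, something like this naive inversion sampler, not necessarily Julia's actual randexp, where an exact zero from rand turns into Inf:)

# Naive inversion sampler for the Exponential distribution:
# if rand(T) returns exactly zero, -log(zero(T)) is Inf.
naive_randexp(::Type{T}) where {T<:AbstractFloat} = -log(rand(T))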

You and @foobar_lv2 are continuing to conflate the number of available bit patterns with the values themselves. Ironically, those two things are only the same in the [1, 2) interval.

Once you have floating-point numbers with varying epsilons, you must think more carefully about what it means to draw one such number.

3 Likes

No, the size of the region that rounds to 0 is |[0, 6e-8]| = 6e-8 (or within a factor of 2 of that, depending on rounding mode, I guess).

I can't seem to reproduce that:

julia> using Dates, Random

julia> try
       println(now())
       while true
           isinf(randexp(Float16)) && println("gotcha")
       end
       finally
       println(now())
       end
2023-10-28T23:41:26.450
^C2023-10-28T23:41:42.422
ERROR: InterruptException:
Stacktrace:
 [1] _randexp(rng::TaskLocalRNG, ri::UInt64)
   @ Random ~/julia/usr/share/julia/stdlib/v1.11/Random/src/normal.jl:129
 [2] randexp
   @ ~/julia/usr/share/julia/stdlib/v1.11/Random/src/normal.jl:127 [inlined]
 [3] randexp
   @ ~/julia/usr/share/julia/stdlib/v1.11/Random/src/normal.jl:198 [inlined]
 [4] randexp
   @ ~/julia/usr/share/julia/stdlib/v1.11/Random/src/normal.jl:199 [inlined]
 [5] top-level scope
   @ ./REPL[7]:4

Interesting. On the other hand, if you ask it to print "gotcha" when rand(Float16) gives Float16(0.0), you get a "gotcha" 10 times a second. So evidently there's a difference in the implementation for the Exponential?

Are we posting in the same thread? Am I that bad at communicating?

julia> function foo(p, N)
           smaller = 0
           smallerEqual = 0
           for i = 1:N
               r = rand(typeof(p))
               r <= p && (smallerEqual += 1)
               r < p && (smaller += 1)
           end
           @show N, p, smaller, smallerEqual, smaller/N, smallerEqual/N
       end
foo (generic function with 1 method)

julia> foo(ldexp(eps(0.5f0), -30), 1<<32)
(N, p, smaller, smallerEqual, smaller / N, smallerEqual / N) = (4294967296, 5.551115f-17, 276, 276, 6.426125764846802e-8, 6.426125764846802e-8)
(4294967296, 5.551115f-17, 276, 276, 6.426125764846802e-8, 6.426125764846802e-8)

This is what I'm complaining about. This result is wrong by 9 orders of magnitude.
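To spell out the arithmetic, assuming rand(Float32) draws multiples of eps(0.5f0) as discussed above:

# Requested probability: p = eps(0.5f0) * 2^-30 = 2^-54, about 5.6e-17.
# The only Float32 output below p is exactly 0.0f0, which the current scheme
# produces with probability eps(0.5f0) = 2^-24, about 6.0e-8 instead of the
# requested 2^-54: a factor of 2^30, i.e. the 9 orders of magnitude above.
ldexp(eps(0.5f0), -30)   # 5.551115f-17, the p passed to foo
eps(0.5f0)               # 5.9604645f-8, roughly the measured 276 / 2^32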

1 Like

This literally made me laugh out loud.

Yes, and as I pointed out in message #37 of this thread, this is expected, because the current generation scheme does not allow for the precision you're asking for.

Could it be improved? Maybe, if the performance of such a scheme were good AND it followed the currently guaranteed distribution of possibly generating 0.0. So far, you've either generated a different distribution, generated a biased one, or had much worse performance.

2 Likes

My point above is that generating a biased distribution, when the bias is at the level of "global-scale supercomputing can't observe the bias in billions of times the duration of the universe", is the right thing to do, because errors in computer code are much, much more of a problem than biases of this size. From that perspective, excluding 0.0 is the right thing to do for Float64 and Float32 because it's an unobservable bias. (And note, it's what R does when generating rexp.)

Float16 is problematic as a model for probability.