Output distribution of rand(Float32) and rand(Float64), thread 2

I’m no longer advocating a bits2float(u::UInt32) style of algorithm. The reason is that random bits are so damn cheap since we moved away from MersenneTwister that we can just take 64 bits for a Float32.

They are literally, not just figuratively, free, since the random generator doesn’t even bother tracking random bits at a finer granularity than batches of 64!

If we insist on a bits2float(u::UInt32) approach, then we’re screwed somewhere and need to do hacky things.
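
For concreteness, here is roughly what such a 64-bit-hungry conversion could look like. This is only a sketch under my own assumptions (the function name and exact bit layout are mine, not what Julia actually ships): 23 bits become the mantissa, and the remaining 41 bits feed a truncated geometric exponent.

```julia
# Sketch: spend a whole UInt64 of randomness on one Float32 in (0, 1).
function rand_float32_from_u64(u::UInt64)
    mantissa = UInt32(u & 0x007fffff)      # low 23 bits -> mantissa
    rest     = u >> 23                     # 41 leftover bits -> exponent
    # Geometric(1/2) exponent: count trailing zero bits of the leftovers.
    # OR-ing in bit 41 caps the count at 41, so we never run past our budget.
    e = trailing_zeros(rest | (UInt64(1) << 41))
    # Assemble a value in [2^-(e+1), 2^-e); e == 0 gives [0.5, 1).
    return (Float32(mantissa) + Float32(2^23)) * exp2(-Float32(e + 24))
end

rand_float32_from_u64(rand(UInt64))
```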

@mbauman I think your “count how many uints are mapped to each result” style of argument is very close to the following information-content argument.

So in that context: the information content of x is -log2(probability(x)). The expected (mean) information content over the distribution is the Shannon entropy.

The desired ideal distribution has an entropy of 25 bits (it’s more like 24.9... bits: 23 from the mantissa, plus almost 2 from the truncated geometric exponent).
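
As a rough sanity check of that figure (under the idealized accounting where the output lands in [2^-(e+1), 2^-e) with probability 2^-(e+1)):

```julia
# 23 mantissa bits plus the entropy of the exponent's geometric distribution.
# An untruncated geometric(1/2) has entropy exactly 2 bits; the finite exponent
# range is what nudges the real total slightly below 25.
exponent_entropy = sum((e + 1) * 2.0^-(e + 1) for e in 0:125)   # ≈ 2 bits
total_entropy = 23 + exponent_entropy                           # ≈ 25 bits
```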

Nice, so we should be fine if we start with 32 bits of randomness?

Nope! Half the numbers are in [0.5, 1) and have 24 bits of information each. 1/4 of the numbers are in [0.25, 0.5) and have 25 bits of information each. 1/8 of the numbers have 26 bits each… and a significant chunk of numbers have more than 32 bits of information.
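
To put a number on “significant chunk” (still with the idealized accounting where a value in [2^-(e+1), 2^-e) carries 24 + e bits of information):

```julia
# "More than 32 bits" means exponent e ≥ 9, i.e. the draw fell below 2^-9.
p_needs_more_than_32_bits = 2.0^-9   # ≈ 0.2%, roughly one draw in 500
```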

This is bad! Unless we can stash random bits for later use, it’s not helpful that we often only use 24 bits, or that we only need 25 bits on average – the rest will be discarded. And there is no stash to help us when the rainy day comes and we need more than our allotment of 32 bits.

So entropy (expected / mean information) is not really the relevant quantity.

On the other hand, maximal information content is not the relevant quantity either.

Consider the task of generating a geometrically distributed BigInt: P(0) = 1/2, P(1) = 1/4, P(2) = 1/8, etc. How many bits do you need?

The Shannon entropy is 2 bits; the maximal information content is infinite.
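
For illustration, the obvious way to generate such a value is to flip fair coins until the first failure (the function below is my own sketch, not an existing API):

```julia
using Random

# Count successes before the first failure: P(0) = 1/2, P(1) = 1/4, P(2) = 1/8, ...
# Each draw consumes 2 random bits on average, yet no fixed budget of bits
# covers every possible outcome; that is the entropy vs. worst-case gap.
function rand_geometric_bigint(rng::AbstractRNG = Random.default_rng())
    n = big(0)
    while rand(rng, Bool)
        n += 1
    end
    return n
end
```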

So the relevant quantity is a quantile. Given N, let P_N be the probability that a random output has more than N bits of information content. Now we have the language to say: we need enough random bits that P_N is vanishingly small.
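
For the idealized rand(Float32) distribution from above, this quantile even has a closed form (again using the 24 + e bits-per-value accounting, which is my shorthand, not anything official):

```julia
# P_N = probability that a single ideal Float32 draw needs more than N bits.
# A value in [2^-(e+1), 2^-e) needs 24 + e bits, so for N ≥ 24 the bad event
# is simply "the output fell below 2^-(N - 23)".
P_N(N) = N < 24 ? 1.0 : 2.0^-(N - 23)

P_N(32)   # ≈ 2e-3
P_N(64)   # ≈ 5e-13
```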

If we end up in the bad (unlikely, probability at most P_N) situation that we don’t have enough bits to select an output, then we can truncate, do hacky stuff, or halt and catch fire.

What should P_N, i.e. the threshold for “vanishingly rare”, be? Depending on the social contract, that could be:

  1. Once in a blue moon (a core-year). Nothing bad happens in case of failure, so we can afford rare failures. That would be log2(3e16), where 3e16 is roughly the number of nanoseconds in a year, i.e. ~55 bits (see the quick check after this list).
  2. So rare that you can seriously consider disbelieving your CPU rather than believing that you got so unlucky. Afaiu 1 error per 1000 core-years is expected (limited reliability of silicon). That would be ~64 bits.
  3. So rare that you would definitely rather disbelieve the CPU. Or you rather disbelieve your lying eyes (you could be having a stroke right this second!). Or your software has a bug. It must be a bug. The CIA has spiked your coffee with LSD. This is a nightmare. Wake up. Wake up now! (that’s the 128-bit level of unlikeliness)
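
A quick check of the first two thresholds, assuming one draw per nanosecond as above:

```julia
log2(3.0e16)          # ≈ 54.7 -> once per core-year
log2(3.0e16 * 1000)   # ≈ 64.7 -> once per 1000 core-years
```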