Is rand(Float32) really uniform in [0, 1)?

My current go-to way to sample a uniform distribution is randneg11() = fma(rand(Int64), 2.0^-63, 2.0^-64), ported from Random123. Instead of covering [0, 1), it covers [-1, 1], which is often more useful. The mean is exactly zero, but the output is never exactly zero, so there are no divide-by-zero errors.
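
In runnable form, with a couple of quick sanity checks of the claims above (the checks are just my illustration):

```julia
# Map a uniform Int64 onto [-1, 1]. The exact value is (2k + 1) * 2^-64, an odd
# multiple of 2^-64, so it is never zero; fma rounds that exact value only once.
randneg11() = fma(rand(Int64), 2.0^-63, 2.0^-64)

xs = [randneg11() for _ in 1:10^6]
@assert all(x -> -1.0 <= x <= 1.0, xs)  # stays inside [-1, 1]
@assert !any(iszero, xs)                # never exactly zero
sum(xs) / length(xs)                    # empirical mean close to 0
```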

In general, RNGs are good at generating uniform Int64 or UInt64 values. Whether and how you convert those to floating point is always going to be somewhat problem-specific.


I’d like to offer two slight modifications:

  1. We don’t need the probability to be exactly p. It suffices if the probability is statistically indistinguishable from p up to some generous number of samples, e.g. 2^120 (which far exceeds the amount of computation accessible to mankind, and far exceeds the reliability of hardware).
  2. The big potential improvement that was discussed was whether we want the stronger property that rand(Float32) < p has probability indistinguishable from p for all p with p == Float32(p). As mbaumann noted, this is the same thing in (0.5, 1); but it is a very different thing in (0, 2^{-30}).

Some applications need the stronger property (2). A very practical example is sampling from Pareto/power-law distributions via 1/rand(); or, more generally, applying a transformation to the sample that has a singularity at 0.
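
As a concrete illustration (assuming the current behavior where rand(Float32) returns multiples of 2f0^-24 in [0, 1); the helper name below is made up):

```julia
# Inverse-transform sampling of a Pareto(α = 1) tail. If rand(Float32) only
# returns k * 2f0^-24 for k in 0:2^24-1, this sampler is capped at 2^24 ≈ 1.7e7
# and returns Inf32 whenever the draw happens to be exactly zero.
pareto_tail() = 1 / rand(Float32)

1 / 2f0^-24   # == 2^24, the largest finite value it can ever produce
```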

If you apply a transformation with a singularity somewhere other than zero to a floating point number, then you deserve what you get. However, floating point numbers are designed to handle singularities near zero or near infinity – that’s the point of floating the point, “relative error” – so one might naively expect the same from random samples.


Can we instead do P(rand(Float32) = x) = \texttt{nextfloat}(x) - x for all representable x \in [0,1)?

This is precisely the distinction. Modern floating point rand() implementations in practically all languages do not sample from a continuous Uniform distribution over [0, 1). You really shouldn’t transform them as though they did.


That’s what the infinite-precision algorithm linked in a post above achieves.

To save some notation: nextfloat(x)-x == eps(x) (for x >= 0 except floatmax).


It’s worth pointing out that the denominator-corrected CppRNG doesn’t achieve this distribution, despite the biased integer-to-float map and its more uniform use of mantissa bits, because its nominal and minimum resolution is 2^-32, much larger than the smallest normal floatmin(Float32) == 1.1754944f-38, let alone the subnormals. The formula does neatly express that some floating point values x are generated more often than others, and it implies P(rand(Float32) \leq x) = \texttt{nextfloat}(x), i.e. a rounded-down version of a standard uniform distribution.
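
As a sanity check of that formula (a quick sketch, using Float16 simply so the whole grid is small enough to enumerate):

```julia
# Every Float16 in [0, 1): bit patterns 0x0000 through 0x3bff (0x3c00 is 1.0).
# Under P(X = x) = nextfloat(x) - x the probabilities telescope to exactly 1,
# and the CDF at x is exactly nextfloat(x).
xs = reinterpret.(Float16, 0x0000:0x3bff)
total = sum(Float64(nextfloat(x)) - Float64(x) for x in xs)           # == 1.0
cdf(x) = sum(Float64(nextfloat(y)) - Float64(y) for y in xs if y <= x)
cdf(Float16(0.25)) == Float64(nextfloat(Float16(0.25)))               # true
```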

But it is approximating one, right? If so, are there really no continuous transformations we can adapt? Or is there just some hopefully intuitive limit to the math? Does this change if we use the biased map instead?

Yes, exactly — it’s a rounded-down approximation to a fixed finite precision, which is not the same as the precision of the float it returns (and I do agree that’s unfortunate and might be unexpected, but it matches what everyone else does).

But here’s the powerful thing about having this particular definition. It means that, even though there are only a handful of discrete values we may ever see from log(1-rand(Float32)) that are less than -15, I’m pretty sure I still know how to rigorously interpret them! And I can in fact build a CDF on top of that. It’s just not a random variable drawn from the log-transformed distribution.
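
To make “a handful” concrete (assuming rand(Float32) returns k * 2f0^-24 for k in 0:2^24-1, as current Julia does):

```julia
# log(1 - x) drops below -15 only for the very last grid points before 1,
# so it suffices to inspect the largest few values of k.
small = [log(1 - k * 2f0^-24) for k in (2^24 - 10):(2^24 - 1)]
filter!(<(-15f0), small)
small                          # only 5 distinct values are ever observed
length(small) * 2.0^-24        # total probability mass of the region below -15
```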


Yes. We are in full agreement.

I just think that this is very unfortunate, because it would be natural to expect this to work. And people do actually make this mistake, as in the linked Rust package.

I think we’re missing “sample floating-point-uniform from (0,1)” as a feature/API, linked from the rand() docstring.

The issue is that most of the software world – including Julia – aims for something like |P(rand() < p) - p| < \epsilon, i.e. absolute tolerances, as is appropriate for fixed point numbers.

On the other hand, floating point is specifically designed so that estimates of the form |P(rand() < p) - p| < \epsilon \max(P(rand()<p), p), i.e. relative error estimates and tolerances, are often possible.

Relative tolerances are ingrained into the intuitions of many people and scientific fields. People who are not aware of the history of this API are liable to simply assume bounded relative error of probabilities for floating point sampling. Otherwise it should be called “fixed point sampling”.

And no, it is not possible to fix this afterwards by applying transforms. Lost precision cannot be recovered. Nor is it possible to solve it with a better UInt32->Float32 map: probabilities of UInt32 values are quantized to integer multiples of 2^-32, but small floats need much smaller probabilities. Instead, one needs to either introduce branches or overprovision random bits (e.g. consume 128 random bits to generate a single 32-bit float, without branches).
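
For concreteness, here is a minimal sketch of the branchy option (my own illustration, not anything Julia ships): pick the binade with a geometric distribution by counting leading zero bits, then fill the significand.

```julia
# Every normal Float32 x in (0, 1) comes out with probability exactly eps(x),
# i.e. P(rand() < p) has bounded *relative* error. Subnormals are not handled
# carefully here; this is only meant to illustrate the branch/extra-bits cost.
function randf32_relative()
    e = 0                          # number of leading zero "coin flips" so far
    while true
        r = rand(UInt64)
        if r != 0
            e += leading_zeros(r)
            break
        end
        e += 64                    # astronomically rare: keep consuming bits
        e > 160 && return 0f0      # far past the subnormal range; give up
    end
    frac = rand(UInt32) >>> 9      # 23 explicit significand bits
    return ldexp(1f0 + frac * 2f0^-23, -e - 1)
end
```

It typically consumes 96 random bits per Float32 and has a data-dependent branch, which is exactly the trade-off described above.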

However, this is a pretty niche requirement that can very well be a separate API.


I think that’s actually a great solution to this whole discussion! We could just take one of the proposals from the previous thread and make a function Random.randfloat that produces a random float following the uniform distribution on (0,1) (or including 1, I don’t really care). Together with a warning + link from rand’s docstring, that seems like a quite satisfactory solution to me 🙂


Or, alternatively, a package could introduce a wrapper type, e.g. Uniformly, and then define rand(rng, ::Type{Uniformly{T}}) where T <: AbstractFloat.
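
A rough sketch of how such a wrapper could hook into the Random API (the name Uniformly and the placeholder body are only for illustration):

```julia
using Random

# Hypothetical wrapper: rand(Uniformly{T}) promises a "floating-point uniform"
# sample, whatever sampler is eventually agreed on.
struct Uniformly{T<:AbstractFloat} end

# A SamplerType method is the hook the Random stdlib documents for sampling
# custom types; it makes rand(Uniformly{Float32}), rand(rng, Uniformly{Float64}),
# and the array variants work.
function Random.rand(rng::Random.AbstractRNG,
                     ::Random.SamplerType{Uniformly{T}}) where {T<:AbstractFloat}
    # Placeholder body: a real implementation (e.g. one with bounded relative
    # error, as discussed above) would go here.
    rand(rng, T)
end

# So that rand(Uniformly{Float32}, n) yields a Vector{Float32}:
Base.eltype(::Type{Uniformly{T}}) where {T} = T

rand(Uniformly{Float32})      # scalar usage
rand(Uniformly{Float64}, 4)   # array usage
```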

I think that the best solution would be documenting what users can expect from rand(rng, ::Type{<:AbstractFloat}). Pseudorandom number generation is not intuitive, and is also subject to practical constraints (for most people, a fast implementation is preferable over some “ideal” one, even if we could agree on the latter).

Probably saying that

  1. rand(T) < p has a probability p \pm \delta(T), where \delta(\text{Float32}) = 2^{-24}, and similarly for other float types, in a table,
  2. \delta(T) for a specific type will not increase in future Julia versions,

would be a good starting point.

This

  1. documents the current behavior,
  2. yet leaves room for further optimization/changes,
  3. clarifies that if you do transformations that are sensitive to small values, you may be introducing a mistake (e.g. a naive log2(rand(Float32)); see the sketch below).
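
A small illustration of points 1 and 3 together (again assuming the current behavior where rand(Float32) returns k * 2f0^-24 for k in 0:2^24-1):

```julia
# Under that model P(rand(Float32) < p) == ceil(p * 2^24) / 2^24, so the
# absolute error is bounded by δ(Float32) = 2^-24 for every p in [0, 1] ...
prob_less(p) = ceil(big(p) * 2^24) / big(2)^24
for p in (1f-8, 0.1f0, 0.6f0, prevfloat(1f0))
    @show p Float64(abs(prob_less(p) - p))   # always ≤ 2.0^-24 ≈ 6.0e-8
end

# ... but for tiny p the reported probability is ~6x too large (huge relative
# error), and transforms probing small values hit a floor: log2(rand(Float32))
# is either exactly -Inf (a zero draw) or at least -24, with nothing in between.
Float64(prob_less(1f-8) / 1f-8)
```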