Is rand(Float32) really uniform in [0,1]?

Hello, this is my first post.

I started tinkering with Julia and wanted to understand better how random floating-point numbers are generated. Usually I just call rand(Float32) and am happy with it. I claim the following:

If a floating-point number x is uniformly distributed in [0,1], then mantissa(x) (as a bitstring) is uniformly distributed in [0, 2^m).

I’m pretty sure of this result: if x is uniform in [0,1], it must be uniform in every interval [2^i, 2^(i+1)), and being uniform in such an interval means having a uniform mantissa.

I have produced a notebook (which I cannot upload for some reason) in which I test the distribution of the mantissa bits for different generation policies.
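A minimal version of such a test might look like this (a sketch; the helper name and sample size are my own choices):

```julia
# Estimate how often each of the 23 mantissa bits of rand(Float32) is
# set.  If the mantissa bits were uniform, every frequency would be
# close to 0.5.
function mantissa_bit_frequencies(n::Int)
    counts = zeros(Int, 23)
    for _ in 1:n
        bits = reinterpret(UInt32, rand(Float32))
        for k in 0:22
            counts[k + 1] += Int((bits >> k) & 0x1)
        end
    end
    return counts ./ n
end

mantissa_bit_frequencies(100_000)
```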

Why did the designers of the language choose this particular generation style? Isn’t it better to have (mostly) uniform mantissa bits? By the way, here’s the alternative generation scheme, copied from the C++ standard library (there may be errors):

using Random

struct CppRNG <: Random.AbstractRNG
    rng::Random.AbstractRNG
end


function Random.rand(
    rng::CppRNG,
    ::Type{T}
) where { T <: Union{Float16, Float32, Float64} }
    U = Base.uinttype(T)
    bits = 8 * sizeof(U)

    x_int = rand(rng.rng, U)

    # Divide by 2^bits, i.e. one more than typemax(U).  Doing the
    # arithmetic in Float64 avoids integer overflow of 2^64 and
    # overflow of T(x_int) itself for Float16.
    normalized = T(x_int * exp2(-bits))

    # Rounding can land exactly on 1.0 when x_int is near typemax(U)
    # (the same known flaw as C++'s std::generate_canonical), so clamp
    # back into [0, 1).
    normalized = min(normalized, prevfloat(one(T)))

    @assert zero(T) <= normalized < one(T)
    @assert typeof(normalized) == T

    return normalized
end

I’m interested in the motivation for the choice. I understand that changing it now would break existing programs, so that’s out of scope.

1 Like

Not quite. If x is uniformly distributed over all positive normalized floating point numbers (not over the real interval [0,1]), then the mantissa bits would be uniformly distributed.

When sampling over the real interval [0,1], the results are biased because some binades, such as [0.5,1), are larger than others, such as [0.25,0.5). Those larger intervals receive more probability mass, and the implicit leading 1 combined with the mantissa bits creates a non-uniform distribution when viewed as raw bits.
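The size mismatch is easy to check with a couple of lines (a small sketch; the values below are just interval lengths, not sampled data):

```julia
# Adjacent binades contain the same number of Float32 values (2^23
# each), but cover real intervals of different length, so a uniform
# real distribution puts twice the mass on [0.5, 1) as on [0.25, 0.5).
floats_per_binade = 2^23

len_upper = 1.0 - 0.5    # length of the binade [0.5, 1.0)
len_lower = 0.5 - 0.25   # length of the binade [0.25, 0.5)

mass_ratio = len_upper / len_lower   # == 2.0
```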

8 Likes

Your observation is correct. Random floating-point numbers in Julia are generated as random fixed-point numbers, then rounded to floating point. This is equivalent to sampling a random number in [1,2) and subtracting 1.
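The [1,2) trick can be sketched at the bit level (a hypothetical helper, not the actual implementation in any library):

```julia
# Force the exponent field to that of 1.0f0 (bit pattern 0x3f800000)
# and fill the 23 mantissa bits with random bits: the result is uniform
# over the floats in [1, 2).  Subtracting 1 then gives a sample in
# [0, 1) with 23 bits of precision.
function rand_via_1to2(u::UInt32)
    bits = 0x3f800000 | (u >> 9)   # keep the top 23 of 32 random bits
    return reinterpret(Float32, bits) - 1.0f0
end

rand_via_1to2(rand(UInt32))
```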

The most egregious violation is that the probability of hitting exactly zero with rand(Float32) should be on the order of nextfloat(0.0f0) ≈ 1.0f-45, i.e. never observed in human history, but in fact it happens all the time.

A corollary is that, somewhat counter-intuitively, you cannot use these random samples as if they came from (0,1] to sample from an unbounded distribution with known CDF, e.g. the otherwise attractive 1/rand() algorithm for sampling from a power law. You might naively expect this to work because you can evaluate 1/x with good precision in floating point, but you are really evaluating 1/((1 + rand()) - 1), which suffers from catastrophic cancellation and floating-point precision loss.
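The cancellation is easy to demonstrate outside of any RNG (illustrative values only):

```julia
# For a small real u, (1 + u) - 1 discards most of u's significant
# digits, so 1/((1 + u) - 1) is far less accurate than 1/u computed
# directly -- this is exactly what an implicit [1, 2) - 1 sampler does.
u = 1.0e-12 + 1.0e-19

direct    = 1 / u              # keeps ~16 significant digits of u
cancelled = 1 / ((1 + u) - 1)  # u gets rounded to a multiple of eps(1.0)
```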

This is a known problem; we had a giant thread about it. The standard computer-science literature is pretty silent on the issue. Many existing real-world codes get this very wrong, e.g. the Float32 output of Pareto in rand_distr (Rust) has catastrophically wrong tails.

One of the issues is that we cannot quickly evaluate rand(Float64) in a precise and branch-free manner. Sampling such a number has ~54 bits of entropy, but entropy is really just the expected number of random bits needed, and each number has a ~0.1% chance of needing more than 64 bits.

In the end, we did not go forward with fixing the issue in Julia. Maybe we will at some future time.

But code that relies on “correct” floating-point sampling is pretty problematic anyway, since to our knowledge almost no other language / framework generates “correct” floating-point samples. Writing such code is a landmine for anyone who ports it to a different language.

3 Likes

Not quite. As I remember, the bits-to-float transformation used in Base is slightly more sophisticated and preserves one more bit of entropy. It’s equivalent to sampling a random number in [0.5, 1) and then flipping a coin whether to subtract 0.5 or not.

I think CUDA.jl’s device RNG library may be using the [1, 2) - 1 variant.
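For Float32, that one-extra-bit idea is equivalent to drawing 24 random bits and scaling (a sketch, not necessarily Base's exact code):

```julia
# Draw 24 random bits and scale by 2^-24: the output is uniform over
# the multiples of 2^-24 in [0, 1), one bit finer than the [1, 2) - 1
# trick, which only reaches multiples of 2^-23.
rand24(u::UInt32) = Float32(u >> 8) * Float32(exp2(-24))

rand24(rand(UInt32))
```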

Only the brave enter here: Output distribution of rand(Float32) and rand(Float64), thread 2

2 Likes

My thesis is that rand(Float32) should return the floating-point representation of a real number uniformly distributed in [0,1]. By what I have written above, the mantissa (what comes after the 1. in 1.xxxx) should then be uniformly distributed in every exponent range, and so it should be uniform overall (regardless of how such exponent ranges are represented).

With the current generation scheme, a sampled x does not always have m significant digits (m being the size of the mantissa); depending on its exponent range, trailing mantissa bits are forced to zero.
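Concretely, assuming every output is a multiple of 2^-24 (as in the fixed-point scheme discussed above; the particular value below is arbitrary):

```julia
# In the binade [0.125, 0.25) the Float32 spacing is 2^-26, but a
# multiple of 2^-24 can only land on every fourth float there, so its
# two lowest mantissa bits are forced to zero.
x = Float32(0x2a5a5a) * Float32(exp2(-24))  # an arbitrary multiple of 2^-24
@assert 0.125f0 <= x < 0.25f0

low_bits = reinterpret(UInt32, x) & 0x3     # two lowest mantissa bits
```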

How do I interpret the graphs properly? What does the 0-22 mean for the mantissa bits, and why does the frequency for each bit peak at 0.5 instead of all of them collectively adding to 1?

Just checking: isn’t the maximum of an unsigned integer 2^(8*sizeof(U)) - 1? That’s what typemax gives, anyway. If we’re going for [0, 1) as documented, then maybe the divisor should just be 2^(8*sizeof(U)).
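Right, and there is a second subtlety on top of the off-by-one: converting a near-maximal integer to the float type can round up, so even dividing by 2^(8*sizeof(U)) can yield exactly 1.0 (a quick check):

```julia
# Float32 has only 24 significand bits, so typemax(UInt32) == 2^32 - 1
# rounds up to 2.0f0^32 on conversion; the quotient is then exactly
# 1.0f0, which falls outside the documented [0, 1) range.
x = Float32(typemax(UInt32)) / 2.0f0^32
```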

I think the expectation that sampling values uniformly from [0,1] implies that all Float32 values in that interval can be returned by rand is just not true. And once that is not guaranteed, there is no reason to expect that the mantissas are uniformly distributed.

The question is, roughly: down to what bin size is the output of rand uniformly distributed? I think a good part of the thread linked above discusses basically this question.

1 Like