Output distribution of rand(Float32) and rand(Float64), thread 2

One thing that I’ve found as I’ve continued to noodle on this here and there is how easy it is to get things subtly wrong, especially when chaining together many iterations of adding bits. That’s where it’s really nice to have algorithms that are optimizations of a simple and obviously correct (but slow) version.

I’ve also found it super useful to break things down into very small definitions that are easy to test. I’ve not finished this implementation yet, but I think this idea for “perfect precision” is looking promising — or at least should have useful components for rounding-down floating point bit twiddles.

Here’s the slow-but-obviously-correct premise:

# Bit width of an unsigned chunk (works on values and types: nbits(0x00) == nbits(UInt8) == 8)
nbits(x) = 8*sizeof(x)
# Rounding down an unsigned fixed-point [0,1) representation to float
# (big(2)^nbits(r) avoids overflowing Int for 64-bit chunks)
fixed2float(::Type{F}, r::Unsigned) where {F} = F(big(r)/big(2)^nbits(r), RoundDown)
# Divide a positive floating point value by 2^N (N > 0), rounding down
# (big(2.0)^-N avoids underflowing to zero for very large N)
÷₂ꜛ(value::F, N) where {F} = F(value * big(2.0)^-N, RoundDown)
# Add a `smaller` floating point value into `bigger` (where `smaller` is below the ULP of `bigger`), rounding down
+₎(bigger::F, smaller::F) where {F} = F(big(bigger) + big(smaller), RoundDown)
# Get the base-2 exponent of a value's epsilon
log2eps(value) = round(Int, log2(eps(value)), RoundDown)

# Iteratively getting to "perfect precision"
function randf(rng, ::Type{F}, ::Type{U}) where {F <: Base.IEEEFloat, U <: Unsigned}
    value = fixed2float(F, rand(rng, U))
    nb = 8*sizeof(U)
    while log2eps(value) <= -nb
        value = value +₎ (fixed2float(F, rand(rng, U)) ÷₂ꜛ nb)
        nb += 8*sizeof(U)
    end
    return value
end
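To make the sanity checking concrete, here’s one property-check sketch; it restates the definitions above (plus an `nbits` bit-width helper) so it can run as its own script, and the seed and draw count are arbitrary choices of mine:

```julia
using Random

# Restated from above so this runs standalone; `big` arithmetic keeps
# the intermediate fixed-point values exact before the final round-down.
nbits(x) = 8*sizeof(x)
fixed2float(::Type{F}, r::Unsigned) where {F} = F(big(r)/big(2)^nbits(r), RoundDown)
÷₂ꜛ(value::F, N) where {F} = F(value * big(2.0)^-N, RoundDown)
+₎(bigger::F, smaller::F) where {F} = F(big(bigger) + big(smaller), RoundDown)
log2eps(value) = round(Int, log2(eps(value)), RoundDown)

function randf(rng, ::Type{F}, ::Type{U}) where {F <: Base.IEEEFloat, U <: Unsigned}
    value = fixed2float(F, rand(rng, U))
    nb = 8*sizeof(U)
    while log2eps(value) <= -nb
        value = value +₎ (fixed2float(F, rand(rng, U)) ÷₂ꜛ nb)
        nb += 8*sizeof(U)
    end
    return value
end

# Endpoint checks: the conversion rounds down, so 1.0 is never produced
@assert fixed2float(Float16, 0x00) === Float16(0)
@assert fixed2float(Float16, 0x80) === Float16(0.5)
@assert fixed2float(Float16, 0xff) < one(Float16)

# Every draw lands in [0, 1), however many refinement rounds it took
rng = MersenneTwister(1234)
for _ in 1:10_000
    x = randf(rng, Float16, UInt8)
    @assert zero(Float16) <= x < one(Float16)
end
```

Swapping in `Float32, UInt32` (or any other pairing) exercises the same properties, which is what makes the slow version handy as an oracle.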

The huge benefit is that the above is the test suite. And it’s all super-generic; you can test iteratively getting every single Float16 value, three bytes at a time.

Anyhow, my current optimizations of the above are looking promising. Scalar performance is the easy part: it’s so rare to hit those branches that it’s basically an even match. I’ve only roughed out the SIMD implementation and haven’t tested it much, but it does the same iterative refinement, and I’m still just barely within that 2x slowdown mark for Float32 on my M1:

julia> @btime Random.rand(Float32);
  2.958 ns (0 allocations: 0 bytes)

julia> @btime ShawshankRandemption.rand(Float32);
  3.084 ns (0 allocations: 0 bytes)

julia> A = Array{Float32}(undef, 2^20);

julia> @btime Random.rand!($A);
  325.541 μs (0 allocations: 0 bytes)

julia> @btime ShawshankRandemption.rand!($A);
  645.125 μs (0 allocations: 0 bytes)