@StefanKarpinski I agree, and it’s more or less the argument I thought that was always being made, so I like that we’re converging on some common understanding.
Here’s some data on timing for my slightly modified code above:
julia> function rand32()
                i = rand(UInt64)
                a = i & 0xffffffff
                b =  i >> 32
                if a == 0x0
                  if b == 0 ;  b= 1 ; end
                  c = ldexp(Float32(b),-64)
                 else 
                   c = ldexp(Float32(a),-32)
                end
                return c
             end
rand32 (generic function with 1 method)
julia> @btime(rand32())
  4.308 ns (0 allocations: 0 bytes)
0.4268289f0
julia> @btime(rand(UInt64))
  2.866 ns (0 allocations: 0 bytes)
0xbcb048922324e5f6
julia> 4.308/2.866
1.5031402651779482
So it’s about 1.5 times the cost compared to just generating the 64 bits, and improves the left tail a LOT, the smallest nonzero value it can generate is ldexp(1.0f0,-64) compared to I think ldexp(1.0,-24) for the current code?
The smallest Float32 on the other hand is about 2.58f-26 as big as ldexp(1.0f0,-64)
So we have the option also to extend the dynamic range even further, we could do something like ldexp(Float32(rand(UInt32)),-64-32) as a 3rd step of extending the dynamic range even farther, it would only get hit once every 2^64 calls.