Output distribution of rand(Float32) and rand(Float64), thread 2

So, my attempt would be the following:

julia> rand32(u = rand(UInt64)) = convert(Float32, (u>>1)|0x01) * ldexp(prevfloat(1.0f0), -63)

julia> rand64(u = rand(UInt64)) = convert(Float64, (u>>1)|0x01) * ldexp(prevfloat(1.0), -63)

julia> @btime rand(Float64);
  2.251 ns (0 allocations: 0 bytes)

julia> @btime rand64();
  2.421 ns (0 allocations: 0 bytes)

julia> @btime rand(Float32);
  2.227 ns (0 allocations: 0 bytes)

julia> @btime rand32();
  2.417 ns (0 allocations: 0 bytes)

This emits no exact zeros (because we forcecully set the last bit). It only uses instructions that vectorize on avx2 (probably also on older CPU). The first bit is unset because there is no good instruction for unsigned-int-to-float – the compiler would need a branch for that.

I must admit that I have not properly analyzed the distribution. I am not entirely sure how to do that.

PS. Also, hard agree on “fuck intel” for removing avx512 from their latest lines of laptops. But even without that, I’d feel bad throwing e.g. Zen3 under the bus.

PPS. Function is non-decreasing, so an easy range-check:

julia> nextfloat(rand32(~UInt(0)))
1.0f0

julia> nextfloat(rand64(~UInt(0)))
1.0

julia> rand32(UInt(0))
1.0842021f-19

julia> rand64(UInt(0))
1.0842021724855043e-19