# Output distribution of rand()

I wanted to ask about the output distribution of floating-point `rand()`. Naively, I would have expected the following distribution:

- the sign bit is zero
- the mantissa is a random bitstring
- the exponent is a geometrically distributed integer, e.g. const minus the `leading_zeros` of an infinite random bitstream
- subnormals can be rounded to zero, but should not appear in real life for 32-bit or 64-bit numbers (probability of roughly 2^(-2^(7)) for Float32, which is approximately never)

This corresponds to sampling a uniformly distributed random real number in [0,1) and rounding down to the next representable floating-point number.
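A direct (if inefficient) transliteration of this ideal construction might look like the following sketch (`ideal_rand64` is a hypothetical helper, not anything in Base):

``````julia
using Random

# Sketch of the "round a uniform real down" construction (hypothetical,
# not what Base does): count the leading zeros of a conceptually infinite
# random bitstream to get a geometrically distributed exponent, then fill
# the mantissa with 52 fresh random bits.
function ideal_rand64()
    zeros_seen = 0
    while true
        w = rand(UInt64)
        if w != 0
            zeros_seen += leading_zeros(w)
            break
        end
        zeros_seen += 64
        zeros_seen >= 1074 && return 0.0   # astronomically rare underflow
    end
    e = 1022 - zeros_seen                  # biased exponent field
    e <= 0 && return 0.0                   # round subnormals down to zero
    m = rand(UInt64) & 0x000fffffffffffff  # 52 fresh mantissa bits
    return reinterpret(Float64, UInt64(e) << 52 | m)
end
``````

This consumes two `UInt64`s per draw in the common case, so it only illustrates the distribution, not an efficient implementation.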

In reality, this is not the case:

``````julia
julia> function ff(N, k, fT, iT)
           i = 0
           for n = 1:N
               r = rand(fT)
               if r < 2.0^(-k)
                   rr = reinterpret(iT, r)
                   @show r
                   @show rr
                   i += 1
                   i > 10 && return nothing
               end
           end
           nothing
       end

julia> srand(42); ff(1<<25, 24, Float64, UInt64)
r = 5.29506605229102e-8
rr = 0x3e6c6d7bbe000000
julia> srand(42); ff(1<<25, 24, Float32, UInt32)
r = 0.0f0
rr = 0x00000000
r = 0.0f0
rr = 0x00000000
r = 0.0f0
rr = 0x00000000
r = 0.0f0
rr = 0x00000000
``````

From this it is pretty clear that a more naive construction is used, e.g. setting the mantissa to a random bitstring and float-subtracting one. This corresponds to `mantissa0 = rand52; exponent = const - leading_zeros(mantissa0); mantissa = mantissa0 << (leading_zeros(mantissa0) + 1)`, which shifts in all-zero bits from the right instead of shifting in random bits.
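The subtract-one trick itself can be sketched in one line (illustrative, not Base's exact code; `rand_fixedpoint` is a hypothetical name):

``````julia
# Sketch of the [1,2)-minus-one construction: put 52 random bits into the
# mantissa of a float with the exponent of 1.0, then subtract 1.0. Every
# output is an exact multiple of 2^-52, so small results end in zero bits.
rand_fixedpoint(r::UInt64) = reinterpret(Float64, 0x3ff0000000000000 | (r >> 12)) - 1.0
``````

For example, `rand_fixedpoint(UInt64(0))` gives `0.0`, and the largest possible output is `1 - 2.0^-52`.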

Is this distribution standard / intended?

Should this be documented? (i.e. the fact that only 23/52-bit fixed-point numbers are generated, and that the probability of `rand(Float32) == 0` is much larger than naively expected)

> Is this distribution standard / intended?

This is how dSFMT, which is used internally, works: it only generates 52 random bits at a time and uses them to produce a random `Float64` in `[1, 2)`, from which 1 is subtracted. Nowadays we do the subtraction on the Julia side. So yes, this is intended, and generating a random float in `[0, 1)` in a more perfect way in software (given uniform random bits packed in a `UInt64`) would be much less time-efficient than this hack. I'm no expert at all on the topic, but this looks very similar to using `x = a + rand()*b` to get a random number in `[a, a+b)`, which is a relatively widespread technique in scientific packages (and in "modern" C++ as well – at least in g++'s standard library – if I remember correctly).
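The affine-transform technique mentioned at the end is simply:

``````julia
# Sketch: map a uniform draw from [0,1) into [a, a+b) by scaling and
# shifting. Inherits whatever granularity rand() has.
rand_in(a, b) = a + rand() * b
``````

Note that for general `a` and `b`, rounding can in rare cases make the result land exactly on `a + b`.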


Thanks. A "correct" [0,1) float would take 54 bits of entropy on average, instead of 52; alas, with a variable-bit-rate encoding.
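As a quick sanity check of the 54-bit figure: the geometric distribution of the exponent contributes 2 bits of entropy on top of the 52 mantissa bits,

``````latex
H(\text{exponent}) = \sum_{k=0}^{\infty} 2^{-(k+1)}\,(k+1) = 2 \text{ bits},
\qquad 52 + 2 = 54 \text{ bits on average.}
``````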

The reason for asking was that I was thinking about submitting a PR for a CSPRNG, and as far as I understood, there is currently no generic structure that does buffering and conversion-to-float from a random bitstream?

Using the native instructions, AES-128-CTR has roughly the same speed as dSFMT for generating `UInt64`s (on Julia 0.7, on my machine, and with caching of all the round keys, which amounts to `<12 x i128>` = 192 bytes of state).

Without hardware support, one could use ChaCha20/12; I think writing AES in software is too error-prone if avoidable (timing side-channels! DJB's crypto suites are cool for being really hard to mess up).

Re hack and subtraction: The principled way could be a hack along the lines of

``````julia
julia> function getfloat(r::UInt64)
           mo = leading_zeros(r)            # zeros among the high bits
           if mo > 11
               # rare branch (probability 2^-12): the 12 high bits are all
               # zero, so more random bits would be needed; clamped here
               mo = 11
           end
           pref = UInt64(1022 - mo) << 52   # exponent field
           return reinterpret(Float64, pref | ((r << 12) >> 12))
       end
``````

This has a small bias: it avoids really tiny random numbers (smaller than 2^(-76)), which should appear once in 2^76 draws; this is fixable by using a while loop. The if-branch triggers with a probability of 2^(-12) only, so it should not dominate the cost.
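Under the assumption that fresh `UInt64`s can be drawn in the rare branch, the while-loop fix could be sketched like this (`getfloat_unbiased` is a hypothetical name):

``````julia
using Random

# Sketch: same fast path as getfloat, but when the 12 high bits are all
# zero (probability 2^-12), keep counting leading zeros across fresh
# random words instead of truncating, removing the bias against tiny values.
function getfloat_unbiased()
    r = rand(UInt64)
    mo = leading_zeros(r)
    if mo <= 11
        # fast path: exponent from the 12 high bits, mantissa from the low 52
        return reinterpret(Float64, UInt64(1022 - mo) << 52 | (r & 0x000fffffffffffff))
    end
    mo = 12
    while true                      # slow path, taken once in 2^12 calls
        w = rand(UInt64)
        lz = leading_zeros(w)
        if lz < 64
            mo += lz
            break
        end
        mo += 64
        mo > 1021 && return 0.0     # round underflowing results to zero
    end
    mo > 1021 && return 0.0         # avoid subnormal exponents in this sketch
    m = rand(UInt64) & 0x000fffffffffffff   # fresh mantissa bits
    return reinterpret(Float64, UInt64(1022 - mo) << 52 | m)
end
``````

The fast path consumes a single `UInt64`, so the average cost stays close to the branchless version.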

edit: I think the real performance problem is that the branch can prevent SIMD.

OK, I did some benchmarking. The distribution is mainly a problem for `Float32`. I think the following variant is competitive, and gives a much smoother distribution near 0, because it uses the entire 32 bits of entropy:

``````julia
julia> using Random

julia> using BenchmarkTools

julia> N = 20_000; inp = rand(UInt32, N); outp = zeros(Float32, N);

julia> function getfloat_trail(r::UInt32)
           mo = Int32(trailing_zeros(r))
           pref = reinterpret(UInt32, Int32(126) - mo) << 23
           res = pref | (r >> 9)
           return reinterpret(Float32, res)
       end

julia> function getfloat_std(r::UInt32)
           rr = 0x3f800000 | (r >> 9)
           res = reinterpret(Float32, rr) - 1.0f0
           return res
       end

julia> @btime Random.rand!($inp);
  17.782 μs (0 allocations: 0 bytes) # 1.7 cycles/number

julia> @btime Random.rand!($outp);
  21.121 μs (0 allocations: 0 bytes) # 2.1 cycles/number
``````