I wanted to ask about the output distribution of floating-point
rand(): Naively, I would have expected the following distribution:
- the sign bit is zero
- the mantissa is a random bitstring
- the exponent is geometrically distributed, e.g. const − leading_zeros of an infinite random bitstream
- subnormals can be rounded to zero, but should not appear in practice for 32-bit or 64-bit numbers (probability of 2^(−2^7) for Float32, which is approximately never)
This corresponds to sampling a random real number uniformly in [0,1] and rounding down to the next smaller representable floating-point number.
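For concreteness, here is a sketch of that ideal construction (my own illustration of the rounded-real distribution, not anything Base actually does; `ideal_rand64` is a hypothetical name): count leading zeros of a random bitstream to get the geometrically distributed exponent, then fill the mantissa with fresh random bits.

```julia
# Sketch: sample a uniform real in [0,1) and round down to a Float64.
# Hypothetical helper, not part of Base.
function ideal_rand64()
    # Count leading zeros of an (effectively infinite) random bitstream
    # to obtain a geometrically distributed exponent offset e,
    # i.e. the result lies in [2^-(e+1), 2^-e).
    e = 0
    while true
        w = rand(UInt64)
        if w == 0x0000000000000000
            e += 64
            e >= 1074 && return 0.0   # would round down to zero anyway
        else
            e += leading_zeros(w)
            break
        end
    end
    # Subnormal range has probability ~2^-1022; skipped here for simplicity.
    e >= 1022 && return 0.0
    # Fresh random bits for the 52-bit mantissa -- not shifted-in zeros.
    frac = rand(UInt64) & 0x000fffffffffffff
    # Biased exponent field for a value in [2^-(e+1), 2^-e).
    reinterpret(Float64, (UInt64(1022 - e) << 52) | frac)
end
```

Under this construction every representable float in [0,1) can occur, with probability equal to the measure of the real interval that rounds down to it.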
In reality, this is not the case:
```julia
julia> function ff(N, k, fT, iT)
           i = 0
           for n = 1:N
               r = rand(fT)
               if r < 2.0^(-k)
                   rr = reinterpret(iT, r)
                   @show r
                   @show rr
                   i += 1
                   i > 10 && return nothing
               end
           end
           nothing
       end

julia> srand(42); ff(1<<25, 24, Float64, UInt64)
r = 5.29506605229102e-8
rr = 0x3e6c6d7bbe000000

julia> srand(42); ff(1<<25, 24, Float32, UInt32)
r = 0.0f0
rr = 0x00000000
r = 0.0f0
rr = 0x00000000
r = 0.0f0
rr = 0x00000000
r = 0.0f0
rr = 0x00000000
```
From this it is pretty clear that a simpler construction is used, e.g. setting the mantissa to a random bitstring and float-subtracting one, which corresponds to

mantissa0 = rand52
exponent = const − leading_zeros(mantissa0)
mantissa = mantissa0 << (leading_zeros(mantissa0) + 1)

i.e. all-zero bits are shifted in from the right, instead of random bits.
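The effect is easy to check directly: with the subtract-one construction every output is an integer multiple of 2^-52 (2^-23 for Float32), so small results carry trailing zero mantissa bits, exactly as in the reinterpreted value above. A minimal sketch of that construction (my paraphrase, not Julia's actual source; `naive_rand64` is a hypothetical name):

```julia
# Sketch of the fixed-point-style construction: build 1.xxx in [1,2)
# from 52 random mantissa bits, then float-subtract 1.0.
function naive_rand64()
    frac = rand(UInt64) & 0x000fffffffffffff          # 52 random bits
    u = reinterpret(Float64, 0x3ff0000000000000 | frac)  # 1.frac in [1,2)
    u - 1.0   # in [0,1), always a multiple of 2^-52
end
```

Because every output is a multiple of 2^-52, any result below 2^-k necessarily has at least k trailing zero bits in its mantissa once normalized.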
Is this distribution standard / intended?
Should this be documented? (i.e. the fact that only 23-/52-bit fixed-point numbers are generated and that the probability of
rand(Float32) == 0 is, relatively speaking, very large)
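For scale (my own arithmetic, assuming a 23-bit fixed-point construction for Float32, which the experiment above suggests): P(rand(Float32) == 0) would be 2^-23, so the 2^25-sample run should hit zero about four times, consistent with the four hits shown, whereas under the rounded-real distribution a zero is essentially impossible.

```julia
# Expected number of exact zeros among 2^25 draws of rand(Float32),
# assuming 23-bit fixed-point outputs, i.e. P(zero) = 2^-23.
p_fixed  = 2.0^-23
n        = 2^25
expected = p_fixed * n
println(expected)   # 4.0
```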