Output distribution of rand()

I wanted to ask about the output distribution of floating-point rand(). Naively, I would have expected the following distribution:
- the sign bit is zero
- the mantissa is a random bitstring
- the exponent is a geometrically distributed integer, e.g. const - leading_zeros of an infinite random bitstream
- subnormals can be rounded to zero, but should not appear in real life for 32-bit or 64-bit numbers (probability on the order of 2^(-2^7) for Float32, which is approximately never)

This corresponds to sampling a random real number in [0,1], according to uniform distribution, and rounding down to the next representable floating point number.
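To make that concrete, here is a minimal sketch of that ideal construction (my own illustration, not Julia's implementation; `ideal_rand64` is a made-up name, and it assumes an endless supply of uniform random bits via `rand`):

```julia
# Sketch of the ideal construction: draw a uniform real in [0,1) bit by bit
# and round down to the nearest representable Float64.
function ideal_rand64()
    e = 0                     # count of leading zero bits after the binary point
    while !rand(Bool)
        e += 1
        # once the exponent field underflows we are in subnormal territory;
        # round to zero (probability 2^-1022, i.e. approximately never)
        e >= 1022 && return 0.0
    end
    frac = rand(UInt64) & 0x000fffffffffffff   # 52 random significand bits
    return reinterpret(Float64, UInt64(1022 - e) << 52 | frac)
end
```

With probability 1/2 this lands in [1/2, 1), with probability 1/4 in [1/4, 1/2), and so on, matching the geometric exponent described above.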

In reality, this is not the case:

julia> function ff(N, k, fT, iT)
           i = 0          # hit counter; must live outside the loop for the early return to fire
           for n = 1:N
               r = rand(fT)
               if r < 2.0^(-k)
                   rr = reinterpret(iT, r)
                   @show r
                   @show rr
                   i += 1
                   i > 10 && return nothing
               end
           end
           nothing
       end

julia> srand(42); ff(1<<25, 24, Float64, UInt64)
r = 5.29506605229102e-8
rr = 0x3e6c6d7bbe000000
julia> srand(42); ff(1<<25, 24, Float32, UInt32)
r = 0.0f0
rr = 0x00000000
r = 0.0f0
rr = 0x00000000
r = 0.0f0
rr = 0x00000000
r = 0.0f0
rr = 0x00000000

From this it is pretty clear that a more naive construction is used, e.g. taking the mantissa as a random bitstring and float-subtracting one. That corresponds to mantissa0 = rand52; exponent = const + leading_zeros(mantissa0); mantissa = mantissa0 << (leading_zeros(mantissa0) + 1), which shifts in all-zero bits from the right, instead of shifting in random bits.

Is this distribution standard / intended?

Should this be documented? (i.e. the fact that only 23/52-bit fixed-point numbers are generated, and that the probability of rand(Float32) == 0 is very large)


Is this distribution standard / intended?

This is how dSFMT, which is used internally, works: it only generates 52 random bits at a time, and uses them to produce a random Float64 in [1, 2), from which 1 is subtracted. Nowadays we do the subtraction on the Julia side. So yes, this is intended, and generating a random float in [0, 1) in a more perfect way in software (given uniform random bits packed in a UInt64) would be much less time-efficient than this hack. I’m no expert at all on the topic, but this looks very similar to using x = a + rand()*b to get a random number in [a, a+b), which is a relatively widespread technique in scientific packages (and in “modern” C++, at least g++'s standard library, as well, if I remember correctly).
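For illustration, the described trick can be sketched like this (my own restatement; `hack_rand64` is a hypothetical name, taking 52 uniform random bits packed in a UInt64):

```julia
# Put 52 random bits into the significand of a float whose exponent encodes 2^0,
# giving a uniform value in [1, 2), then subtract 1 to land in [0, 1).
function hack_rand64(bits::UInt64)
    f = reinterpret(Float64, 0x3ff0000000000000 | (bits & 0x000fffffffffffff))
    return f - 1.0   # exact: the result is frac/2^52 for an integer frac < 2^52
end
```

Every output is an integer multiple of 2^-52, which is exactly the fixed-point grid observed in the original post.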


Thanks. A “correct” [0,1)-float would take 54 bits of entropy on average, instead of 52: 52 significand bits plus an expected two bits to settle the geometric exponent; alas, with a variable bit-rate encoding.

The reason for asking is that I was thinking about submitting a PR for a CSPRNG, and as far as I understand, there is currently no generic structure that does buffering and conversion-to-float from a random bitstream?

Using the native instructions, aes128-ctr has roughly the same speed as dSFMT for generating UInt64 (on Julia 0.7, on my machine, and with caching of all the round keys, which amounts to <12 x i128> = 192 bytes of state).

Without hardware support, one could use chacha20/12; I think writing aes in software is too error-prone if avoidable (timing side-channels! DJB’s crypto-suites are cool for being really hard to mess up).

Re hack and subtraction: The principled way could be a hack along the lines of

julia> function getfloat(r::UInt64)
           mo = leading_zeros(r)
           if mo > 11
               # top 12 bits were all zero: draw more random bits for the exponent
               mo = 12 + leading_zeros(rand(UInt64))
           end
           pref = reinterpret(UInt64, Int64(1022 - mo)) << 52
           # reinterpret back to Float64 (the bits above are the IEEE encoding)
           return reinterpret(Float64, pref | ((r << 12) >> 12))
       end

This has a small bias: it never produces really tiny random numbers (smaller than 2^(-76)), which should appear once in 2^76 draws; fixable by using a while loop. The if-block triggers with a probability of only 2^(-12), so it should not dominate the cost.

edit: I think the real performance problem is that the branch can prevent SIMD :frowning:
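As a quick sanity check of this sketch (restated self-contained here, with the final reinterpret to Float64 written out, which I take to be the intended return type; `getfloat_check` is my name for it):

```julia
function getfloat_check(r::UInt64)
    mo = leading_zeros(r)
    if mo > 11
        # top 12 bits were all zero: draw more random bits for the exponent
        mo = 12 + leading_zeros(rand(UInt64))
    end
    pref = reinterpret(UInt64, Int64(1022 - mo)) << 52
    return reinterpret(Float64, pref | ((r << 12) >> 12))
end

# outputs always land in (0, 1): the exponent field stays between 946 and 1022
@assert all(0.0 < getfloat_check(rand(UInt64)) < 1.0 for _ in 1:10^5)
```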

Ok, I did some benchmarking. The distribution is mainly a problem for Float32. I think the following variant is competitive, and gives a much smoother distribution near 0, because we use the entire 32 bits of entropy:

julia> using Random
julia> using BenchmarkTools
julia> N=20_000; inp=rand(UInt32, N); outp = zeros(Float32,N);
julia> function getfloat_trail(r::UInt32)
           mo = Int32(trailing_zeros(r))
           pref = reinterpret(UInt32, Int32(126) - mo) << 23
           res = pref | (r >> 9)
           return reinterpret(Float32, res)
       end
julia> function getfloat_std(r::UInt32)
           rr = 0x3f800000 | (r >> 9)
           res = reinterpret(Float32, rr) - 1.0f0
           return res
       end
julia> @btime Random.rand!($inp);
  17.782 μs (0 allocations: 0 bytes) # 1.7 cycles/number
julia> @btime Random.rand!($outp);
  21.121 μs (0 allocations: 0 bytes) # 2.1 cycles/number
julia> @btime broadcast!($getfloat_std, $outp, $inp);
  4.413 μs (0 allocations: 0 bytes) # 0.4 cycles/number
julia> @btime broadcast!($getfloat_trail, $outp, $inp);
  8.958 μs (0 allocations: 0 bytes) # 0.9 cycles/number

Note: this is really slow on my Arch Linux binary distribution of 0.6.3; the above was from a recent 0.7 master, built from source.

Because this came up on Hacker News and the timings above are really old, here are current numbers (Julia 1.9.2):

julia> @btime Random.rand!($inp);
  2.493 μs (0 allocations: 0 bytes)

julia> @btime Random.rand!($outp);
  3.012 μs (0 allocations: 0 bytes)

julia> @btime broadcast!($getfloat_std, $outp, $inp);
  1.339 μs (0 allocations: 0 bytes)

julia> @btime broadcast!($getfloat_trail, $outp, $inp);
  3.547 μs (0 allocations: 0 bytes)

So it took me a bit to grok what all is going on here. This getfloat_trail function transforms a random UInt32 into a Float32. It differs from our current rand(::Float32), which is approximated here by getfloat_std, in the following ways:

  • Its output is in (0,1) instead of [0,1). 32 bits is cool in that you can actually iterate over all possible values in a reasonable amount of time.
    • Its minimum value is nonzero (1.1641532f-10 instead of 0.0)
    • Its maximum value is closer to 1 (prevfloat(1.0) instead of prevfloat(1.0, 2))
    • Its mean is closer to 0.5 (0.4999999801321886 instead of 0.4999999403953552)
    • It actually sets all bits of the significand true 50% of the time instead of having the LSB always be 0.
    • It generates more unique numbers.
  • It’s slower. Depending on your perspective, I guess this is either 2.6x slower (ignoring the cost of getting that initial rand(UInt32)) or 1.27x slower (including it), using the benchmark numbers you posted. I didn’t dig deeper on simdability.

I suppose nearly all the numeric qualities are better — except perhaps for the fact that it doesn’t generate 0s.
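The extreme values in the list above can be checked directly (restating getfloat_trail from the earlier post so the snippet is self-contained):

```julia
function getfloat_trail(r::UInt32)
    mo = Int32(trailing_zeros(r))
    pref = reinterpret(UInt32, Int32(126) - mo) << 23
    return reinterpret(Float32, pref | (r >> 9))
end

getfloat_trail(0x00000000)  # minimum: 2^-33 == 1.1641532f-10
getfloat_trail(0xffffffff)  # maximum: prevfloat(1.0f0)
```

The minimum comes from trailing_zeros(0x00000000) == 32, which puts the exponent field at 126 - 32 = 94, i.e. 2^-33.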


I opened a new thread, Output distribution of rand(), thread 2, to discuss the issue, separated from my old solution attempt. I think throwing some half-baked @benchmark results around was premature and an impediment to a serious discussion.

My bad, I should have opened with a better thread 5 years ago. Better not to compound that mistake by continuing to necro this thread.


Closing to focus discussion in Output distribution of rand(), thread 2