Output distribution of rand(Float32) and rand(Float64), thread 2

My own view is you need 100% correctness in the sense of never producing Inf, NaN, crashing, or numbers outside the range of what your interface contract says. Small biases are inevitable so I agree you’re really looking to minimize rather than eliminate those (there are inevitable biases already in the random bits algorithms). But the actual quote was about whether or not you had to fall back to redoing a calculation at some rare interval, so it’s really about amortized cost. Except for the SIMD issue potentially making the usual case slow, having to take twice as long once every few million, billion, or trillion generations is 100% a non-issue.

It seems to me like this kind of thing satisfies all the requirements, except I don’t know how it performs in the rand!-style fill-a-buffer case.

Obviously we could theoretically do anything. That’s why this conversation is going in circles.

All RNGs of this flavor can be functionally described as either rand(0:2^N-1)/2^N (rounding down) or rand(1:2^N)/2^N (rounding up). Bigger N is better, but at some point further increases to N are unobservable. We’re currently generating these with not much more than a bit-shift and multiplication — very little runtime. Less time is better, but at some point further decreases in time are unobservable.
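For concreteness, here’s a sketch of those two idealized constructions at N = 24, the precision of the current Float32 algorithm (the helper names are mine):

round_down(u) = Float32(u) / 2^24        # u ∈ 0:2^24-1, exact, lands in [0, 1)
round_up(u)   = Float32(u + 1) / 2^24    # same u, lands in (0, 1]
rand_down()   = round_down(rand(UInt32) >>> 8)  # keep 24 random bits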

We offer a choice of Float64 and Float32, and that choice is generally expected to come with a corresponding precision/speed tradeoff.

In my view, Float64’s current N=53 algorithm is getting close to that “unobservable” region — if we can improve upon it without much of a performance regression, great, otherwise I don’t think it’s worth changing.

Float32’s N=24, however, is far from it. So the game here is to make N as big as possible while keeping the runtime below Float64’s.


Let’s make this concrete. Currently, bulk Float32 generation spends about 10% of its time doing the bits-to-float twiddling on my machine; the remainder is in memory bandwidth and Xoshiro. Shoe-horning a non-hyper-optimized version of an N=32 algorithm (the one using trailing zeros) into Xoshiro’s bulk generation makes that bit-twiddling 5x more expensive, now taking 50% of the time… and makes rand!(::Vector{Float32}) twice as slow and the same speed as Float64.

So that’s the challenge:

Increase N for Float32 without letting its runtime climb to Float64’s. Increase N for Float64 without busting its own performance by more than — say — 10%.

REPL session
julia> using BenchmarkTools, Random

julia> A = Array{Float32}(undef, 2^24);

julia> @benchmark rand!($A)
BenchmarkTools.Trial: 950 samples with 1 evaluation.
 Range (min … max):  5.202 ms …  6.062 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.250 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.263 ms ± 50.811 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

           ▆▆█▄ ▁
  ▂▁▁▁▂▂▂▂▂██████▇▇▅▆▇▅▄▅▄▄▄▃▄▃▃▂▃▂▂▁▂▂▁▂▂▂▂▂▂▂▂▂▁▁▂▂▂▁▂▂▁▁▂ ▃
  5.2 ms         Histogram: frequency by time        5.41 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @eval Random.XoshiroSimd begin
       @inline _bits2float(x::UInt64, ::Type{Float32}) = _f32bits(x)

       @inline function _f32bits(r::UInt32)
           exponent = trailing_zeros(r << 1) # 1 - 32
           exp_bits = UInt32(127-exponent)<<23
           fraction = (r ⊻ (UInt32(1) << (exponent-1)))>>9 # unset the trailing one
           return (exp_bits | fraction) * !iszero(r)
       end
       @inline function _f32bits(x::UInt64)
           ui = (x>>>32) % UInt32
           li = x % UInt32
           return (UInt64(_f32bits(ui)) << 32) | UInt64(_f32bits(li))
       end
       @inline _f32bits(x::VecElement{UInt64}) = _f32bits(x.value)
       end
       for N in [4,8,16]
           VT = :(NTuple{$N, VecElement{UInt64}})
           @eval Random.XoshiroSimd @inline _bits2float(x::$VT, ::Type{Float32}) = map(_f32bits, x)
       end

julia> @benchmark rand!($A)
BenchmarkTools.Trial: 486 samples with 1 evaluation.
 Range (min … max):  10.269 ms … 10.522 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.285 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.297 ms ± 34.388 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▆█▄▄▃▄▄▄▂▂▁▂▁▁▁
  ████████████████▇██▇▇▅▇▄▄▄▄▅▅▄▁▄▄▅▅▇▁▁▁▁▄▁▄▁▁▄▄▄▁▁▅▄▁▄▁▁▄▁▄ ▇
  10.3 ms      Histogram: log(frequency) by time      10.4 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
3 Likes

This is a fun problem. Here’s my crack at it:

function randf(
    ::Type{T},
    u::UInt64 = rand(UInt64),
) where {T<:Union{Float16,Float32,Float64}}
    # only depends on T
    s = 8*sizeof(T)
    p = Base._precision(T) - 1
    U = T === Float16 ? UInt16 :
        T === Float32 ? UInt32 :
        T === Float64 ? UInt64 : error()
    m = T === Float16 ? one(T) :
        inv(reinterpret(T, U(65 << p)))

    # real computation
    z = leading_zeros(u)
    if T !== Float16
        b = (u << (z + 1) >>> (64 - p)) % U
        e = 64 - z
    else
        # b = (u << min(14, z + 1) >>> (64 - p)) % U
        b = (u % U) & 0x03ff
        e = max(0, 14 - z)
    end
    b |= (e % U) << p
    x = reinterpret(T, b)
    x *= m
end

randf(T::Type, u::Integer) = randf(T, UInt64(u))

It looks like a lot of code but it boils down to not so much:

julia> @code_native debuginfo=:none randf(Float64, rand(UInt64))
	clz	x8, x0
	add	x9, x8, #1
	lsl	x9, x0, x9
	lsr	x9, x9, #12
	cmp	x0, #2
	csel	x9, xzr, x9, lo
	sub	x8, x9, x8, lsl #52
	mov	x9, #288230376151711744
	add	x8, x8, x9
	fmov	d0, x8
	mov	x8, #8921630861820952576
	fmov	d1, x8
	fmul	d0, d0, d1
	ret

julia> @code_native debuginfo=:none randf(Float16, rand(UInt64))
	clz	x8, x0
	mov	w9, #14336
	sub	w8, w9, w8, lsl #10
	tst	x0, #0xfffe000000000000
	csel	w8, w8, wzr, ne
	bfxil	w8, w0, #0, #10
	fmov	s0, w8
	ret

I edited out some junk asm comments before and after actual asm code. Code for Float32 is very similar to Float64. This implementation is also not much of a slowdown compared to our current rand for floats:

julia> using BenchmarkTools

julia> @btime rand(Float64);
  2.708 ns (0 allocations: 0 bytes)

julia> @btime randf(Float64);
  2.833 ns (0 allocations: 0 bytes)

julia> @btime rand(Float32);
  2.791 ns (0 allocations: 0 bytes)

julia> @btime randf(Float32);
  2.917 ns (0 allocations: 0 bytes)

julia> @btime rand(Float16);
  2.666 ns (0 allocations: 0 bytes)

julia> @btime randf(Float16);
  2.916 ns (0 allocations: 0 bytes)

So about 4% slower for Float64 and Float32 and 9% slower for Float16. I didn’t benchmark vectorized versions because plugging this into our Sampler system was more complicated than I managed to get through, but I can’t see any reason why this wouldn’t vectorize as long as there are vectorized clz instructions, which there are on ARM and on x86 with AVX512 or newer. Older x86 machines can get by without this vectorizing.

What about behavior? I’ll address the Float64 and Float32 versions first and then the Float16 separately since it’s a bit different.

At the highest level, it uses the leading zeros to pick an exponent from 0-64 for Float64 and Float32, or from 0-14 for Float16, with a geometric distribution where the highest exponent has probability 1/2.

It uses the bits after the leading one to fill the significand. For Float64 and Float32 it takes the bits right after the leading bit and fills the significand from left to right; for Float16 it just uses the last 10 bits of the number. For the higher precisions, where there aren’t always enough trailing bits to fill the significand, the trailing bits in the significand are zeros.

It combines the exponent and significand to get a float in the range [0, w) where w = reinterpret(T, U(65 << p)) for Float64 and Float32 and w = 1 for Float16. Finally, it scales that value up to [0, 1) via multiplication by m = inv(w), which is exact because w is a power of two, so m is also a power of two and multiplication by it only changes the exponent. For Float16 this is a no-op, which the compiler recognizes since m is a compile-time constant depending only on the type.
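As a sanity check on that last step: for Float64, p = 52, so w and m are exact powers of two, and m’s bit pattern is exactly the constant loaded in the assembly listing above.

w = reinterpret(Float64, UInt64(65) << 52)  # upper bound of the pre-scaled range: 2^-958
m = inv(w)                                  # 2^958, an exact power of two
@assert m == exp2(958)
@assert reinterpret(UInt64, m) == 0x7bd0000000000000  # == 8921630861820952576, the constant in the asm above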

One thing worth noting about this is that as a function from UInt64 --> Float{32,64}, randf is monotonically non-decreasing: i.e. u ≤ v ⟹ randf(T, u) ≤ randf(T, v).
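A quick randomized spot-check of that property (evidence, not a proof):

for _ in 1:10^6
    u, v = minmax(rand(UInt64), rand(UInt64))
    @assert randf(Float64, u) <= randf(Float64, v)
    @assert randf(Float32, u) <= randf(Float32, v)
end

Monotonicity means we can explore the smallest values it can possibly generate by looking at small “seeds”: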

julia> randf(Float64, 0)
0.0

julia> randf(Float64, 1)
5.421010862427522e-20

julia> randf(Float64, 2)
1.0842021724855044e-19

julia> randf(Float64, 3)
1.6263032587282567e-19

The probability of getting exactly zero is 1/2^64 and you’ll note that exp2(-64) == 5.421010862427522e-20, which is exactly the value of randf(Float64, 1). So if we do the Bernoulli thing with rand(Float64) < exp2(-64) we get the exact right probability. At the low end of the outputs, we don’t have a lot of “random” bits to go in the significand:

  • for 0 and 1 there are no significand bits to use (they have different exponents)
  • 2 and 3 have the same exponent and differ by only the leading significand bit
  • 4-7 have the same exponent but 2 bits of significand to distinguish them

And so on. By the time you get up to u = 2^52 you have enough random bits after the leading bit to fill the entire significand.

We can also explore the largest values it can generate by looking at large seeds:

julia> randf(Float64, typemax(UInt64))
0.9999999999999999

julia> randf(Float64, typemax(UInt64)-1)
0.9999999999999999

julia> randf(Float64, (typemax(UInt64) - 0) << 11)
0.9999999999999999

julia> randf(Float64, (typemax(UInt64) - 1) << 11)
0.9999999999999998

julia> randf(Float64, (typemax(UInt64) - 2) << 11)
0.9999999999999997

At the low end, every change of the input produces a different output but at the high end, many contiguous inputs map to the same output—2^11 contiguous values at the top. This is necessary in order to produce the correct uniform distribution on [0, 1) given the uneven quantization of floats. The probability of getting 0.9999999999999999 is 2^11/2^64, which is 1/2^53 and you’ll note that 1/2^53 == 1-0.9999999999999999 so the probabilities look right up at the top too.
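That 2^11-to-1 collapse at the top is easy to confirm directly (the last 11 input bits are the ones discarded):

@assert randf(Float64, typemax(UInt64)) == prevfloat(1.0)
@assert randf(Float64, typemax(UInt64) - 0x7ff) == prevfloat(1.0)    # all 2^11 top inputs collapse here
@assert randf(Float64, typemax(UInt64) - 0x800) == prevfloat(1.0, 2) # one step past the bin boundary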

Things are very similar for Float32. Low end:

julia> randf(Float32, 0)
0.0f0

julia> randf(Float32, 1)
5.421011f-20

julia> randf(Float32, 2)
1.0842022f-19

julia> randf(Float32, 3)
1.6263033f-19

High end:

julia> randf(Float32, typemax(UInt64))
0.99999994f0

julia> randf(Float32, (typemax(UInt64) - 0) << 40)
0.99999994f0

julia> randf(Float32, (typemax(UInt64) - 1) << 40)
0.9999999f0

julia> randf(Float32, (typemax(UInt64) - 2) << 40)
0.9999998f0

The probability of getting exactly zero is still 1/2^64, just like Float64, and moreover randf(Float32, 1) == Float32(randf(Float64, 1)), so the Bernoulli probabilities work out the same. It is not generally the case that randf(Float32, u) == Float32(randf(Float64, u)) for all u, but they are always within one ULP of each other and the deviation is due to rounding when converting Float64 to Float32. We’ve already seen that this is a problem with trying to generate Float32 random values from Float64. Generating the Float32 values directly avoids the last-bit bias caused by rounding.
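That within-one-ULP claim is also easy to spot-check empirically (again, evidence rather than proof):

for _ in 1:10^5
    u = rand(UInt64)
    a = randf(Float32, u)
    b = Float32(randf(Float64, u))
    @assert a == b || a == prevfloat(b) || a == nextfloat(b)
end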

The Float16 implementation is a little different, so let’s look at that as well. For Float16 I didn’t make randf(Float16, u) a monotonic function of u for a few reasons:

  • In higher precisions, you might not have enough random bits after the leading one, so you want to shift the trailing bits to the front of the significand; this naturally makes randf monotonic.
  • For Float16 we don’t need to do this since there are always enough bits: 15 bits to pick the exponent, 10 bits for the significand—heck, you’ve got 39 bits to spare.
  • In Float16 you generate all values, including subnormals, which behave a bit differently and would make the approach used for the higher precisions break, so we do something simpler, which doesn’t end up being monotonic.

Let’s look at some low inputs for Float16:

julia> randf(Float16, 0)
Float16(0.0)

julia> randf(Float16, 1)
Float16(6.0e-8)

julia> randf(Float16, 2)
Float16(1.0e-7)

julia> randf(Float16, 3)
Float16(2.0e-7)

If randf were monotonic in this case, this would be bad since Float16(6.0e-8) is way bigger than 1/2^64—it’s about 1/2^24. Fortunately, there are a lot of seed values that produce Float16(0.0): we get zero as long as the first 14 and the last 10 bits of u are zeros; there are 2^(64-14-10) = 2^40 such values. Lo and behold, 2^40/2^64 is exactly Float16(6.0e-8), so the chance of randf(Float16) < Float16(6.0e-8) is Float16(6.0e-8) as desired.
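A couple of direct checks on that zero-preimage structure:

@assert randf(Float16, UInt64(0)) == Float16(0)
@assert randf(Float16, UInt64(1) << 20) == Float16(0)  # nonzero only in the middle 40 bits
@assert Float16(exp2(-24)) == Float16(6.0e-8)          # 2^40/2^64 = 2^-24, the probability of exact zero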

At the high end, we have this:

julia> randf(Float16, typemax(UInt64))
Float16(0.9995)

julia> randf(Float16, typemax(UInt64)-1)
Float16(0.999)

julia> randf(Float16, typemax(UInt64)-2)
Float16(0.9985)

julia> randf(Float16, typemax(UInt64)-3)
Float16(0.998)

Similar to the low end, non-monotonicity means we get many duplicates of these values. We want the probability of getting Float16(0.9995) to be equal to 1 - Float16(0.9995) == Float16(0.0004883). The output is going to be Float16(0.9995) as long as the first bit and the last 10 bits are all ones; there are 2^(64-1-10) = 2^53 such values, and 2^53/2^64 is exactly Float16(0.0004883) as desired.

Interlude: Are we checking the uniformity of the distribution the right way?

I’ve been checking for P(rand() < p) = p. This is a viable condition for uniformity with real numbers, but then again, so is this: P(rand() ≤ p) = p. For real numbers these are the same condition because the probability of getting exactly p is zero. But for floats, that’s not the case and P(rand() < p) = p is a different condition than P(rand() ≤ p) = p. This doesn’t matter much for moderately sized p but for very small p it can make a big difference. As we saw above, we have P(rand(T) < 1/2^64) = 1/2^64 for Float32 and Float64, which is what we want, but we also have P(rand(T) ≤ 1/2^64) = 2/2^64 which is twice as big as 1/2^64.

So which condition should we use for random floating point generation? By one condition randf is giving exactly the right uniform distribution, by the other it’s overweighting some values by a factor of two! My argument is that < is the right condition for [0, 1) based on looking at the edge case of zero. If you use P(rand() ≤ p) = p and take p = 0 then you would have P(rand() ≤ 0) = 0, which is guaranteed to fail since we’ve stipulated that 0 can be produced by rand(). If, on the other hand, we had a support of (0, 1] then we should use ≤ by a symmetrical argument: using < would give P(rand() < 1) = 1 - P(rand() = 1) < 1 since we assumed that 1 is a possible output.

There’s also a question of which p values the identity can be expected to hold for. If ε is the smallest non-zero value that your function can produce, then P(rand() < ε/2) = P(rand() < ε) = P(rand() = 0) = ε which is twice as big as the identity requires. More generally, if a < x < b where a and b are adjacent outputs of rand() and x is a real number between them (and therefore not a possible output since they’re adjacent), then we have the following:

  • a = P(rand() < a) = P(rand() < x) < x

What this tells us is that in between adjacent output points, we cannot possibly expect the uniformity condition to hold. So the best we can hope for is that the condition holds at all the output values of the random generator.

To summarize my take, we should check the uniformity condition at all the output values of the random value generator and the condition we should use depends on the interval:

  • If you’re sampling from [0, 1) use P(rand() < p) = p;
  • If you’re sampling from (0, 1] use P(rand() ≤ p) = p.

If you’re sampling from [0, 1] then you’re kind of screwed as neither uniformity condition works and if you’re sampling from (0, 1) then I’m not sure what you should do. In any case, I think we probably made a good choice by sticking with the traditional [0, 1), and it may not be an accident that this is the traditional support.

Somewhat bizarrely, defining rand() = 0.0 would be a perfectly good uniform RNG on [0, 1) by this criterion. Similarly, rand() = rand(Bool) ? 0.0 : 0.5 would also be perfectly uniform. All that means is that we also have to consider what the support of the function is when evaluating an RNG, but given a support set, I think this is the right way to evaluate whether the distribution is uniform or not.
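In code, those two degenerate-but-valid examples (they reappear below as the N=0 and N=1 cases):

rand0()  = 0.0                      # support {0.0}: P(rand0() < p) == p holds at p == 0, its only output
rand01() = rand(Bool) ? 0.0 : 0.5   # support {0.0, 0.5}: P(rand01() < 0.5) == 0.5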

Proof of Uniformity

Now that we’ve settled a uniformity condition, can we actually prove that randf is uniform by this definition instead of just spot checking some examples? I haven’t worked this out yet and this is long enough, so I’ll post it now and work on the proof…

11 Likes

unfortunately “older” here includes Intel’s consumer CPUs from the last 2 years

1 Like

Thousands of instructions for random shit I’ll never need and they can’t give me vectorized count leading zeros? Get it together, Intel. That’s so dumb I’m not sure I can bring myself to care… At some point Intel will have vectorized clz.

1 Like

Here’s a version that maps a UIntXX to FloatXX with round-down behavior and no leading_zeros. It simply does the conversion in two parts, then computes the rounding error and decrements the result if the rounding was upward. It appears to SIMD on my x86 that doesn’t support vectorized leading_zeros, using 21 instructions for 1 UInt32 conversion and 28 for 8. That said, this still may not be competitive with a more clever application of leading_zeros (or simply one that serializes when it can’t SIMD rather than the gymnastics my compiler was doing). And nothing will compete with a downward-rounding native conversion instruction.

function scale_to_float3(u::U) where U<:Union{UInt16,UInt32,UInt64}
	# map UIntXX linearly into FloatXX on range [0,1)
	F = U === UInt16 ? Float16 : U === UInt32 ? Float32 : Float64 # float type with the same width as U
	bitwidth = 8 * sizeof(U) # bits in type U
	lomask = typemax(U) >>> (bitwidth ÷ 2) # mask of low bits of u
	scale = F(exp2(-bitwidth))
	lo = F(u &  lomask) * scale # must scale here for UInt16, because Float16(typemax(UInt16)) overflows
	hi = F(u & ~lomask) * scale # must scale here for UInt16, because Float16(typemax(UInt16)) overflows
	f0 = lo + hi # full-width value
	efflo = f0 - hi # get effective value of lo used
	roundedup = efflo > lo # determine whether f0 was rounded upward
	fu = reinterpret(U, f0)
	fu -= roundedup # if we rounded up, decrement to round down instead
	f = reinterpret(F, fu)
	return f
end

code_native(x->map(scale_to_float3,x), (NTuple{1,UInt32},); debuginfo=:none)
code_native(x->map(scale_to_float3,x), (NTuple{8,UInt32},); debuginfo=:none)
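A few round-down spot checks (hand-verified cases, not an exhaustive test):

@assert scale_to_float3(UInt64(0)) === 0.0
@assert scale_to_float3(UInt64(1)) === exp2(-64)             # exact; no correction needed
@assert scale_to_float3(typemax(UInt64)) === prevfloat(1.0)  # (2^64-1)/2^64 rounds down, not up to 1.0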

This concept could be extended to “excess” conversions of UIntXX to FloatYY with XX>YY, but for now I’ve been trying to focus on same-width conversions.

1 Like

So on the [0,1) vs (0,1] thing, it has everything to do with how you interpret and work with the results of rand() — and I finally have a good mental model here.

Forget binary for a moment; let’s pretend our datatype of interest is a decimal-based float with two digits, and that it has a random function defined as randpct() = rand(0:99)/100. Given that definition, how do I implement a test that returns true 5% of the time? I know I just instinctively reach for randpct() < 0.05. But why <? And why not <=? And why not on the other side against 0.95?

I think the best mental model for an RNG with a [0,1) interval is that it approximates choosing a real number by binning and identifying each bin by its leftmost value. If you were to use randpct() <= 0.05, you’d count the results from six bins, not five. And if you wanted to use 0.95, you’d need to test this as randpct() >= 0.95. Of course, you cannot observe finer gradations than hundredths — and if you try, you just count the number of left edges on those hundredths bins. So randpct() < 0.045 is true 5% of the time just the same.
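An empirical version of that bin counting, using the randpct above:

randpct() = rand(0:99)/100

using Statistics
mean(randpct() <  0.05  for _ in 1:10^6)  # ≈ 0.05: counts the five bins 0.00 through 0.04
mean(randpct() <= 0.05  for _ in 1:10^6)  # ≈ 0.06: counts six bins, one too many
mean(randpct() <  0.045 for _ in 1:10^6)  # ≈ 0.05: same five left edges as < 0.05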

A (0,1] interval would flip that interpretation. You’re now identifying each bin by its rightmost value. So tests like these should be written as randpct() <= 0.05 or randpct() > 0.95.

The (0,1) interval as was proposed long ago would identify each bin by its centerpoint. Now either comparator would equivalently select 5 bins when the threshold is 0.05, but what about randpct() < 0.045? How many bins does that select? How do you succinctly describe that behavior? And how do you describe that bin between 0.99 and 1.0? It’s 0.995, but we only have two decimal places! So now do you round that value up (to a right edge) or down (to a left one)? So this means that floating point semantics have forced us into a left- or right-labeled bin anyways. (nb. when the actual change here was considered, we had a free bit to do this, but we no longer do!). It’s so very good we didn’t do this!

A closed-closed [0, 1] interval would also identify each bin by its centerpoint but it shifts the bins themselves, making the bins for 0 and 1 half as big. Practically, this “feels” right: it’s “just” the correct rounding of all theoretical values! But it’s surprisingly difficult to get a correct definition of randpct, especially in the face of limited entropy and round-to-even. No matter what you do, it requires an extra bit of randomness — you need to do the closed-open thing, uniformly choosing a value in 0:99, and then you still need to split that 0 to be 50% 0 and 50% 1.
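Concretely, that extra-bit construction looks like this (a sketch of the [0, 1] variant just described):

function randpct_closed()  # [0, 1] with half-sized end bins
    x = rand(0:99)/100               # the usual closed-open draw
    x == 0 && rand(Bool) ? 1.0 : x   # the extra bit: half of the 0 bin becomes 1.0
end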

So is a [0, 1) randpct() really biased by -0.005? It depends on how you use the thing! If you try to use a threshold test with greater precision than the step size of 0.01, then yes it’ll give you “biased” answers. But as long as you only use < tests and stay within the supported precision of randpct() there is no bias.

So now what about other arithmetic? Multiplications are “just” changing the decimal point — as long as you don’t saturate the supported exponents everything works as expected. But of course additions of floating point values can reduce your precision. Both [0, 1) and (0, 1] definitions will rapidly collapse to a poorly-rounded [a, b] distribution following a translation where eps(a) > 2^-N (where N is from my rand(0:2^N-1)/2^N definition). And of course 1/rand() and log(rand()) will error on 0s. Here it’s interesting that the common idiom 1 - rand() to transform from one semi-closed interval to the other will throw away any extra precision we throw at rand here.
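For example, with hypothetical extra low-end precision:

x = exp2(-60)           # a value a higher-precision rand() could emit near 0
(1.0 - x) == 1.0        # true: the translation rounds the extra precision away
(1.0 - (1.0 - x)) == x  # false: we get 0.0 back, not x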

Somewhat interestingly, when 2^-N < eps(T(0.5)), there may be a slight advantage to a [0, 1) distribution in that its closed endpoint is observed less frequently than the closed endpoint of a (0, 1] distribution would be. I’m not sure if this is valuable, though, as few algorithms will have singularities at 1 (and many do at 0).

Those two cases map to N=0 and N=1 from my definition above. Very bad choices indeed, but living on that same well-defined spectrum.

8 Likes

Fortunately, that doesn’t put us in any worse a position than we are currently in by not generating that extra precision in the first place. It is rather interesting that it’s so much better to just do this:

function randOpenClosed01()
    x = rand() # [0, 1)
    x == 0 ? 1.0 : x
end

It seems so much less elegant, but it’s the best way to keep any precision that rand() generates.

It’s also interesting to note that rand() = 0.0 is the only viable choice for N = 0. You can’t do rand() = 0.5 because then P(rand() < 0.5) = 0 ≠ 0.5. If you’re only going to return one value, it has to be zero. If you’re going to return two values, you have a little more choice, but one of them still has to be zero; the other one can be whatever you want it to be so long as that’s how often you return zero. I.e. the function has to have support(rand) = {0, p} and P(rand() = 0) = p.

1 Like

Yes exactly — that’s the case if your threshold test is <. If your threshold test is <=, then the only viable choice is rand() = 1.0, and the function has to have support(rand) = {p, 1.0} and P(rand() = 1.0) = 1-p.

1 Like

This is obviously doing the right thing if 2^-N >= eps(T(0.5)), but what if you have more precision? E.g., what if we gave Float32s just one more bit of precision? Now the values in [0, 0.5) are going to be half as likely as those in [0.5, 1). So now you’ve taken a left-binned rand() and thrown a half-as-big bin onto the far right side. Now none of <, <=, >, or >= is correct in threshold tests.

I think the correct thing to do here is this: You need to shift the edges of every bin, because each bin has variable size.

function randOpenClosed01()
    x = rand() # [0, 1)
    N = 25 # precision of the rand()
    return x + max(eps(x), exp2(-N)) # shift x by the width of its own bin; note 2^-N would throw for a negative Int power
end

2 Likes

Regarding the [0,1) thing – I’d like to repeat my argument for pragmatism in not including zero in the output.

Lots of people write code like f(x) = x*log(x) and feed in a random number. Alas, f(0.0) = NaN.

This code is a bug. We have no obligation to save people from their bugs.

On the other hand, exact zero has a very small probability – in @StefanKarpinski’s proposal ~5e-20.

If we exclude 0, nobody is going to miss that. Nobody will do a cluster run that gives bad results because our RNG failed to produce an exact zero.

Nobody will run tests long enough to by chance generate an exact zero that triggers the 0.0 * log(0.0) === NaN and allows the bug to be fixed.

But somebody might just run some big computation on some big cluster, spending several 100K in electricity, and get his day ruined by a stupid NaN that we decided to emit on principle.

2 Likes

Ah, yeah, I ironically totally forgot about the probabilities of the “bins” being different :flushed:. In terms of my randf primitive, the right definition would be:

function randOpenClosed01(T::Type, u::Integer = rand(UInt64))
    x = randf(T, u)
    max(x + randf(T, 1), nextfloat(x))
end

I chose to use nextfloat instead of adding eps since it’s more direct and more efficient.

1 Like

I get the pragmatism but I’m missing the principle. I really like the definitional simplicity and mental model of rand() = rand(0:2^N-1)/2^N (rounding down). Or 1:2^N (rounding up). Doing any sort of funny business in between just seems really ill-advised and problematic. Specifically: what happens to that 0? Does it become floatmin(T)? It gives me the heebie-jeebies to muck about like that, even if it’s nigh-unobservable.

We certainly cannot change our output to be (0, 1] without being massively breaking because those 1s would appear as frequently as zeros do now (no matter how big our N). This would bite those folks doing the “responsible” thing with log(1 - rand()).

4 Likes

I think you meant log1p(-rand()) :smile:

There’s no accuracy difference with our current RNG. Those two are bitwise identical currently since we’re not generating numbers less than eps(0.5) (and only multiples thereof).

So, my attempt would be the following:

julia> rand32(u = rand(UInt64)) = convert(Float32, (u>>1)|0x01) * ldexp(prevfloat(1.0f0), -63)

julia> rand64(u = rand(UInt64)) = convert(Float64, (u>>1)|0x01) * ldexp(prevfloat(1.0), -63)

julia> @btime rand(Float64);
  2.251 ns (0 allocations: 0 bytes)

julia> @btime rand64();
  2.421 ns (0 allocations: 0 bytes)

julia> @btime rand(Float32);
  2.227 ns (0 allocations: 0 bytes)

julia> @btime rand32();
  2.417 ns (0 allocations: 0 bytes)

This emits no exact zeros (because we forcefully set the last bit). It only uses instructions that vectorize on avx2 (probably also on older CPUs). The first bit is unset because there is no good instruction for unsigned-int-to-float – the compiler would need a branch for that.

I must admit that I have not properly analyzed the distribution. I am not entirely sure how to do that.

PS. Also, hard agree on “fuck intel” for removing avx512 from their latest lines of laptops. But even without that, I’d feel bad throwing e.g. Zen3 under the bus.

PPS. Function is non-decreasing, so an easy range-check:

julia> nextfloat(rand32(~UInt(0)))
1.0f0

julia> nextfloat(rand64(~UInt(0)))
1.0

julia> rand32(UInt(0))
1.0842021f-19

julia> rand64(UInt(0))
1.0842021724855043e-19

We can also think about tradeoffs between accuracy and testability: There is a “Bermuda interval” of rare events, from maybe 40-80 bits, that is rare enough to be invisible in testing and likely enough to potentially spoil a cluster run.

Yeah, this is what I called the “bermuda interval” – rare enough to be invisible for testing, frequent enough to spoil cluster runs if it triggers bugs / Inf / NaN.

If we exclude 0, nobody is going to miss that. Nobody will do a cluster run that gives bad results because our RNG failed to produce an exact zero.

But somebody might just run some big computation on some big cluster, spending several 100K in electricity, and get his day ruined by a stupid NaN that we decided to emit on principle.

This is not a fruitful argument. It is obvious that if an individual has access to such resources and chooses to squander them in such a manner, then it is their loss.

Edit: found several more occurrences.

Let us consider this problem of computer implementation of probability in terms of probability – let us analyze the writers of the programs. What is the probability that an individual writing programs for 100k machines does not appreciate the finer details of pseudorandom number generation? If we find an individual in this low-probability category, what is the conditional probability that their program contains other conceptual mistakes that would supersede the occurrence of a few NaNs? My hypothesis, supported by data, and likely agreed with by many readers, is that the conditional probability is high. That is, if one cannot grok pseudorandom number generation on floating point numbers and plan accordingly, then it is highly unlikely that one is capable of writing a meaningful program for 100k machines.

1 Like

I’m repeating myself because I am very very sure that I am correct on the bias ↔ pragmatism tradeoff on returning exact zeros for Float64, Float32 and BFloat16.

This does not necessarily mean that I’m right on the performance ↔ pragmatism tradeoff, though!

I mean, not even Random itself is capable of dealing with the cursed possibility that rand returns zero. Consider the following:

randexp(rng::AbstractRNG=default_rng()) = _randexp(rng, rand(rng, UInt52Raw()))

function _randexp(rng::AbstractRNG, ri::UInt64)
    @inbounds begin
        ri &= 0x000fffffffffffff
        idx = ri & 0xFF
        x = ri*we[idx+1]
        ri < ke[idx+1] && return x # 98.9% of the time we return here 1st try
        return randexp_unlikely(rng, idx, x)
    end
end

@noinline function randexp_unlikely(rng, idx, x)
    @inbounds if idx == 0
        return ziggurat_exp_r - log(rand(rng))
    elseif (fe[idx] - fe[idx+1])*rand(rng) + fe[idx+1] < exp(-x)
        return x # return from the triangular area
    else
        return randexp(rng)
    end
end

So randexp returns Inf once in a blue moon: when idx == 0 and the inner rand(rng) returns exactly 0.0, that branch evaluates ziggurat_exp_r - log(0.0) == ziggurat_exp_r - (-Inf) == Inf.

PS. Looking at specifications / API documentation and predicting bugs used to be my job when working in IT-sec. These two bugs (randexp returning Inf in Julia and Pareto<f32> in Rust having completely broken tail statistics) were utterly predictable – it is possible to use the API right at an individual invocation, but it is impossible to safely use the API as an institution / major project. Just like strcpy.

3 Likes

Note that mathematically a random exponentially distributed number does exceed the largest Float32 with some frequency so the real issue is about whether that frequency is correct and whether you want to modify the distribution to prevent Inf.

In most cases we do want to prevent Inf and we want the lower tail of rand() to have more density in the floats.

Out of curiosity, does anyone know what R does here for rexp? If there’s anywhere that stats nerds have spent a bunch of time thinking about RNG and distributions, it’s probably R.

Also note, mathematically (0,1) and [0,1) have the same measure and if you sample from a uniform on [0,1) mathematically you’ll “never” get 0.

1 Like