A recent Hacker News post led me to necro https://discourse.julialang.org/t/output-distribution-of-rand/12192. If we want to reopen that can of worms, I think a new thread is in order.

### Summary

`rand()`, and much more acutely `rand(Float32)`, generate a uniformly distributed random floating point number in the unit interval. In actuality, these functions generate a uniformly distributed fixed-point number in the unit interval, and convert it to floating point. This means that the output distribution close to zero is quite counter-intuitive, and very much not what one would naively expect from “random floating point number in the unit interval”.

This is a footgun. Do we want to do something about it? What should we do about it?

### Fixed point vs floating point numbers

Fixed point numbers have the property that adjacent numbers have a constant distance. Floating point numbers have the property that adjacent numbers have roughly constant quotients. In the unit interval, this means that representable floating point numbers are much denser close to zero. For floating point numbers, one typically measures errors in quotients, not differences (relative error, in order to be scale-free; if one cared about absolute errors only, there would be conceptually no point in using floating point numbers in the first place!).
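
To make the density difference concrete, here is a quick illustration in plain Julia (standard functions only, nothing specific to `rand`):

```julia
# Absolute spacing between adjacent Float64 values: constant for fixed
# point, but proportional to magnitude for floating point.
spacing_at_one  = eps(1.0)      # 2.0^-52 ≈ 2.2e-16
spacing_at_tiny = eps(1.0e-10)  # ≈ 1.3e-26, ten orders of magnitude finer

# The quotient of adjacent values, by contrast, is roughly constant:
ratio_at_one  = nextfloat(1.0) / 1.0
ratio_at_tiny = nextfloat(1.0e-10) / 1.0e-10
```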

If one adds a small and a large floating point number, then the excess precision of the small number is lost. This is not a problem in itself – it creates neither large relative nor absolute errors.

If one subtracts two numbers of the same rough magnitude, then this is called “cancellation”, and it can be highly problematic: The result still has a small absolute error, but can have a very large relative error. If we happened to use floating point numbers for any reason other than convenience or CPU support, then this is called catastrophic cancellation.
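
A minimal illustration of both effects, in plain Julia (the specific constants are just for demonstration):

```julia
# Adding small to large: the small number's low bits are lost, but both
# the absolute and the relative error of the sum stay tiny.
harmless = 1.0e16 + π    # π contributes only its leading bits; fine

# Subtracting nearby numbers: the absolute error stays tiny, but the
# relative error of the difference explodes.
d = (1.0 + 1.0e-15) - 1.0              # mathematically 1.0e-15
rel_err = abs(d - 1.0e-15) / 1.0e-15   # ≈ 0.11, i.e. ~11% relative error
```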

### How julia generates random floats

Julia generates random floats by first generating a random number `r = rand12()` in the interval `[1, 2)`. In this interval, floating point numbers are the same as fixed point numbers (literally!). This is therefore very easy to do and fast – just fill the significand/mantissa with a random bitstring.

Next we subtract `1`:

```
rand() = rand12() - 1
```
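
For concreteness, here is a sketch of that trick (`rand12` is the post's hypothetical name, not an actual Base function; the constants are the standard IEEE 754 binary64 layout):

```julia
# Fix the sign and exponent bits to those of 1.0 (0x3ff) and fill the
# 52 significand bits with random data: the result is uniform in [1, 2).
rand12() = reinterpret(Float64, 0x3ff0000000000000 | (rand(UInt64) >> 12))

# The subtraction then yields uniform multiples of 2^-52 in [0, 1).
my_rand() = rand12() - 1.0
```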

At this point we should be very afraid: `rand12() - 1` contains a catastrophic cancellation if our random number in `[1, 2)` happened to be close to 1. In other words, we see that our random numbers are very bad for results close to zero.
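
Concretely, every output of the `rand12() - 1` scheme lands on a grid of spacing `eps(1.0) = 2^-52`; none of the far denser floats below that spacing can ever occur. A quick check (the exact grid is 2^-52 or 2^-53 depending on the Julia version, and multiples of 2^-52 are also multiples of 2^-53):

```julia
# Every sample is an exact multiple of 2^-53; the fine-grained floats
# between 0 and that spacing are never produced.
grid = 2.0^-53
samples = [rand() for _ in 1:1_000]
all_on_grid = all(mod(s, grid) == 0 for s in samples)   # true
```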

### When does it matter

This does not matter at all if we add our random numbers to something nonzero of constant size.

It does matter if we e.g. compute `1/rand()`. Or, to give a less contrived example: Suppose we want to sample from a random distribution that gives numbers in `[1, Inf)`; say from a power law, with probability density `~ 1/x^a` with parameter `a > 1`. The way this is typically done is by considering the inverse cumulative distribution function, `F(u) = x` such that `probability(sample() > x) = u`, where `u ∈ [0, 1]`. For the power law, this means `u ~ 1 / x^(a-1)`, or `F(u) ~ 1 / u^(1/(a-1)) = pow(u, -1/(a-1))`. In other words, we will have code like the following, e.g. with `a = 2`:

```
sample() = constant / rand()
```

This is code that looks perfectly fine from a precision standpoint. We have a singularity at zero; but this is not an issue for floating point numbers. If we decided to instead write the real-arithmetic equivalent

```
sampleBad() = constant / (1 - rand())
```

i.e. decided to place the singularity of our mapping at `1` instead of `0`, then alarm bells would start ringing – this is catastrophic cancellation. The statistics of extreme events will be completely wrong. But extreme events are one of the most important things when looking at fat-tailed distributions like power laws!

And, with the current julia random number generation, this is what we do:

```
sample() = constant / (rand12() - 1)
```
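
One way to quantify the damage (with `constant = 1` for simplicity): the smallest nonzero value the `rand12() - 1` scheme can return is `eps(1.0) = 2^-52`, so the sampler's tail is hard-truncated:

```julia
# Smallest nonzero output of rand12() - 1, and the resulting hard cap
# on sample() = 1 / rand():
smallest = eps(1.0)        # 2^-52
tail_cap = 1 / smallest    # 2^52 ≈ 4.5e15

# A true Pareto tail with density ~ 1/x^2 exceeds 1e16 with probability
# 1e-16; with this generator, that probability is exactly zero.
```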

In other words, the output of `rand()` is by construction unusable for applications that care about precision close to zero.

### An extreme example: Exact zeros in Float32

Consider: Take a uniform random real number `u ∈ [0, 1]`, and ask: What is the probability that `convert(Float32, u) == 0.0f0`?

Well, a good guess would be `nextfloat(0.0f0) == 1.0f-45`, but depending on rounding mode intricacies something like roughly `1.2f-38` (the smallest non-subnormal positive float, `floatmin(Float32)`) is also a valid guess.

It does not matter. The age of the universe is of rough order `1.0f27` nanoseconds, i.e. the correct answer should be “never observed in all past or future computations done by mankind”.

Will you observe `convert(Float32, rand(Float64)) === 0.0f0`? Probably not today. Will you observe `rand(Float32) === 0.0f0`?

Yeah, all the time:

```
julia> function func()
           N = 0
           while true
               N += 1
               rand(Float32) == 0 && return N
           end
       end
func (generic function with 1 method)

julia> func()
22098446

julia> func()
3274928

julia> func()
19735810
```
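
These counts match a back-of-the-envelope estimate: if `rand(Float32)` takes roughly `2^24` equally likely grid values (the exact count depends on how many significand bits the implementation fills), an exact zero appears on average once every `2^24` draws:

```julia
# Expected number of draws between exact zeros, assuming ~2^24 equally
# likely grid points (a rough model of the implementation):
expected_gap = 2.0^24   # ≈ 1.7e7, the same order as the counts above
```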

It is egregiously misleading to say that `rand(Float32)` generates a uniformly distributed random floating point number from the unit interval.

### What to do about it: Nothing

We can choose to ignore the problem. We can say: This is well-known among experts and they know how to work around it; this is how everyone does it; almost nobody is complaining.

### What to do about it: Document it

One way to approach this is to document this footgun and provide an alternative function that does the right thing.

An example is `log`, which links to `log1p` for avoiding cancellation when evaluating logs of numbers close to 1. So we could have a `rand0p` that is linked in the docs, and it is up to users to know / read the docs to use the right random function, on penalty of incorrect statistics of extreme events close to zero.

### What to do about it: Fix it for real

The complementary approach to documenting this would be to provide an alternative `fastrand` function that produces the current counter-intuitive distribution, and change `rand` to produce the Right Thing.

Thus we would replace a correctness footgun with a performance footgun: it is up to users to know / read the docs to use the right random function, on penalty of mysterious performance losses.

This has large compat issues (is there code that relies on the counter-intuitive cancellation?) and performance issues, depending on how we implement it: how fast can we generate the Right Thing? Can we SIMD it? With which instruction sets (i.e. does it need AVX-512, or is there an AVX2 / NEON way of doing this right)? On GPU? Will this lead to problems when people implement/port algorithms from papers that assume random numbers to be fixed-point-distributed instead of floating-point-distributed?

This is a can of worms that requires lots of discussion.
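
For reference, a correct generator is not conceptually hard; the open question is purely speed. A hedged sketch (`rand_dense` is a made-up name, and this is far from a shippable implementation): draw the binary exponent geometrically, as if counting the leading zero bits of an infinite random bitstream, then fill a full-precision significand:

```julia
# Each coin flip halves the value, so P(result ∈ [2^-e, 2^(1-e))) = 2^-e,
# exactly as for a uniform real number rounded down to a float.
function rand_dense()
    e = 1
    while rand(Bool) && e < 1074   # 1074 keeps us above the subnormal limit
        e += 1
    end
    sig = 1.0 + (rand(UInt64) >> 12) * 0x1p-52   # uniform in [1, 2)
    return ldexp(sig, -e)                        # value in [2^-e, 2^(1-e))
end
```

The loop makes this much slower than the branch-free bit trick above, which is exactly the performance question this section raises.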

### What to do about it: Document it, partially fix it

We can split it down the middle: i.e. do something reasonably fast that generates a distribution that is less egregiously misleading, and provide, document, and link two alternatives `fastrand` and `rand0p`. An example would be: Exclude zero from the `rand(Float64)` output but otherwise keep it, and implement `rand(Float32) = convert(Float32, rand(Float64))`.
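
A quick sanity check of the conversion route (standard Julia): it preserves tiny values that a scheme filling only Float32's 23 significand bits would round to zero:

```julia
# A Float64 sample can be far below 2^-23; converting keeps it nonzero
# in Float32 all the way down to the subnormal range (~1.0f-45).
tiny  = Float32(2.0^-40)   # nonzero, ≈ 9.1f-13
small = Float32(2.0^-24)   # nonzero, though a 23-bit grid scheme never emits it
```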

### What do other languages do?

Java generates the same flawed distribution as julia. Numpy generates the same flawed Float64 distribution as julia, cf. https://github.com/numpy/numpy/blob/85d956917903624289c451bd0fcfa4743ec86988/numpy/random/_common.pxd#L67 – I think they do something better for Float32, though?

Shockingly, Zig does the right thing (or something close to it), cf. Zig. In fact, it was this link in the related Hacker News discussion that renewed my hope that something can be done at all!