I think that this would also make the expected value exactly equal to 0.5, rather than 0.49999999999999988897…

It should be possible to do in zero clock cycles by modifying the last step of the random-number generator to subtract prevfloat(1.0) instead of subtracting 1.0.

(Answering @aerdely) If you want an event to happen with probability p:

if rand() < p
do_something
end

That way do_something never happens if p==0 and always happens if p==1. It certainly makes my code easier!
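The endpoint behavior claimed above, spelled out as a quick check (nothing hypothetical here, just Base's rand): since rand() returns a value in [0, 1), comparing with < gets both endpoints right.

```julia
# rand() lies in [0, 1): never negative, strictly below 1.
happens(p) = rand() < p

# p == 0: rand() < 0 is impossible, so the event never fires.
# p == 1: rand() < 1 always holds, so the event always fires.
```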

Historically, I suspect the reason goes back (at least) to the K&R C rand() function, which (on a 32-bit system) returns an integer in [0, 2^32 - 1] (for hardware reasons); dividing by 2^32 is then the simple way to get a floating-point number in [0, 1).
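That construction can be sketched as follows (a toy stand-in: rand(UInt32) plays the role of the C generator's integer output):

```julia
# Divide a 32-bit integer by 2^32 to get a float in [0, 1).
# The largest possible result is (2^32 - 1) / 2^32 < 1, so 1.0 is unreachable,
# while u == 0 yields exactly 0.0 — hence the half-open interval.
u = rand(UInt32)
x = u / 2^32
```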

The more I’ve thought about this, the more I’ve come to believe that the current Julia (and C, C++, Fortran, Python, Perl and Ruby) implementations are wrong. Here’s why:

The rand function does not actually draw from the infinite set of points on the (open or closed) interval. It draws from the finite set 0:eps(T):1-eps(T). (Which has the convenient property that the elements can be represented in type T.)

We might want to think about rand as drawing from the infinite set (the open or closed interval) and then projecting (rounding) onto a finite set of floating-point numbers. But for that metaphor to work with the above-mentioned set, the interval must be (-eps(T)/2, 1-eps(T)/2). (The open or the closed interval. Mathematically, there’s no difference. The probability of hitting an endpoint in a finite number of draws is zero.)

Setting the least significant bit to 1, as suggested by @rfourquet, would be equivalent to drawing from eps(T)/2:eps(T):1-eps(T)/2, which in turn would be equivalent to drawing from the (open or closed) continuous interval (0,1) and then correctly rounding to the nearest member of that set. This would not cost any extra time, and is the correct fix, IMO.
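A sketch of both variants, assuming the common trick of building a uniform float on [1, 2) from 52 random mantissa bits (this is an illustration, not Julia's actual generator code):

```julia
bits = rand(UInt64) & 0x000fffffffffffff               # 52 random mantissa bits
x12  = reinterpret(Float64, 0x3ff0000000000000 | bits) # uniform on [1, 2)

current  = x12 - 1.0             # lands on 0:eps():1-eps()          (biased low)
proposed = x12 - prevfloat(1.0)  # lands on eps()/2:eps():1-eps()/2  (centered)
```

Both subtractions are exact (the operands are within a factor of two of each other), so the eps()/2 shift really does cost zero extra clock cycles.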

I am not sure we want that, as that would be very tricky to implement (note that floating point numbers get denser near 0).

These implementations just use a reasonable compromise between speed and “correctness”, approximating the uniform CDF with a step function. It is of course understood that there exist floating-point numbers between 0 and 1 which can never be drawn by rand, e.g. eps()^4.
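For instance (a direct check of that claim): eps()^4 is positive but far smaller than the grid spacing, so it falls strictly between 0 and the smallest nonzero point of the step function.

```julia
eps()^4             # ≈ 2.4e-63: positive, far below the grid spacing eps() ≈ 2.2e-16
0 < eps()^4 < eps() # true: it lies between 0 and the smallest nonzero grid point
```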

Sorry if I was unclear. By “a set of floating point numbers” I meant a set like 0:eps(T):1-eps(T) as previously mentioned. I did not mean “the set of all floating point numbers of a given type”.

Please do! From the point of view of Monte Carlo simulations etc, this should only make things better, and since it can be accomplished in zero clock cycles, there’s no performance argument.

I think the main argument against would be that some people might have tests which rely on random number streams being bit-for-bit reproducible.

# The Ziggurat Method for generating random variables - Marsaglia and Tsang
# Paper and reference code: http://www.jstatsoft.org/v05/i08/

is used for randn and randexp, but that paper requires a uniform generator on the open unit interval (0,1), whereas Julia’s code uses rand, which generates from [0,1). Strictly speaking, the assumption of the Ziggurat Method is therefore violated, and that is why Julia’s code has to define a function randexp_unlikely to avoid returning a zero. From a probabilistic point of view, this is why it is better to have uniform random generators on the open unit interval (0,1).

Since Julia’s code for randexp avoids returning a zero, it is straightforward to obtain random variates from the open interval (0,1), because exp(-X) is uniformly distributed on the open interval (0,1) whenever X has an exponential distribution with scale parameter equal to 1.
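That transformation in code (assuming, as stated, that randexp never returns zero; note its output is also bounded far below the ~745 threshold where exp(-x) would underflow to 0.0):

```julia
using Random  # provides randexp

# X ~ Exponential(1)  ⇒  exp(-X) ~ Uniform(0,1), with both endpoints excluded
rand_open() = exp(-randexp())
```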

The reason for randexp_unlikely is not that rand(Float64) might return an exact zero. According to the comments, randexp_unlikely gets called about 1.2 % of the time, while an exact zero only happens about 0.0000000000002 % of the time.

It seems that in extremely rare cases (less than once per 10^17 draws) randexp will return Inf, which is a bug. (A 1000-hour Monte-Carlo simulation would have a 1 % chance of returning Inf or NaN instead of the desired result.)

[Edit: I think I may have overestimated the probability. Maybe it’s more like once in 10^19 draws, or a 0.01 % chance per 1000 CPU-hours of simulation. I don’t have access to a compute cluster big enough that I can test…]

With the fix to rand proposed above, randexp would instead return 36.7368005696771 or 44.43391803980815 in those cases.

For what it’s worth, I investigated what some other languages are doing:

In Python and Ruby (at least the builds that happened to be installed on my computer) there seems to be one more bit of randomness, so the bias is -eps/4.

To summarize the issue again: Julia and most frameworks sample a uniform fixed-point number in (0,1] and convert it to floating point (no rounding). One can argue that the proper way of generating random floats would be to generate a uniform infinite-precision number and round it to float. That is very different because floats are more precise than fixed-point numbers close to zero.

The “improper” way has probability nextfloat(T(1)) - T(1) of emitting zero, while the “proper” way has probability nextfloat(T(0)). The proper way has two more bits of entropy.

To put numbers on it: 1.2f-7 vs 1.0f-45 in Float32 and 2e-16 vs 5e-324 for Float64. We see that “proper” draws should give zero or subnormals approximately never, and Float32 draws a lot of arguably spurious zeros.
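These figures can be checked directly: eps(T) is the grid spacing of the “improper” draw, and nextfloat(zero(T)) is the smallest positive subnormal that a “proper” draw could in principle produce.

```julia
eps(Float32), nextfloat(0.0f0)  # (1.1920929f-7, 1.0f-45)
eps(Float64), nextfloat(0.0)    # (2.220446049250313e-16, 5.0e-324)
```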

However, that would be slower as long as we use a weird RNG like Mersenne twister (make 54 random bits in correct position) instead of “random bitstring”-style RNGs (make UInt32 / UInt64 / UInt128 / UInt256 only, and convert later).

I don’t think @Per is arguing for that (I also misunderstood this originally).

That said, my inner geek would find it fascinating to come up with a scheme for this (e.g. first draw an exponent with a geometric distribution, then fill in the bits with a 52-bit uniform; there must be some wrinkles one would need to think of), but in practice if one needs “proper” randomness around 0 then something like a transformed value with randexp is the way to go (which, in turn, is using a very clever algorithm along these lines).
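A rough sketch of such a scheme (a hypothetical helper, not proposed for Base; it glosses over the wrinkles mentioned above, e.g. correct rounding at binade boundaries and careful subnormal handling):

```julia
using Random  # for default_rng

function rand_dense(rng = Random.default_rng())
    # Pick the binade [2^-(e+1), 2^-e) with a geometric distribution:
    # probability 1/2 for [1/2, 1), 1/4 for [1/4, 1/2), and so on.
    e = 0
    while rand(rng, Bool) && e < 1073
        e += 1
    end
    # Fill in the mantissa with a uniform draw inside the chosen binade:
    # 1 + rand() is uniform on [1, 2), scaled down by 2^-(e+1).
    return ldexp(1.0 + rand(rng), -(e + 1))
end
```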

Yes, to be clear: I think 51 bits of randomness is absolutely fine as the default option. I was only arguing for removing the bias, as in the PR.

As a side note, and unrelated to the bias issue, I wouldn’t mind if there were an option to get what’s referred to above as “proper” draws. This can easily be accomplished as rand(T) + eps(T)*rand(T) + eps(T)^2*rand(T) + … where the series can be truncated after two terms in almost all cases, so the average runtime is approximately twice that of rand(T).
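That series as a loop (a hypothetical helper; note that rounding in the additions can, astronomically rarely, push the sum up to exactly 1.0, which a real implementation would need to guard against):

```julia
function rand_refined(::Type{T}) where {T<:AbstractFloat}
    x, scale = zero(T), one(T)
    while true
        x += scale * rand(T)   # add the next term of the series
        scale *= eps(T)
        # Stop once further terms can no longer change x; this exits after
        # two iterations except with probability on the order of eps(T),
        # matching the "approximately twice rand(T)" runtime claim above.
        x + scale == x && return x
    end
end
```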

One place where one might want to use such draws is when testing the precision of floating-point code. Otherwise one might miss bugs such as using log(1+x) instead of log1p(x) that only manifest themselves when the input is not a multiple of `eps(T)`.