I do think that if the value would be larger than floatmax then it should probably be an Inf, so the implementation seems correct to me. It could maybe use an option to resample in that case, but it’s easy to do on the user end.
But the probability for that is completely wrong. Your algorithm returns Inf with a probability of ldexp(1.0, -64) == 5.421010862427522e-20, i.e. likely to happen in human history, while the correct probability is closer to ldexp(1.0, -1022) = 2.2250738585072014e-308, i.e. not gonna happen.
No, you’re just interpreting it wrong. The probabilities aren’t about the exact value but rather the likelihood it’s between that and the next value. How big that interval is depends on that N I keep talking about. Transformations like 1/x and log(x) change from a left-closed (test with <) to a right-closed (test with <=) distribution.
julia> floatmax(1.0f0)
3.4028235f38
The log-probability of being greater than floatmax for an exponential distribution is:
julia> logccdf(Exponential(),Float64(floatmax(1.0f0)))
-3.4028234663852886e38
exp(-3e38) is really REALLY small. It should NEVER happen, no matter how large a cluster you run on.
I was pointing out that randexp has a very small but non-negligible probability of returning Inf; somewhere around ldexp(1.0, -64).
@StefanKarpinski pointed out that randexp generating Inf is not necessarily a bug – the geometric distribution can generate a value > floatmax, which would correctly round to Inf.
To that I answered that randexp returning something larger than floatmax should be much, much more unlikely. In fact, I computed it wrong! The probability of generating something > floatmax is on the order of exp(-floatmax), i.e. 2^(-2^1000). This is a probability that is much smaller than 2^(-64).
It is comparable to 2^(-1000) only insofar as both 2^(-1000) and 2^(-2^1000) are equal to never in our finite universe.
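To make that comparison concrete, here’s a back-of-the-envelope sketch (variable names are just for illustration): since exp(-x) == 2^(-x / log(2)), we can compare the two probabilities on a log2 scale, where the raw values would underflow to zero.

```julia
# Compare the two probabilities on a log2 scale, since the raw values
# underflow. exp(-x) == 2^(-x / log(2)).
log2_p_overflow = -Float64(floatmax(Float32)) / log(2.0)  # ≈ -4.9e38
log2_p_bug      = -64.0                                   # the reported Inf rate, 2^-64
# log2_p_overflow is astronomically more negative than log2_p_bug
```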
Ok, from R here’s a link with no preview
They call exp_rand and then scale it… so exp_rand is:
This is a bizarre and tremendously slow implementation, I think? They take a uniform random number and then add it to itself in a loop until it exceeds 1… if u is 1e-20 that could be 1e20 cycles? (EDIT: Oh no, of course, they’re doubling it every time, so it would take around 67 cycles, since log2(1e20) ≈ 66.) Obviously low probability, but the worst case is really bad? There’s some other stuff going on; I don’t understand the point of it all, but I suspect they value correctness vastly more than speed here.
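To sanity-check the cycle count (this is just a toy modeled on the doubling described above, not a translation of the R code):

```julia
# Count how many self-additions (doublings) it takes for u to exceed 1.
# Even a tiny u like 1e-20 needs only ~67 doublings, not 1e20 iterations.
function doublings_to_exceed_one(u)
    n = 0
    while u <= 1.0
        u += u          # double u each iteration
        n += 1
    end
    return n
end
```

For example, `doublings_to_exceed_one(1e-20)` returns 67, matching log2(1e20) ≈ 66.4 rounded up. Note that this toy never terminates for u == 0, which is presumably why the R code excludes 0 from its uniform draw.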
Please be aware that those links appear to link to GPL code - a direct translation to Julia may not comply with the MIT license the language currently uses.
Note that they also generate a unif_rand and exclude 0 and 1 from their algorithm. If they got a 0, their algorithm wouldn’t terminate.
I’m not suggesting a direct translation just to examine what the distribution looks like and if it excludes some of these problematic possibilities (such as Inf). It clearly does.
Have we reached consensus that randexp outputting Inf is, in fact, a bug?
The actual point I wanted to make here is not that we should fix randexp.
The point I wanted to make is that a rand distribution that outputs zero is like handing a loaded gun to a group of toddlers. It may not blow off somebody’s face today, but it is a bad idea.
I was doubted on that being a bad idea, a la “REAL programmers don’t write buggy code with rand() that outputs zero”.
So I tried to bring some supporting evidence on the badness of the idea. The only empirical evidence I could come up with, besides repeating the same argument again and again, is pictures of people shooting off their faces. The faceless corpse I brought was randexp, and my argument was “not even the Random module itself is good at zero-safety; there is no hope for other user code”.
Am I supposed to bring other supporting evidence for my claim that including zero in the output is a bad idea?
What kind of evidence would be convincing to you?
More examples? Examples in “serious REAL PROGRAMMER” code? Does it get more serious than stdlib?
It’s conceivable that their algorithm is slower than just calling -log(), but it does guarantee you get a valid float.
julia> function rexp()
u=rand()
while u <= 0.0 || u >= 1.0
u = rand()
end
return -log(u)
end
rexp (generic function with 1 method)
julia> using BenchmarkTools
julia> @btime rexp()
17.247 ns (0 allocations: 0 bytes)
0.26927999728116886
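To see concretely what the `u <= 0.0 || u >= 1.0` guard is protecting against, look at what the naive transform does to the two endpoints:

```julia
# Endpoint behavior of the naive transform -log(u): the rejection loop in
# rexp() above exists precisely to keep these two inputs out.
-log(0.0)   # a zero from rand() becomes an infinite "sample"
-log(1.0)   # the other endpoint collapses to (negative) zero
```

The first is the randexp-style Inf hazard discussed above; the second quietly reintroduces zero into the output of the exponential sampler.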
That’s for double floats… for 32 bit:
julia> function rexp32()
u=rand(Float32)
while u <= 0.0f0 || u >= 1.0f0
u = rand(Float32)
end
return -log(u)
end
rexp32 (generic function with 1 method)
julia> @btime rexp32()
16.092 ns (0 allocations: 0 bytes)
Yes, 100%, and I’m 100% with you that we should sample uniforms on (0,1).
I doubt very much that anyone has any code that relies on rand(Float32) returning 0.0f0, much less rand(Float64) returning 0.0; these are so unlikely that it’d be stupid to have your code break if it doesn’t happen. However, it’s not stupid to think that people’s code would break if 0.0 does happen, as we’ve already seen.
Wow, that’s hard to parse. What I’m trying to say is that excluding 0.0 is the right thing to do, and it’s only likely to FIX bugs, not cause them.
Sure, but please also be aware that the link preview contains that same GPL’d source code, and some folks may not appreciate looking at it inadvertently while scrolling through a random (pun not intended) thread on this Discourse. You could place those links behind a spoiler tag, for example, to remove that possibility.
If you like I could do that. It’d be nice to just be able to post a link without Discourse creating a preview. I certainly didn’t tell it to create that preview. Also, none of the actual visible code is relevant to the algorithm, the first link is just a shell function that calls the real work, the second link is just preview of comments.
done, I just created text links.
For those advocating sampling from (0, 1), please provide a proposed support set along with a justifiable uniform distribution criterion on that support set. Or put another way, how, in theory, are you rounding all the numbers in the mathematical set (0, 1) to the proposed support? So far the arguments have been “exclude it” with no further detail. Keep in mind that you want either P(rand() < p) = p or P(rand() ≤ p) = p for the very common use case of doing Bernoulli sampling and you want to be able to tell users to do one of those two things and get properly distributed Bernoulli variables.
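As a concrete instance of that Bernoulli criterion (a sketch, assuming the current 53-bit-grid Float64 generator mentioned later in the thread): with outputs drawn from {0, 1, …, 2^53 − 1} / 2^53, the left-closed test rand() < p is exact for any p that lies on the grid.

```julia
# Sketch of the Bernoulli criterion for a [0, 1) grid generator: the 2^53
# equally likely outputs are k / 2^53, and exactly p * 2^53 of them satisfy
# k / 2^53 < p whenever p itself is a grid point.
grid        = 2.0^53
p           = 0.25
count_below = p * grid           # outputs strictly below p
prob_below  = count_below / grid # fraction of outputs with rand() < p
# prob_below == p exactly, so P(rand() < p) = p holds for grid-aligned p
```

Any proposal that rejects or remaps outcomes near 0 has to say how much this identity is perturbed, which is exactly the question being asked above.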
What’s wrong with this (where _rand() does the real work of sampling from something like [0,1] or [0,1), etc.)?
function rand()
    u = _rand()
    while u <= 0 || u >= 1
        u = _rand()
    end
    return u
end
We can think of this as sampling from a grid rand(UInt64)/2^64 and just not accepting anything small enough to round down to 0. Yes, this is a bias, but it’s an acceptable one, and better than being able to generate singular results (such as Inf), in my opinion. The alternative, rounding to 0, also induces a bias for rand(Exponential()), as we’ve already seen, and a worse one.
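To put a number on that bias (illustrative only): rejecting the single zero outcome on a 2^64 grid removes one outcome out of 2^64, so no probability shifts by more than one grid cell.

```julia
# Size of the bias from rejecting 0 on a rand(UInt64)/2^64 grid: one outcome
# out of 2^64 is removed, so no event's probability shifts by more than 2^-64.
bias = ldexp(1.0, -64)
# bias == 5.421010862427522e-20, the same "likely in human history" number
# quoted earlier in the thread for the randexp Inf rate
```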
That’s not how consensus works
I’m absolutely, most certainly against this idea, and further arguments about 0s are not going to convince me otherwise. I totally get the pragmatism but it fundamentally breaks the probabilities and meaning of rand.
What could convince me? If you could come up with a coherent explanation of what the values and their likelihoods mean. That’s what I’m missing. Or maybe if we were able to increase N enough that such a problem is unobservable. But that might not be practical.
it breaks the probabilities and meaning of rand.
No it doesn’t. If you sample from rand(), you will never get a value between 1e-9201 and 2e-9201, yet probability theory says such values must occur infinitely many times in an infinite sequence of samples. If you sample from mathematical interval [0,1] uniformly you will get NO samples exactly equal to 0.0 in an infinite sequence of samples.
Computer floats are only an approximation to real numbers. That they break down when very fine resolution matters is well known. Excluding 0.0 doesn’t damage the approximation to the reals in any meaningful way (since sampling exactly 0.0 is impossible mathematically, and it’s already impossible with floats to get arbitrarily small values). In some sense, rounding to 0 infinitely increases the probability of getting something that is mathematically impossible.
If you sample from mathematical interval [0,1] uniformly you will get NO samples exactly equal to 0.0 in an infinite sequence of samples.
In some sense, if you sample from an actual dense interval, you will get NO samples exactly equal to any particular value.
a value between 1e-9201 and 2e-9201
Our current rand() would give you a zero, because it falls in the bin [0, 2^-53). That’s what rand currently means. See Output distribution of rand(Float32) and rand(Float64), thread 2 - #147 by mbauman.
Right, and my version would never give you anything there; the smallest thing it’d give you for Float32 is 1.0f-45. Which is to say that we damaged the distribution about as much as using Float32 damages the distribution: the bias/error is on the order of the minimum possible value it could be.
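For reference, that floor is just the smallest positive subnormal in each format:

```julia
# Smallest positive representable value in each float format (subnormals):
tiny32 = nextfloat(0.0f0)   # Float32: 1.0f-45 (i.e. 2^-149)
tiny64 = nextfloat(0.0)     # Float64: 5.0e-324 (i.e. 2^-1074)
```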