The paper, with analysis and a recommended approach (Julia is not alone)
Drawing random floating-point numbers from an interval
Did you read the paper? Is it mainly a problem with how Distributions.jl implements Uniform, or does it also affect the standard rand() over [0, 1)?
I read the paper twice and ran the code.
… so much care is taken by languages and their libraries to ensure that a float can be drawn uniformly at random from the interval [0, 1) without ever returning 1 to fail at ensuring the same for other intervals … none of [these] languages … implements correctly the drawing of a float in an interval [a, b).
it is possible to get a reasonable approximation of spatial equidistributivity, uniformity, flexibility in the choice of the bounds, and respect of said bounds at an affordable price. In addition, since the γ-section algorithm can be viewed as a generalization of the method often used to draw floats in [0, 1), it becomes possible to have only one procedure to draw floats from any interval.
Sounds interesting. So this doesn’t affect Base rand(), then, I gather?
Will you open an issue in Distributions.jl, then?
I haven’t checked how it’s implemented now (the paper used 1.6.1, since it’s from 2021), but from what I read this also affects Base (though differently):
3.8. Julia
The Random standard package draws floats uniformly at random from [0, 1) by first generating a float in [1, 2) and then subtracting 1.0. As a consequence, S₀¹ = { k · 2^(1−p) | 0 ≤ k ≤ 2^(p−1) − 1 }.
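As a rough sketch of what that construction implies (the helper below is illustrative, not Base’s actual code): forcing the exponent of 1.0 and filling the 52 mantissa bits with random bits gives a float in [1, 2), and subtracting 1.0 leaves only multiples of 2^-52 in [0, 1).

```julia
# Illustrative sketch of the "[1, 2) minus 1.0" method the paper describes
# (NOT Base's actual code): keep 52 random mantissa bits, force the exponent
# bits of 1.0, reinterpret as a Float64 in [1, 2), then subtract 1.0.
function rand_via_12(bits::UInt64)
    u = (bits & 0x000f_ffff_ffff_ffff) | 0x3ff0_0000_0000_0000
    reinterpret(Float64, u) - 1.0
end

x = rand_via_12(rand(UInt64))
@assert 0.0 <= x < 1.0
@assert x % eps(1.0) == 0.0   # every output is a multiple of 2^-52
```

So this sketch can produce exactly 2^52 distinct values, which matches the set S₀¹ the paper gives for Julia’s MersenneTwister path.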
They sadly don’t link to the implementation directly, so hunting this down requires some more sophisticated tools than my smartphone.
I guess the really interesting part would be how it works for ranges of floats exactly?
here is Base.rand reaching the presumably open upper bound
julia> v = Base.rand(-2.0f0:1.0f-3:1.0f0, 9900);

julia> minimum(v)
-2.0f0

julia> maximum(v)
1.0f0
Still the same way, I gather.
That’s just the generation of a single float in [0,1) though, not picking from an arbitrary range.
I don’t understand what this demonstrates: -2.0f0:1.0f-3:1.0f0 is a vector with 3001 elements that you sample 9900 times, so it’s likely that its extrema will be picked up; this vector doesn’t represent an interval of floats.
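A quick sanity check of that count (a sketch; the numbers come straight from the example above):

```julia
# The range from the example: 3000 steps of ~0.001 between -2.0 and 1.0,
# so 3001 elements, closed on both ends like every Julia range.
r = -2.0f0:1.0f-3:1.0f0
@assert length(r) == 3001
@assert first(r) == -2.0f0 && last(r) == 1.0f0
```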
The assumption is that the upper bound on the interval does not belong to the set of valid sample points.
I think it’s actually very much expected for ranges to be inclusive on both ends; that’s how all ranges in Julia work. That’s also not what the paper was about in the first place.
What I’m personally unclear about is whether all elements of that range are equally likely to be drawn (which I think is the case, meaning the issue is specific to Distributions.jl rather than Base).
You may be right – the paper does speak about reaching the open end of an interval when that is not supposed to occur. However, if the primitive rand uses a clopen interval while the derivative rands (given ranges) use a clclosed interval, well, it fooled me.
I never thought that the concept of an open range of discrete values made a lot of sense. It’s a concept that is really just needed for continuous intervals. A range m:n should always be inclusive, otherwise you might as well write m:n-1.
And rand(1:3) should naturally be just like rand([1,2,3]), or rand([3, 1, 2]).
If the sampling sets of rand were always considered clopen, what would happen to rand(Bool)?
My use of rand is mostly with floating-point values. rand(Bool, n) is like rand((:red, :green, :blue), n), and sure, there are set-like types that one would want to behave as a random selection from k values. rand(Float64) selects one of the enumerable IEEE 754 finite values; however, that is many, many, many more than I conceptualize as a small discrete collection – and they are binary rational numbers too. There are good reasons to prefer clopen bounds on well-ordered sequences of many, many values. When working with time intervals, some aspects of temporal logic get tangled if the intervals are clclosed. So, if I am simulating some of that, I want clopen float ranges (and if I am investigating timepoint relations, clopen int ranges).
It should not be one size fits all, unless that is necessary.
Yes, indeed. Float64 is technically discrete, but it models the continuous reals, so it’s perfectly fine to talk about [0.0, 1.0) as a half-open interval, and I think that rand() should sample from that half-open interval.
But this was in reference to the example you showed: the range -2.0f0:1.0f-3:1.0f0 is a collection of discrete numbers, and I see no reason to treat it as a half-open interval and exclude its endpoint in rand.
I can specify rand(minfloat:floatstep:prevfloat(maxfloat)) to force clopen sampling.
Do we know that it cannot return maxfloat for any values used (where minfloat <= maxfloat)?
It is, barring bugs in the implementation. The rand docstring says:
Pick a random element or array of random elements from the set of values specified by S
which I think implies uniformity; otherwise it could be clarified.
If you can prove that minfloat:floatstep:prevfloat(maxfloat) doesn’t contain maxfloat (which definitely seems to be the case), then we do know that calling rand on it can’t return maxfloat.
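That reasoning is easy to check directly. In the sketch below, clopen_range is a hypothetical helper name (not an existing API) wrapping the prevfloat trick mentioned above:

```julia
# Hypothetical helper (illustrative name, not an existing API): a closed
# range whose upper end stops just below `hi`, so sampling from it can
# never return `hi`.
clopen_range(lo, step, hi) = lo:step:prevfloat(hi)

r = clopen_range(0.0, 0.25, 1.0)
@assert maximum(r) < 1.0                      # 1.0 itself is not in the range
@assert all(rand(r) < 1.0 for _ in 1:1000)    # so rand can never draw it
```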
The analysis in the paper is specific to the use of floating-point numbers representing ranges of real numbers, as in [0, 1), with the implications of IEEE floating-point rounding for the results – not in the sense of a range of integers or some other set of values (even if that set of values is the set of floating-point numbers considered as a set, rather than as a realization of the real numbers). The latter corresponds to the sense of drawing random values from a set of elements, which would include all elements in the set, even if the elements are specified by a minimum value, a maximum value, and a step.
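The two senses can be put side by side in a small sketch: sampling a discrete collection (endpoints included) versus sampling the conceptual interval [0, 1):

```julia
# Sense 1: drawing from a discrete collection includes every element,
# endpoints too.
v = rand(0.0:0.5:1.0, 1_000)
@assert all(x -> x in (0.0, 0.5, 1.0), v)

# Sense 2: drawing a Float64 "from [0, 1)" is the interval sense, where
# the upper bound 1.0 is excluded by construction.
@assert all(0.0 <= rand() < 1.0 for _ in 1:1_000)
```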
Concerning rand(rng, Float64), the size of the set of potential outputs is 2^53 or 2^52, depending on the RNG we are using:
rand(r::Union{TaskLocalRNG, Xoshiro}, ::SamplerTrivial{CloseOpen01_64}) =
Float64(rand(r, UInt64) >>> 11) * 0x1.0p-53
rand(r::AbstractRNG, ::SamplerTrivial{CloseOpen01_64}) = rand(r, CloseOpen12()) - 1.0
The second definition is used for MersenneTwister.
Both approaches deliver equidistant samples, n * eps(0.5) and n * eps(1.0) respectively, for n = 0, 1, 2, …
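The first definition above can be checked by hand (a sketch using only documented Base functions): keeping the top 53 bits of a UInt64 and scaling by 2^-53 lands every sample on the grid of multiples of eps(0.5).

```julia
# Replaying the Xoshiro/TaskLocalRNG formula from the definition above:
# keep the top 53 random bits, scale by 2^-53 to land in [0, 1).
u = rand(UInt64) >>> 11
x = Float64(u) * 0x1.0p-53

@assert 0.0 <= x < 1.0
@assert x % eps(0.5) == 0.0                    # sample grid spacing is 2^-53
@assert eps(0.5) == 2.0^-53 && eps(1.0) == 2.0^-52
```

The largest representable output, (2^53 - 1) * 2^-53, is prevfloat(1.0), so the upper bound stays open.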
Nevertheless, it is an open question whether equidistance is a necessary requirement. See for example this link: uniformly distributed floating point numbers