Sure, but my implementation was just meant to be indicative of the type of thing to be done. Instead of literally calling rand(Float64) we could call rand(UInt64) and work with that, something like:
```julia
i = rand(UInt64)
exp = -32
if i < (UInt64(1) << 32)    # high 32 bits are all zero: keep the low bits and drop another factor of 2^-32
    exp -= 32
else
    i = i >> 32             # otherwise use the high 32 bits
end
r = ldexp(Float32(i), exp)  # roughly uniform in [0, 1), with extra resolution below 2^-32
```
The bigger point is my last two paragraphs: it's hard to preserve such an open-open interval through any transformation, even trivial fully linear monotonic ones. And I'd say it's impossible to give general guidance on how to do so.
Half-open intervals are much easier to preserve, and I think it's possible to describe a general algorithm to do so for a limited set of well-behaved functions f.
The use of transformations that take numbers near 0 to large numbers is common though (specifically log of a probability is a really common transform such as when calculating entropy), and the main thing there is to avoid rounding up to Inf. I mean, Inf is a lot bigger than for example
Inf is not a problem if you know what it means. With our current rand(), getting -Inf from log(rand()) means you may have drawn a value in \big(-36.7368005696771, -\text{Inf}\big]. With an "ideal" [0, 1) rand, that'd be \big(-744.4400719213812, -\text{Inf}\big].
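As a quick check of where those two bounds come from (my own verification, using the numbers above): they are the natural logs of the smallest nonzero value each generator can pass to log.

```julia
log(2.0^-53)          # -36.7368005696771:  smallest nonzero output of the current rand(Float64)
log(nextfloat(0.0))   # -744.4400719213812: smallest subnormal Float64, the "ideal" lower limit
```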
If we have an ideal RNG you should never get Inf ever ever ever ever ever, no matter how big your cluster. (Or rather, the real cause of it would be hardware failure or a bug.) Right now you get it in less than a minute on a desktop. That's the real concern.
That's been patched. It's no longer a concern. Once it hits release, you won't get values bigger than about 44 with the current rand(Float64). If we went all the way down into subnormal Float64s we could go as high as about 744.
If someone goes around randexp to write -log(rand(T)), and rand is documented to return 0.0, and that user doesn't want +Inf, it's bad usage on the user's part. We can't protect everyone from every floating point pitfall that exists.
I mean maybe, but as a guy who might want to calculate the log of a probability coming from a "perfect Float32" generator, I don't want it to ever return Inf, because that would be observing something so unlikely that it's vastly, vastly more likely my computer will catch fire and kill me in my sleep, by many many orders of magnitude. Inf is not something I should ever get. It's vastly more likely that a correct implementation of rand(Float32) would return 84.3 because of bit flip errors.
Calculating the log of a probability is like one of the most common things you could do; it's not just for generating exponentials, it's inherent in entropy calculations or in working with posterior distributions of Bayesian models. Admittedly you don't necessarily calculate it by this method, but my point is more about the inherent aspect of doing general nonlinear transforms with a singularity near 0.
You're losing the forest for the trees. What should the probability that -log(rand(Float64)) == -log(0.5) be? It can't be the statistically correct answer of 0.
Yeah, the float generation code assumes that it has access to a kind of random oracle producing uniform random bits (or rather, an infinite read-only extra tape filled with random bits, when viewed through the lens of Turing machines).
Most considerations should not attempt to peek into the internals and should instead be studied in such an oracle model.
You cannot make any choice here: once you have decided that cdf(p) = p, this already implies that probability(rand() == p) = cdf(nextfloat(p)) - cdf(p) = nextfloat(p) - p. Luckily this coincides with the choice you wanted to make anyway!
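As a concrete instance (my own illustration, not from the post): for p = 0.5 the implied point mass is exactly one ulp of 0.5.

```julia
p = 0.5
nextfloat(p) - p   # 1.1102230246251565e-16 == 2^-53: the implied probability of rand() == 0.5
```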
Correct for Float16. As I repeatedly said, I refuse to have any opinions on Float16.
For BFloat16, Float32 and Float64 we can either redistribute the probability mass to e.g. the smallest non-subnormal number, or we could do println("Your RNG is ill-seeded!"); exit(1), since this will only happen with probability < ldexp(1.0, -120), which I consider negligible, and which is also considered negligible by the United States government with respect to the encryption of state secrets.
This isn't the kind of question that concerns people who do floating point computations, but "how often should -sum(log.([rand(Float32) for i in 1:1000000])) return Inf?" is the kind of thing you might care about. If the rand() implementation is "perfect" this will never happen in my lifetime, or indeed before the heat death of the universe, so I'm OK with it.
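To put rough numbers on that (my own back-of-the-envelope, not from the posts): with the current 24-bit rand(Float32) the sum above hits Inf in a few percent of runs, while with an "ideal" generator reaching the subnormals it is absurdly unlikely.

```julia
# probability that at least one of the 10^6 draws is exactly 0.0f0, making the sum Inf
1 - (1 - 2.0^-24)^1e6   # ≈ 0.058 with the current 24-bit rand(Float32)
1e6 * 2.0^-149          # ≈ 1.4e-39 (union bound) with an ideal rand(Float32) down to the subnormals
```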
Let me give a more meaningful applied example… Suppose you have a potential energy function with a singularity near 0, and someone generates millions of random particle locations on [0,1) and hands them to your potential energy function and then wants to calculate ensemble sums or averages… You wind up with this kind of calculation, except it's a user-defined weird function instead of log.
Once again this thread has exploded because we're arguing past each other on descriptive vs. normative behaviors. I'm trying to describe what it truly means to perform log(rand()) with the status quo rand() and how you can actually interpret its output with the status quo… and then projecting that to see how it might apply if we change rand() to output more values. You're arguing about what its output should mean, but I don't think this is a sensible thing to do, because it's not a symbolic computation and you do need to be aware of particulars. I get it: I would love to assume that any transformation of rand() is exactly equivalent to sampling from the transformed uniform distribution.
The very same arguments apply to perfect_open_open_rand() + 1. Theoretically, this should never return 2, but even with a theoretically ideal (0, 1) rand, it will with great regularity!
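A concrete way to see it (my own illustration): even the largest value an ideal (0, 1) generator could return already rounds up to 2 after the shift, because floats are twice as coarse near 2 as they are near 1.

```julia
x = prevfloat(1.0)   # largest Float64 below 1, a perfectly legitimate (0, 1) draw
x + 1                # == 2.0, since 2 - 2^-53 is not representable and rounds up to 2
```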
My arguments are that we need to be able to meaningfully and concretely describe why both these things happen, beyond a hand-wavy "floating points aren't accurate" mumbo jumbo.
Maybe I never made it clear, but I'm 100% OK with your "cdf(p) = p at the generated quantities" criterion, including 0.0f0, when applied to any rand(Float32) implementation that makes 0.0f0 have a probability low enough that a million-core cluster can't observe it in at least a week of computing, but preferably closer to a year. It's possible in Float32 to make this probability unobservable in the lifetime of the universe. But we might not have to go that far.
This criterion makes it irrelevant for almost everyone in the universe except people with tremendous resources, who can take care of the question themselves.
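For scale (my own rough arithmetic, assuming something like 10^9 draws per core per second, which is an assumption rather than a figure from the thread):

```julia
1e6 * 1e9 * 604800   # ≈ 6.0e20 draws from a million-core cluster in one week
2.0^-149             # ≈ 1.4e-45: P(0.0f0) for an ideal rand(Float32), far below 1 / 6.0e20
```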
I also think it would be good to state explicitly in the docs for rand(Float32): "note that this generator can generate 0.0f0 with probability about X, and if that's a problem we suggest wrapping it with something like

```julia
randno0(::Type{Float32}) = (f = rand(Float32); f == 0.0f0 ? nextfloat(0.0f0) : f)
```

which is the smallest possible perturbation to the cdf, affecting only the one possible value 0.0f0."
What I don't think is ideal is a physics grad student running his particle simulation for a weekend and being virtually guaranteed to get Inf as the ensemble average, because there's about a 1e-9 probability of getting exactly 0.0f0 in a calculation where 0.0 makes no sense, such as a potential energy calculation, and multiple billions of random numbers will be drawn (this is a weekend project for grad students these days).
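The arithmetic behind "virtually guaranteed" (my own illustration, taking "multiple billions" to mean four billion draws at the quoted 1e-9 probability):

```julia
1 - (1 - 1e-9)^4e9   # ≈ 0.98: chance that at least one draw over the weekend is exactly 0.0f0
```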
Here's a very rough concept of a SIMD-able full-representation (i.e., with values approaching floatmin(T) or nextfloat(zero(T))) implementation. Even if the concept is valid, THIS PROBABLY CONTAINS SOME ERROR. In particular, it probably doesn't handle the bottom scale correctly. It also wastes some entropy, only observing the bottom Base.significand_bits(T)+1 bits of each generated UIntXX, but this might be faster anyway due to the resulting simplicity. The chances of re-drawing are SIMDWIDTH * 2^-(Base.significand_bits(T)+1), so even with 512-bit registers this is 2^-6 for Float16 and 2^-20 for Float32. Float16 should probably use something slightly different anyway, since at most 0 or 1 redraws are required given UInt16s (depending on whether we emit subnormals).
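Plugging in the numbers behind those redraw probabilities (my own check, assuming 512-bit registers):

```julia
32 * 2.0^-(10 + 1)   # Float16: 32 lanes * 2^-(significand_bits + 1) = 2^-6  ≈ 0.016
16 * 2.0^-(23 + 1)   # Float32: 16 lanes * 2^-(significand_bits + 1) = 2^-20 ≈ 9.5e-7
```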
It does appear to SIMD, except for the generation of rand(UIntXX). But we already have a solution for that somewhere…
```julia
using Random

block_randf(t::Type{T}, n::Val) where {T} = block_randf(Random.default_rng(), t, n)

function block_randf(rng::Random.AbstractRNG, ::Type{T}, N::Val) where {T<:Base.IEEEFloat}
    U = Base.uinttype(T)                 # same-width Unsigned
    maxint = U(maxintfloat(T))           # exactly-representable steps per scale, 2^(significand_bits+1)
    scaleincr = inv(maxintfloat(T))      # if we try to return a zero, drop by this factor and try again
    smallesteps = eps(zero(T))           # minimum spacing between values that we want to achieve
    scale = one(scaleincr)               # initialize
    scalepow = Int(-log2(scaleincr))
    todrop = Int(-log2(smallesteps) + log2(scale))
    numscales = fld(todrop, scalepow)    # how many times we can apply scaleincr while above the smallest spacing
    maxint_bottom = U(exp2(todrop - numscales * scalepow)) # at the bottom scale, this number may be different
    f = ntuple(Returns(zero(T)), N)      # the values we plan to return
    good = ntuple(Returns(false), N)     # track which lanes are valid
    for _ in 1:numscales
        scale *= scaleincr               # prepare to evaluate this scale
        u = ntuple(_ -> rand(rng, U), N) # TODO: need a SIMD version of this
        u = u .% maxint                  # reduce to exactly-representable values in T
        fnew = T.(u) .* scale            # lossless conversion to float at the appropriate scale
        f = ifelse.(good, f, fnew)       # replace bad values from previous iterations
        good = good .| .!iszero.(u)      # mark values as good or bad
        all(good) && return f            # all values are valid
    end
    # handle the bottom scale
    u = ntuple(_ -> rand(rng, U), N)     # TODO: need a SIMD version of this
    u = u .% maxint_bottom               # reduce to representable values at the bottom scale
    fnew = T.(u) .* smallesteps          # lossless conversion to float at the appropriate scale
    f = ifelse.(good, f, fnew)           # replace bad values from previous iterations
    return f
end
```
EDIT: improved handling of bottom range, although this still misses many representable values in the lower ranges. See following post by mbauman.
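For illustration, a call might look like this (my own usage example, not from the original post); each lane is uniform on [0, 1), with small values refined down toward the subnormal range:

```julia
block_randf(Float32, Val(8))   # returns an NTuple{8, Float32}
```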
I think something like this could be performant enough for general use. It's probably even competitive (average case) with an emulated-rounding same-width int->float conversion scheme like the one I presented above.
That's clever, and a promising direction, but I don't think it's quite doing what we want, because it returns values from the Set([...; 2^-48:2^-48:2^-24-2^-48; 2^-24:2^-24:1-2^-24]) for Float32. It's not "fully saturating" the floating point values around the beginning of each of the ranges.
It'll also eventually round down its scale to 0 (with Float16s, anyhow)… so perhaps the while loop should just be a for loop over the maximum possible iterations, which would allow it to theoretically return 0s and terminate (which might also make the compiler happier).
But the key insight for me is that we actually can (of course) SIMD with the possibility of redraws, and we can structure the probabilities such that it's rare (even with 4x or 8x at a time).
The same concept should apply to the multiplicative UIntXX->FloatXX version you already posted, and we can choose to re-sample only when we dive below the fully-sampled regions of the floating point numbers. I think.
Indeed. I was worried about this and should have given it more thought, but I'm a touch fuzzy in the mind today. You're right that it definitely has sparse support, and some more thought is necessary to get that right.
To your point, it appears the missing piece here is probably resampling small values where the support becomes incomplete. My mind isn't up to the task today (I already got it wrong once), but someone else can give it a crack. I'm a tad concerned that we'll only be extracting roughly an exponent_bits amount of dynamic range per step. This would mean a more substantial re-draw frequency, but perhaps still small enough to carry a low cost. And maybe we get more dynamic range than I'm thinking.
I was already working out the finite-number-of-loops thing (I updated the above post) so that I could handle the bottom scale properly. I'm not sure the old scheme got the 0 frequency correct, but hopefully now it's better on that front.
Yeah, that's why my mind immediately jumped to the full-bit-width versions, because then you get a dense sampling of all values greater than 2^(23-32) = 0.001953125 and 2^(52-64) = 0.000244140625. So even at 8x generation, there's less than a 1% chance of a re-draw per block. In other words, this shouldn't be a meaningful hit to the performance numbers I already posted.