All I’m trying to say is that if we can provide a mathematical proof that a certain process produces |cdf(x) - x| < eps(x), and eps(x) is small enough everywhere, we should accept it as a uniform random generator. I think that’s a good criterion for deciding that we have a good generator. Personally, I’d want eps around 2^-40 or better over the whole space. I’d also find a few quirks on the order of 2^-30 acceptable, especially if they reduce the error elsewhere (debiasing would be OK with me, for example, even if it produced a kink somewhere, provided it’s an understood and documented kink and doesn’t affect practical simulations, e.g. KS tests after an hour of sampling on one core).
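To make the empirical side of that criterion concrete, here is a minimal sketch (in Python for illustration, though the thread’s context is Julia) of estimating sup |ecdf(x) - x|, i.e. the one-sample KS statistic against Uniform(0,1). Note this only estimates the deviation from a finite sample; the criterion above asks for a proof of the bound, which sampling can support but never replace. The function name and sample size are my own choices:

```python
import random

def max_cdf_error(sample_fn, n=100_000, seed=12345):
    """Empirical estimate of sup_x |ecdf(x) - x| (the KS statistic)
    for a claimed Uniform(0,1) generator. A sampling check like this
    supports, but cannot replace, a proved |cdf(x) - x| < eps bound."""
    rng = random.Random(seed)
    xs = sorted(sample_fn(rng) for _ in range(n))
    # D = max(D+, D-) with D+ = max_i((i+1)/n - x_i), D- = max_i(x_i - i/n)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

# For a true uniform, D is typically on the order of 1/sqrt(n).
err = max_cdf_error(lambda rng: rng.random())
```

The catch, of course, is that a full-range statistic like this is dominated by the bulk of the distribution and says almost nothing about errors at scales like 2^-40.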
I’m just saying I’m planning to set up a Quarto document that can assess this question empirically as well, provided as a document other people can run to produce simulation evidence that the generator WORKS the way we say it does. This gives both assurance against bugs and assurance, for people doing certain kinds of computations with singularities, that they can verify the behavior without having to write the verification code themselves. I don’t recommend it as a continuous integration test.
I think it would be an interesting question what happens if we propose an algorithm A which is fast, generates say on a grid of N = 32 bits, and has a sufficiently simple algorithmic description that it can be proven to have say 1.5*2^(-32) as an upper bound for the error in the cdf everywhere on [0,1); and then there’s another RNG, say B, which is even slightly faster for the rand! case, has no such easy manual verification procedure, but after 5 hours of computation the KS test rejects A and doesn’t reject B as a uniform RNG in the left tail… That becomes a question at least worth documenting, and that’s what I’m doing: providing a way to document the tradeoffs.
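A left-tail KS test of the kind described can be sketched as follows (again in Python for illustration; the threshold, sample size, and function name are my own assumptions). The idea is that, conditioned on x < t, a true Uniform(0,1) sample is Uniform(0, t), so x/t should again be Uniform(0,1) and the ordinary KS statistic applies to the rescaled tail:

```python
import random

def left_tail_ks(sample_fn, threshold=2.0**-10, n=2_000_000, seed=1):
    """KS statistic restricted to the left tail: keep samples x < threshold,
    rescale by x/threshold (Uniform(0,1) under the null), return (D, count).
    Assumes the tail is non-empty; with these defaults count is ~2000."""
    rng = random.Random(seed)
    tail = sorted(x / threshold
                  for x in (sample_fn(rng) for _ in range(n))
                  if x < threshold)
    m = len(tail)
    d = max(max((i + 1) / m - x, x - i / m) for i, x in enumerate(tail))
    return d, m

d, m = left_tail_ks(lambda rng: rng.random())
```

Pushing the threshold down to scales like 2^-32 is what drives the multi-hour run times: the expected tail count is n * threshold, so you need enormous n before the conditional test has any power there.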
To make it clear, the reason we should care about the left tail independently of the whole range is precisely the way floats work: almost all the floats in (0,1) are near 0, and half of them are below ldexp(1.0f0, -64) or so.
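That “half of them” claim is easy to check by counting bit patterns, since for positive IEEE-754 floats the bit patterns order the same way as the values. A quick sketch in Python, counting binary32 (Float32, matching the 1.0f0 above) values:

```python
import struct

def f32_bits(x):
    """IEEE-754 binary32 bit pattern of a nonnegative float; for these,
    integer ordering of the patterns matches ordering of the values."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

# Float32 values strictly inside (0,1) are the bit patterns from 1
# (the smallest subnormal) up to, but not including, the pattern of 1.0.
total = f32_bits(1.0) - 1
below = f32_bits(2.0**-64) - 1   # those strictly below 2^-64
ratio = below / total            # comes out near one half
```

The ratio is about 0.496, so “half of them are below 2^-64” is essentially exact for Float32.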
The smallest values matter less the smaller they get: probabilities around 2^-110 are unobservable on any computer in our universe, but 2^-40 is not. Drawing 2^40 ≈ 1.1×10^12 samples takes only around 0.3 hours on a single core at roughly 10^9 samples per second, so it’s a coffee break.
In any case, even if we choose a generator that fails some of these tests, that could be fine. Perhaps after all this discussion there will be a DetailedRandom.jl package for anyone interested in increasingly detailed numbers, and having a testing document that can compare the generators would be valuable for those people.