In my case the problem is not generating the data; it is that the problem to be solved involves generating random coordinates of particles (since you are a physicist: think of generating a reference state for a molecular system). The generation of these coordinates and the computation I have to do with them are quite costly, so I cannot rely on obtaining exhaustive sampling to any reasonable precision during testing (the tests would take too much time). Therefore, I only generate a small sample and compare the rough results obtained with the expected result for that small sample. Of course, since this small sample cannot achieve any reasonable precision in the numerical integration, I have to reproduce the exact same random coordinates to know that the package is always doing the same thing.

I was aware of that. That is why I was either waiting for 1.6 to arrive before choosing what to do, or searching for an alternative way to write the tests.

Then I would have to modify the package to read that data; it is not saved anywhere in an actual run, since the coordinates are generated on the fly and discarded. (Indeed, those random coordinates are not input data.)

If you don’t have a simple API to pass the sample coordinates in externally, I agree that it is trickier. You could write a new RNG that reads its inputs from a file rather than generating them, but in that case it is probably easier to use something like the StableRNGs.jl package.
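For reference, a minimal sketch of that approach, assuming StableRNGs.jl is installed: `StableRNG(seed)` yields a stream that the package guarantees to be identical across Julia releases:

```julia
using StableRNGs  # assumes the package is installed

rng = StableRNG(42)
x = rand(rng, 3)

# re-seeding reproduces exactly the same stream, on any Julia release
rng2 = StableRNG(42)
rand(rng2, 3) == x  # true
```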

I was simply using rand() to generate random numbers, but to optionally support StableRNGs I have to use rand(rng). I changed the code to do that, but the problem is that now it allocates memory:

Thus, just having the option of passing a different rng to rand() somehow makes it allocate memory, even when the default is used (and, as I mentioned, I need to generate tens of thousands of random numbers, so this is really an issue; the code was allocation-free without that).

Edit: Well, if I declare rng as a constant, that allocation goes away. I will see if I can adapt that to my case.
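That matches the usual explanation: a non-const global is type-unstable, so each rand(rng) call is dynamically dispatched and allocates, while a const global (or an RNG passed as a function argument) is not. A minimal sketch, with made-up function names for illustration:

```julia
using Random

my_rng = MersenneTwister(1)             # non-const global: type unknown at compile time
const my_const_rng = MersenneTwister(1) # const global: type-stable access

function sum_rand_global(n)
    s = 0.0
    for _ in 1:n
        s += rand(my_rng)   # dynamic dispatch on every call, so it allocates
    end
    return s
end

function sum_rand_const(n)
    s = 0.0
    for _ in 1:n
        s += rand(my_const_rng)  # statically resolved call
    end
    return s
end

sum_rand_global(10); sum_rand_const(10)  # compile first
@allocated sum_rand_global(10^4)  # > 0 bytes
@allocated sum_rand_const(10^4)   # typically 0 after compilation
```

Passing the rng as an argument has the same effect as the const global, because the function then specializes on the concrete RNG type.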

As @stevengj suggested above, it would be best to decouple the deterministic parts of the calculation from the random input generation, and to save those “random” inputs and reuse them.

While this may involve some refactoring of the code, when feasible it is the best solution. StableRNGs.jl will guarantee the same random stream, but if your calculations involve floating point, they are not necessarily bit-by-bit reproducible across machines, CPUs, and Julia releases either.
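A sketch of that save-and-reuse idea, using DelimitedFiles (a standard library in the Julia versions discussed here); the file handling and sizes are arbitrary:

```julia
using DelimitedFiles, Random

# generate the "random" inputs once, with any RNG, and store them with the tests
rng = MersenneTwister(1)
xs = -10 .+ 20 .* rand(rng, 100, 3)   # 100 sample coordinates in [-10, 10]^3

path, io = mktemp(); close(io)        # temporary file for this sketch
writedlm(path, xs)

# in the test suite, reload exactly the same inputs; no RNG involved anymore
xs_loaded = readdlm(path)
xs_loaded == xs  # true: Float64 printing round-trips exactly
```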

But I don’t understand how from this you derive that it’s best to “save those random inputs”, rather than using an RNG to generate them in a reproducible way… or maybe that’s not what you meant?

In my case that option is not reasonable. The number of random numbers is too large, much larger than any of the input data I have to provide to the function. The best analogy is that of a Monte Carlo algorithm for integrating an irregular volume in space, defined by the distance to a set of points. For instance, this code computes the volume of the region within a cutoff distance of a set of points in space defined in data:

julia> function volume(data, cutoff, samples)
           ns = 0
           for i in 1:samples
               x = -10 .+ 20 * rand(3)
               for j in eachindex(data)
                   d = 0.0
                   for k in 1:3
                       d += (data[j][k] - x[k])^2
                   end
                   if sqrt(d) < cutoff
                       ns += 1
                       continue
                   end
               end
           end
           return (ns / samples) * 20^3
       end

julia> data = [ rand(3) for i in 1:10 ];

julia> cutoff = 5.0;

julia> samples = 10^5;

julia> volume(data, cutoff, samples)
5280.32

How can I test this without a stable random number generator?

(edited to fix the code to make it work; please do not mind the performance of the above, it is just an example)

What I am doing in the testing of my package is using a small number of samples with a random number generator with predictable results, and comparing the result with a previously computed result which I know is correct. In an actual run of the package the number of samples is large and one can expect to obtain good precision on that volume estimate, but for testing this takes too long.

By the way (and it is related to your other questions, I suppose), the usual strategy that I use in functions like this is

function volume(data, cutoff, samples; rng = Random.GLOBAL_RNG)
    ns = 0
    for i in 1:samples
        x = -10 .+ 20 * rand(rng, 3)
        ...

This way users can use your function without worrying about the RNG, but you can test the function by supplying an RNG from a custom package (yes, like StableRNGs.jl).
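If it helps, here is a self-contained sketch of that pattern; the vectorized distance check and the `break` that counts each sample at most once are my own simplifications, and StableRNGs.jl is assumed to be installed:

```julia
using Random, StableRNGs

function volume(data, cutoff, samples; rng = Random.GLOBAL_RNG)
    ns = 0
    for _ in 1:samples
        x = -10 .+ 20 * rand(rng, 3)
        for p in data
            # squared distance from sample x to data point p
            if sqrt(sum(abs2, p .- x)) < cutoff
                ns += 1
                break  # count each sample at most once
            end
        end
    end
    return (ns / samples) * 20^3
end

rng0 = StableRNG(1)
data = [rand(rng0, 3) for _ in 1:10]

volume(data, 5.0, 10^4)  # normal use: default global RNG, result varies

# testing use: a seeded StableRNG gives the identical estimate on every run
v1 = volume(data, 5.0, 10^3; rng = StableRNG(321))
v2 = volume(data, 5.0, 10^3; rng = StableRNG(321))
v1 == v2  # true
```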

It really depends on the structure of those inputs — if there are a lot of them, then generating them with a stable RNG may be the best option. Hard to say without context.

Thanks for providing an MWE — it is much easier to discuss concrete code. I would refactor as

function volume(data, cutoff, xs; K = length(first(xs)))
    ns = 0
    for x in xs
        for j in eachindex(data)
            d = 0.0
            for k in 1:K
                d += (data[j][k] - x[k])^2
            end
            if sqrt(d) < cutoff
                ns += 1
                continue
            end
        end
    end
    return (ns / length(xs)) * 20^3
end

and save the xs. Even if their length is small, this should be enough to test that the code works, without aiming for accuracy — you simply reproduce the calculation some other way. Also, when (in the original code) you do

julia> volume(data,cutoff,samples)
5280.32

presumably you are comparing to a theoretically calculated value with some error bounds. Then you run into the classic statistical problem of making a trade-off between type I and type II errors. In practice, for very complex calculations and without tests tailored to the particular problem, this either means that you get false alarms in CI or tests that are not very meaningful.

Of course, what people like to avoid with reproducible random numbers is the above; usually this means that they get a result, eyeball it, and then hardcode the bounds that make it pass into CI. But this is very brittle.

Yes, in the end I ended up with something like that, except that the RNG is not the global one, but one defined inside a function depending on some input parameters (whether a seed is provided or not, and whether a reproducible run is desired or not).

Not really: I am comparing with the output of the same calculation in a series of controlled runs, with realistic input. So, yes, I am probably doing the brittle option, but I would feel quite unsafe if the testing were done with toy problems with analytical solutions, because many, many issues could arise in corner cases of real problems, since the actual “shapes” I am integrating are too complicated. For now I think I will stick with this option.

Perhaps you misunderstand: the issue is not how you obtain the “true” solution (analytical, or from MC runs) that you compare to, but how you establish the error bounds for CI.
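For hit-or-miss Monte Carlo there is at least a principled way to derive such bounds: the hit count is binomial, so the standard error of the volume estimate follows directly. A sketch with hypothetical numbers:

```julia
# hit-or-miss MC: phat = ns/n, volume = phat * V_box,
# se(volume) = V_box * sqrt(phat * (1 - phat) / n)
n     = 10^5
V_box = 20.0^3
ns    = 6600                 # hypothetical hit count
phat  = ns / n
v     = phat * V_box                        # ≈ 528
se    = V_box * sqrt(phat * (1 - phat) / n) # ≈ 6.3

# a CI test could then use a wide band, e.g. 5 sigma, to keep false alarms rare
lo, hi = v - 5se, v + 5se
```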

Ah, yes. Well, for the moment the default precision required by isapprox seems to be completely safe for the sequential version with the stable random number generator.
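For context, when no tolerances are given, isapprox falls back to a relative tolerance of √eps of the element type (about 1.5e-8 for Float64); a fixed RNG stream passes that trivially, because the computation is then fully deterministic:

```julia
# default rtol for Float64 comparisons
sqrt(eps(Float64))  # ≈ 1.49e-8

isapprox(1.0, 1.0 + 1e-9)  # true: within the default rtol
isapprox(1.0, 1.0 + 1e-7)  # false: outside it
```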

Testing the parallel version of the package is a separate issue, where those problems arise more seriously. I have yet to set up a safe testing routine for those runs (while my package has a parallel version that works quite nicely, I do not know yet how to run parallel tests in CI; I just haven't had time to look into that).
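For the parallel case, one common way to make a threaded run reproducible regardless of scheduling is to give each chunk of work its own deterministically seeded RNG; a sketch (not the package's actual parallel code):

```julia
using Base.Threads
using Random

# one independently seeded RNG per chunk: the result does not depend on
# how many threads run or in which order the chunks are scheduled
nchunks = 4
partial = zeros(nchunks)
@threads for c in 1:nchunks
    rng = MersenneTwister(1000 + c)   # deterministic per-chunk seed
    s = 0.0
    for _ in 1:10_000
        s += rand(rng)
    end
    partial[c] = s
end
total = sum(partial)   # identical for any JULIA_NUM_THREADS
```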

So you get √eps relative precision from a stochastic calculation? That looks suspicious — even for IID draws, you would need a very, very large sample.