Nightly build CI failing because rand(UnitRange) changed (when to care?)

lmiq · November 10, 2020, 1:27pm

Yes, in the end I did something like that, except that the RNG is not the global one but one created inside a function depending on some input parameters (whether a seed is provided, whether a reproducible run is desired).
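A minimal sketch of that pattern, assuming the choice is driven by keyword arguments. The names `init_rng` and `noisy_mean`, the keyword names, and the default seed `1234` are hypothetical, not taken from the package. `MersenneTwister` is used only for illustration. Where streams must stay identical across Julia versions (the concern in this thread's title), a version-stable generator such as `StableRNG` from StableRNGs.jl could be substituted.

```julia
import Random

# Hypothetical helper: build the RNG from the run options instead of
# reseeding the global generator.
function init_rng(; seed::Union{Nothing,Integer} = nothing, reproducible::Bool = false)
    if seed !== nothing
        return Random.MersenneTwister(seed)   # user-supplied seed
    elseif reproducible
        return Random.MersenneTwister(1234)   # fixed, hypothetical default seed
    else
        return Random.default_rng()           # Julia's default generator, not reseeded
    end
end

# A toy computation that passes the RNG explicitly to every draw.
function noisy_mean(n::Integer; kwargs...)
    rng = init_rng(; kwargs...)
    return sum(rand(rng) for _ in 1:n) / n
end
```

Calling `noisy_mean(1000; seed = 123)` twice gives the same value, while `noisy_mean(1000)` draws from the default generator and varies between runs.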

Not really. I am comparing with the output of the same calculation in a series of controlled runs with realistic input. So yes, I am probably doing the brittle option, but I would feel quite unsafe if the testing were done only on toy problems with analytical solutions, because many issues can arise in corner cases of real problems, where the actual “shapes” I am integrating are too complicated. For now I think I will stick with this option.

Tamas_Papp · November 10, 2020, 1:30pm

Perhaps you misunderstand: the issue is not how you obtain the “true” solution (analytical, or MC runs) you compare to, but how you establish the error bounds for CI.
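One way to establish such bounds, sketched under simple assumptions: pick a quantity with a known answer (here the mean of Uniform(0, 1), exactly 0.5), estimate the standard error from the same sample, and accept the result if it lies within k standard errors. The function name and the choice k = 5 are illustrative.

```julia
import Random

# Monte Carlo estimate of E[U] for U ~ Uniform(0, 1), together with its
# estimated standard error.
function mc_mean_with_se(rng, n)
    x = [rand(rng) for _ in 1:n]
    m = sum(x) / n
    s = sqrt(sum((xi - m)^2 for xi in x) / (n - 1))   # sample standard deviation
    return m, s / sqrt(n)
end

rng = Random.MersenneTwister(2020)
est, se = mc_mean_with_se(rng, 10_000)

# Tolerance derived from the sampling error, not from isapprox's default.
# k = 5 standard errors makes a spurious failure very unlikely under a normal approximation.
k = 5
passes = abs(est - 0.5) <= k * se
```

The resulting tolerance is on the order of 1e-3 for 10,000 draws, which is the kind of bound a test on a stochastic result can honestly claim.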

lmiq · November 10, 2020, 1:38pm

Ah, yes. Well, for the moment the default precision required by isapprox seems to be completely safe for the sequential version with the stable random number generator.
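For reference, when `atol` is left at zero, `isapprox` on `Float64` values uses a relative tolerance of `sqrt(eps(Float64))`, about 1.5e-8. A small check of that default:

```julia
# With atol = 0 (the default), isapprox on Float64 uses rtol = sqrt(eps(Float64)).
rtol_default = sqrt(eps(Float64))   # ≈ 1.49e-8

a = 1.0
close_enough = isapprox(a, a * (1 + 1e-9))   # relative difference below the default rtol
too_far      = isapprox(a, a * (1 + 1e-7))   # relative difference above the default rtol
```

A looser check can be requested explicitly, e.g. `isapprox(x, y; rtol = 1e-3)`.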

Testing the parallel version of the package is a separate issue, where those problems arise more seriously. I have yet to set up a safe testing routine for those runs: while my package has a parallel version that works quite well, I do not know yet how to run parallel tests in CI, and I just haven't had time to look into it.
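A common way to keep threaded Monte Carlo runs reproducible, sketched here as an assumption rather than as the package's approach: give every chunk of work its own RNG seeded deterministically from the chunk index, and reduce the partial results in a fixed order. The result then does not depend on the number of threads (set with `JULIA_NUM_THREADS`), so the same test can run in CI with one thread or several. The function name and the `seed + c` seeding scheme are hypothetical and deliberately simplistic.

```julia
import Random

# One RNG per chunk, seeded from (seed, chunk index); partial results are
# stored by chunk and summed in a fixed order afterwards.
function chunked_sum(n_chunks::Integer, n_per_chunk::Integer; seed::Integer = 1)
    partial = zeros(n_chunks)
    Threads.@threads for c in 1:n_chunks
        rng = Random.MersenneTwister(seed + c)   # simplistic seeding, illustrative only
        s = 0.0
        for _ in 1:n_per_chunk
            s += rand(rng)
        end
        partial[c] = s                           # each iteration writes its own slot
    end
    return sum(partial)                          # reduction independent of scheduling
end
```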

Tamas_Papp · November 10, 2020, 2:06pm

So you get √eps relative precision from a stochastic calculation? That looks suspicious — even for IID draws, you would need a very, very large sample.
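How large "very, very large" is can be made concrete. For the mean of n IID draws with mean μ and standard deviation σ, requiring the relative standard error to reach √ε gives the bound below; the Uniform(0, 1) values are only an illustration.

```latex
\frac{\sigma/\sqrt{n}}{|\mu|} \le \sqrt{\varepsilon}
\quad\Longrightarrow\quad
n \ge \frac{\sigma^2}{\mu^2\,\varepsilon},
\qquad
\mu = \tfrac{1}{2},\ \sigma^2 = \tfrac{1}{12},\ \varepsilon \approx 2.22\times 10^{-16}
\;\Longrightarrow\;
n \ge \frac{1}{3\varepsilon} \approx 1.5\times 10^{15}.
```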

lmiq · November 10, 2020, 2:08pm

If the random number sequence is exactly the same, why not?

I mean, this is just a more complicated example of this:

julia> import Random

julia> Random.seed!(123);

julia> sum(rand() for i in 1:1000)
503.24660050142177

julia> Random.seed!(123);

julia> sum(rand() for i in 1:1000)
503.24660050142177
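The same replay can be done without reseeding the global generator, by copying an explicit RNG before drawing from it. A minimal sketch:

```julia
import Random

# Copy the generator's state, then draw the same stream twice.
rng = Random.MersenneTwister(123)
replay = copy(rng)                     # independent copy with identical state

a = sum(rand(rng) for i in 1:1000)
b = sum(rand(replay) for i in 1:1000)  # same draws, so a == b
```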

Tamas_Papp · November 10, 2020, 2:17pm

Sure, but what are you really testing then? That the same calculation produces the same result? Or is it coded in two different ways, just using the same random stream?

lmiq · November 10, 2020, 3:06pm

Generally speaking, yes, it is coded in different ways. The idea is to have a bunch of tests that assure me that whenever I introduce modifications in the package (to improve performance, for example, or to add new features), I do not break what was working before. Is that different from testing in any other context? Of course some modifications can break the tests because of these random number sequences, but many (and the most frequent ones) won’t. So the tests reassure me that I have not introduced regressions whenever I fix a bug, add a feature, etc.

Edit: but I understand your point. This kind of test is certainly not designed for a major algorithmic modification of the package; in that case the test should compare against an expected result at a reasonable precision. I do have some tests of this kind, but they are not part of the automatic test set, because they take too long to run at a safe precision threshold.

For example, if in the volume code above I decide to compute cutoff2 = cutoff^2 once and not take the square root of the distance at every iteration of the loop, that saves ~5% of the time and the results are identical. It is good to have quick tests that assure me I have not done anything wrong when small changes like that are introduced, and that everything continues to work as expected.
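A toy stand-in for that kind of refactoring, since the original volume code is not reproduced here: both functions draw the same points from identically seeded RNGs, one comparing `sqrt(d2) < cutoff` and the other `d2 < cutoff2`. The names and the geometry (a sphere of radius `cutoff` centered in the unit cube) are hypothetical. Because `sqrt` is correctly rounded, the two comparisons can only disagree for `d2` within rounding of `cutoff^2`, which random inputs essentially never hit, so a fixed-seed equality test fits this kind of change.

```julia
import Random

# Hypothetical toy "volume" estimate: fraction of uniform points in [0, 1)^3
# lying within `cutoff` of the cube's center.
function volume_sqrt(rng, n, cutoff)
    inside = 0
    for _ in 1:n
        dx, dy, dz = rand(rng) - 0.5, rand(rng) - 0.5, rand(rng) - 0.5
        d = sqrt(dx^2 + dy^2 + dz^2)       # square root at every iteration
        inside += d < cutoff
    end
    return inside / n
end

# Same draws in the same order, but compares squared distances to cutoff^2.
function volume_squared(rng, n, cutoff)
    cutoff2 = cutoff^2                     # computed once, outside the loop
    inside = 0
    for _ in 1:n
        dx, dy, dz = rand(rng) - 0.5, rand(rng) - 0.5, rand(rng) - 0.5
        inside += (dx^2 + dy^2 + dz^2) < cutoff2
    end
    return inside / n
end

a = volume_sqrt(Random.MersenneTwister(42), 100_000, 0.5)
b = volume_squared(Random.MersenneTwister(42), 100_000, 0.5)
```

With `cutoff = 0.5` both estimates should also be close to the sphere's volume, π/6 ≈ 0.524.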

lmiq · November 10, 2020, 3:38pm

But while we are at it, is there any limitation for CI testing? Can I add a test that takes half an hour to run?

If that is possible, I could add tests that compare the actual result against the precision a user would expect.
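A sketch of one way to keep long tests out of the default run while still being able to trigger them (for example from a scheduled CI job): gate them on an environment variable. The variable name `MYPKG_LONG_TESTS` and the suite names are hypothetical.

```julia
# Hypothetical opt-in switch: long, high-precision suites run only when
# MYPKG_LONG_TESTS is set to "true".
run_long_tests() = lowercase(get(ENV, "MYPKG_LONG_TESTS", "false")) == "true"

function selected_suites()
    suites = ["quick_regression"]          # always run: fast, fixed-seed comparisons
    if run_long_tests()
        push!(suites, "long_precision")    # slow: compare against expected accuracy
    end
    return suites
end
```

In a `test/runtests.jl`, each suite name could map to a file that is `include`d, so the quick regression tests stay fast and the expensive precision checks run only where a longer timeout is available.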

Tamas_Papp · November 11, 2020, 7:23am

Generally one would test some invariant that should hold given inputs and outputs.
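A sketch of invariant-style checks for a Monte Carlo estimator, assuming a toy volume calculation like the one discussed earlier (the function and its geometry are hypothetical). The checks hold for every seed, so they do not break when the random stream changes between Julia versions.

```julia
import Random

# Hypothetical toy estimator: fraction of uniform points in [0, 1)^3 lying
# within `cutoff` of the cube's center (0.5, 0.5, 0.5).
function inside_fraction(rng, n, cutoff)
    cutoff2 = cutoff^2
    inside = 0
    for _ in 1:n
        dx, dy, dz = rand(rng) - 0.5, rand(rng) - 0.5, rand(rng) - 0.5
        inside += (dx^2 + dy^2 + dz^2) < cutoff2
    end
    return inside / n
end

# Invariants that hold for any seed.
function check_invariants(seed; n = 10_000)
    # Same seed, so the same points are used for every cutoff.
    f(cutoff) = inside_fraction(Random.MersenneTwister(seed), n, cutoff)
    small, large, huge = f(0.3), f(0.4), f(0.9)
    in_range  = 0 <= small <= 1 && 0 <= large <= 1
    monotone  = small <= large   # a larger cutoff cannot exclude a point
    saturated = huge == 1.0      # 0.9^2 exceeds the largest squared distance, 0.75
    return in_range && monotone && saturated
end
```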

Hardcoding input/output pairs is occasionally necessary though.

This depends on your CI setup; most frameworks allow you to set a longer timeout. Of course, this will burn through any free tier very quickly.

For some of the economic models I am working on, a CI run takes 2–3 hours. But it is still great, because actual estimation takes 1–2 weeks, so catching errors early is valuable. We ended up running CI on our own machine using GitLab.
