Unit testing that depends on random number generation

I have a package, DependentBootstrap, where the unit tests rely explicitly on values generated by randn. Currently, I set the seed before each test to ensure each run is identical. Of course, this runs into problems whenever the randn function itself changes, as it will for the jump to Julia v1.5.

Are there any best practices for dealing with problems like this? I suppose I could add keyword arguments that let the user explicitly specify the random numbers used in core functions, but this feels clunky. Perhaps the best solution is simply to update my tests every time randn (and related functions) change in Base Julia?

Cheers,

Colin

  1. Separate random and deterministic parts of your code into small functions. You can test deterministic parts without randomness, or with random inputs but checking for some relevant condition/invariant that should still hold.

  2. For the truly random parts, devise statistical tests that have a very low false-positive rate (ie that fail when the code is correct only with very low probability, eg on the order of p = 10^{-4}) but also a low false-negative rate. When a test fails, emit a diagnostic trail in the logs that allows you to replicate and inspect the problem. You can still set a random seed to minimize human intervention; then you can go with a higher p and revisit the code only when the RNG of Julia changes. (A sketch of both ideas follows below.)
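To make that concrete, here is a minimal sketch of both ideas; the function name, the toy resampler, and the particular invariant are illustrative, not taken from DependentBootstrap:

```julia
using Test, Random, Statistics

# 1. Deterministic kernel: given pre-drawn block start indices, assembling the
#    resampled series involves no randomness and can be tested exactly.
resample(x, starts, blocklen) =
    reduce(vcat, [x[mod1.(s:s+blocklen-1, length(x))] for s in starts])

@testset "deterministic part" begin
    @test resample([10, 20, 30, 40], [2, 4], 2) == [20, 30, 40, 10]
end

# 2. Statistical test on the random part: seed the RNG so CI runs are stable,
#    then check an invariant with a tolerance chosen so that a *correct*
#    implementation fails only with probability on the order of 1e-4.
@testset "random part" begin
    rng = MersenneTwister(1)
    n = 10^5
    starts = rand(rng, 1:n, n)
    # starts should be roughly uniform on 1:n, so mean(starts)/n ≈ 0.5 with
    # standard deviation ≈ 1/sqrt(12n); 3.9 sigma corresponds to p ≈ 1e-4.
    @test abs(mean(starts) / n - 0.5) < 3.9 / sqrt(12n)
end
```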

Testing for specific “random” outcomes is usually illusory: it mostly means that someone ran the code and it gave some result, which then was hardcoded into the tests.


The reproducibility section of the Random manual has some suggestions for tests:

Software tests that rely on specific “random” data should also generally save the data or embed it into the test code. On the other hand, tests that should pass for most random data (e.g. testing A \ (A*x) ≈ x for a random matrix A = randn(n,n)) can use an RNG with a fixed seed to ensure that simply running the test many times does not encounter a failure due to very improbable data (e.g. an extremely ill-conditioned matrix).
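Concretely, that suggestion looks something like this (the seeded RNG prevents a rare pathological draw from ever breaking CI, while the test itself never depends on specific randn values):

```julia
using Test, Random, LinearAlgebra

rng = MersenneTwister(1234)  # fixed seed: avoids a rare ill-conditioned draw,
                             # but no particular randn output is hardcoded
n = 50
A = randn(rng, n, n)
x = randn(rng, n)
@test A \ (A * x) ≈ x
```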


Thanks for responding.

I’ll have a look at implementing both of your suggestions, although I would have thought a much more conservative p would be more appropriate, e.g. something like p = 10^{-10}. This is feasible in my use case, albeit at the cost of a longer test run, since I’ll need to simulate a large number of observations.
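For what it’s worth, tightening to p ≈ 10^{-10} mostly amounts to widening the threshold to roughly 6.5 standard deviations and simulating enough observations that the test still has power; a sketch with illustrative numbers:

```julia
using Test, Random, Statistics

rng = MersenneTwister(2020)
n = 10^7                 # more observations => tighter absolute tolerance
x = randn(rng, n)
# mean(x) has standard deviation 1/sqrt(n), and P(|Z| > 6.5) ≈ 8e-11 for
# Z ~ N(0,1), so a correct randn fails this check with probability ~1e-10.
@test abs(mean(x)) < 6.5 / sqrt(n)
```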

To be honest, this is exactly what I did. I also performed the computation by hand to verify that the hardcoded outcome was definitely correct. But I do understand that the lack of transparency to the end user makes this approach less than ideal.

Thanks for responding; I had not seen that section of the manual before. To be honest, I have considered this approach in the past, but resisted it because it would make the code a lot less clean. This is especially true for the stationary bootstrap routine, where you don’t even know how much random number generation you need until you are partway through the function (the block size itself is random). Perhaps I could split the generation of each block into its own function, although again this would make the code less intuitive to any random statistician who wants to look through it to make sure it is doing what they think it should do.
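For example, one way to make that split without hurting readability too much is to pass an explicit RNG down and isolate each random draw in a small, named function. The following is a hypothetical sketch (the names and the inverse-CDF geometric draw are illustrative, not DependentBootstrap’s actual code):

```julia
using Random

# Draw one geometric block length (support 1, 2, ...) for the stationary
# bootstrap, via the exponential inverse-CDF trick.
geometric_block_length(rng::AbstractRNG, p::Real) =
    ceil(Int, randexp(rng) / -log(1 - p))

# Build a full set of resampling indices, drawing block starts and lengths
# until n indices have been collected; blocks wrap circularly around 1:n.
function stationary_bootstrap_indices(rng::AbstractRNG, n::Int, p::Real)
    inds = Int[]
    while length(inds) < n
        start = rand(rng, 1:n)
        len = geometric_block_length(rng, p)
        for j in 0:len-1
            push!(inds, mod1(start + j, n))
            length(inds) == n && break
        end
    end
    return inds
end
```

Each small function can then be exercised in the tests with a seeded AbstractRNG, independently of the rest of the routine.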

I did not look at the details, but I am wondering if you can do the same in the unit tests: given a stream of random numbers (eg setting the seed), do the computation in an alternative way in the test, then compare the results.


Could anyone point me to the details of that change regarding the random implementation in Julia 1.5?

Here is the relevant PR


I think I understand. You mean coding up a “by-hand” variant of the function that is laboriously and obviously correct for specific input data, and that makes the same random-number-generation calls in the same order. Then: set the seed and call the by-hand variant, recording the value; reset the seed and call the actual package routine, recording the value; compare the two values and throw an error if they differ.
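Something like the following, where `fancy_bootstrap_mean` stands in for the real package routine and the by-hand variant deliberately spells everything out while making identical rand calls (all names here are illustrative):

```julia
using Test, Random

# Package routine (stand-in): B bootstrap means of an iid resample of x.
fancy_bootstrap_mean(rng, x, B) =
    [sum(x[rand(rng, 1:length(x))] for _ in 1:length(x)) / length(x) for _ in 1:B]

# Laboriously explicit reference version making the same RNG calls in the
# same order, so both consume the random stream identically.
function byhand_bootstrap_mean(rng, x, B)
    out = Float64[]
    for _ in 1:B
        s = 0.0
        for _ in 1:length(x)
            s += x[rand(rng, 1:length(x))]
        end
        push!(out, s / length(x))
    end
    return out
end

x = [1.0, 2.0, 3.0, 4.0]
expected = byhand_bootstrap_mean(MersenneTwister(42), x, 100)  # seeded RNG
actual   = fancy_bootstrap_mean(MersenneTwister(42), x, 100)   # same seed
@test actual ≈ expected   # an RNG change in Julia affects both identically
```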

I quite like this. It would be robust to changes in rand, since both variants draw the same random numbers in the same order.

I think I’ll do it this way. Thanks for the idea.
