Unit testing that depends on random number generation

I have a package, DependentBootstrap, where the unit tests rely explicitly on values generated by randn. Currently, I set the seed before each test to ensure each run is identical. Of course, this runs into problems whenever the randn function itself changes, as it will for the jump to Julia v1.5.

Are there any best practices for dealing with problems like this? I suppose I could add keyword arguments that let the user explicitly specify the random numbers used in core functions, but this feels clunky. Perhaps the best solution is simply to update my tests every time randn (and related functions) change in Base Julia?

Cheers,

Colin

  1. Separate random and deterministic parts of your code into small functions. You can test deterministic parts without randomness, or with random inputs but checking for some relevant condition/invariant that should still hold.

  2. For the truly random parts, devise statistical tests that have a very low false-positive rate (ie that fail when the code is correct only with very low probability, eg on the order of p = 10^{-4}) but also a low false-negative rate. When a test fails, emit a diagnostic trail in the logs that allows you to replicate and inspect the problem. You can still set a random seed to minimize human intervention; then you can go with a higher p and revisit the code only when the RNG of Julia changes. (A sketch of both ideas follows below.)
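To make that concrete, here is a minimal sketch of both ideas; the function name, the toy resampler, and the particular invariant are illustrative, not taken from DependentBootstrap:

```julia
using Test, Random, Statistics

# 1. Deterministic kernel: given pre-drawn block start indices, assembling the
#    resampled series involves no randomness and can be tested exactly.
resample(x, starts, blocklen) =
    reduce(vcat, [x[mod1.(s:s+blocklen-1, length(x))] for s in starts])

@testset "deterministic part" begin
    @test resample([10, 20, 30, 40], [2, 4], 2) == [20, 30, 40, 10]
end

# 2. Statistical test on the random part: seed the RNG so CI runs are stable,
#    then check an invariant with a tolerance chosen so that a *correct*
#    implementation fails only with probability on the order of 1e-4.
@testset "random part" begin
    rng = MersenneTwister(1)
    n = 10^5
    starts = rand(rng, 1:n, n)
    # starts should be roughly uniform on 1:n, so mean(starts)/n ≈ 0.5 with
    # standard deviation ≈ 1/sqrt(12n); 3.9 sigma corresponds to p ≈ 1e-4.
    @test abs(mean(starts) / n - 0.5) < 3.9 / sqrt(12n)
end
```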

Testing for specific “random” outcomes is usually illusory: it mostly means that someone ran the code and it gave some result, which then was hardcoded into the tests.


The reproducibility section of the Random manual has some suggestions for tests:

Software tests that rely on specific “random” data should also generally save the data or embed it into the test code. On the other hand, tests that should pass for most random data (e.g. testing A \ (A*x) ≈ x for a random matrix A = randn(n,n)) can use an RNG with a fixed seed to ensure that simply running the test many times does not encounter a failure due to very improbable data (e.g. an extremely ill-conditioned matrix).
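Concretely, that suggestion looks something like this (the seeded RNG prevents a rare pathological draw from ever breaking CI, while the test itself never depends on specific randn values):

```julia
using Test, Random, LinearAlgebra

rng = MersenneTwister(1234)  # fixed seed: avoids a rare ill-conditioned draw,
                             # but no particular randn output is hardcoded
n = 50
A = randn(rng, n, n)
x = randn(rng, n)
@test A \ (A * x) ≈ x
```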


Thanks for responding.

I’ll have a look at implementing both of your suggestions, although I would have thought a much more conservative p would be more appropriate, e.g. something like p = 10^{-10}. This is feasible in my use case, albeit at the cost of a longer test run, since I’ll need to simulate a large number of observations.
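For what it’s worth, tightening to p ≈ 10^{-10} mostly amounts to widening the threshold to roughly 6.5 standard deviations and simulating enough observations that the test still has power; a sketch with illustrative numbers:

```julia
using Test, Random, Statistics

rng = MersenneTwister(2020)
n = 10^7                 # more observations => tighter absolute tolerance
x = randn(rng, n)
# mean(x) has standard deviation 1/sqrt(n), and P(|Z| > 6.5) ≈ 8e-11 for
# Z ~ N(0,1), so a correct randn fails this check with probability ~1e-10.
@test abs(mean(x)) < 6.5 / sqrt(n)
```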

To be honest, this is exactly what I did. I also performed the computation by hand to verify that the hardcoded outcome was definitely correct. But I do understand that the lack of transparency to the end user makes this approach less than ideal.

Thanks for responding; I had not seen that section of the manual before. To be honest, I have considered this approach in the past, but resisted it because it would make the code a lot less clean. This is especially true for the stationary bootstrap routine, where you don’t even know how much random number generation you need until you are partway through the function (the block size itself is random). Perhaps I could split the generation of each block into its own function, although again this would make the code less intuitive to any random statistician who wants to look through it to make sure it is doing what they think it should do.
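For example, one way to make that split without hurting readability too much is to pass an explicit RNG down and isolate each random draw in a small, named function. The following is a hypothetical sketch (the names and the inverse-CDF geometric draw are illustrative, not DependentBootstrap’s actual code):

```julia
using Random

# Draw one geometric block length (support 1, 2, ...) for the stationary
# bootstrap, via the exponential inverse-CDF trick.
geometric_block_length(rng::AbstractRNG, p::Real) =
    ceil(Int, randexp(rng) / -log(1 - p))

# Build a full set of resampling indices, drawing block starts and lengths
# until n indices have been collected; blocks wrap circularly around 1:n.
function stationary_bootstrap_indices(rng::AbstractRNG, n::Int, p::Real)
    inds = Int[]
    while length(inds) < n
        start = rand(rng, 1:n)
        len = geometric_block_length(rng, p)
        for j in 0:len-1
            push!(inds, mod1(start + j, n))
            length(inds) == n && break
        end
    end
    return inds
end
```

Each small function can then be exercised in the tests with a seeded AbstractRNG, independently of the rest of the routine.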

I did not look at the details, but I am wondering if you can do the same in the unit tests: given a stream of random numbers (eg setting the seed), do the computation in an alternative way in the test, then compare the results.


Could anyone point me to the details of that change regarding the random implementation in Julia 1.5?

Here is the relevant PR


I think I understand. You mean coding up a “by-hand” variant of the function that is laboriously and obviously correct for specific input data, and that makes the same random-number-generation calls in the same order. Then: set the seed and call the by-hand variant, recording the value; reset the seed and call the actual package routine, recording the value; compare the two values and throw an error if they differ.
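Something like the following, where `fancy_bootstrap_mean` stands in for the real package routine and the by-hand variant deliberately spells everything out while making identical rand calls (all names here are illustrative):

```julia
using Test, Random

# Package routine (stand-in): B bootstrap means of an iid resample of x.
fancy_bootstrap_mean(rng, x, B) =
    [sum(x[rand(rng, 1:length(x))] for _ in 1:length(x)) / length(x) for _ in 1:B]

# Laboriously explicit reference version making the same RNG calls in the
# same order, so both consume the random stream identically.
function byhand_bootstrap_mean(rng, x, B)
    out = Float64[]
    for _ in 1:B
        s = 0.0
        for _ in 1:length(x)
            s += x[rand(rng, 1:length(x))]
        end
        push!(out, s / length(x))
    end
    return out
end

x = [1.0, 2.0, 3.0, 4.0]
expected = byhand_bootstrap_mean(MersenneTwister(42), x, 100)  # seeded RNG
actual   = fancy_bootstrap_mean(MersenneTwister(42), x, 100)   # same seed
@test actual ≈ expected   # an RNG change in Julia affects both identically
```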

I quite like this. It would be robust to changes in rand, since both variants draw the same random numbers in the same order.

I think I’ll do it this way. Thanks for the idea.
