Writing unit tests that are robust to version changes for packages that use random number generation

Hi all,

I just got caught out by the change in rand and MersenneTwister between v0.6 and v0.7 (my own fault - I didn’t read the release notes carefully enough).

However, this raised a deeper issue for me. If one has a package that is inherently dependent on random number generation (in my case this is my DependentBootstrap package), what is the best way to write unit tests to go in runtests.jl?

Until now, I used a call to srand(1234) (Random.seed!(1234) in v0.7) to get consistent runs with known results for testing purposes, but this is what caught me out in the upgrade. Is there a smarter way to deal with this? Or is the best solution simply to use seed!, read the release notes carefully, and update your unit tests whenever the behaviour of random number generation changes between versions?



The latter is best. As far as I know, the RNG is not guaranteed to give the same results between versions; otherwise it would be impossible to fix bugs affecting it. However, when your expected result changes, always check that a change to the RNG was actually announced before you update your unit test - otherwise, the different result may be a bug.

I did this when updating DynamicHMC.jl.

In the ideal case, I construct my stochastic tests to have minimal (say 10^{-4} or similar) probability of Type I errors (false failures), yet still retain power, but sometimes there are trade-offs. I also think that using a fixed random seed and going through the tests every time the RNG changes has benefits: I actually caught a few misspecified tests that way.
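As a rough sketch of a stochastic test with a small false-failure probability: the function `noisy_estimate` below is a hypothetical stand-in for whatever random quantity a package computes, and the tolerance assumes its sampling distribution is (approximately) normal with a known standard deviation. Under those assumptions, a two-sided band of about 3.9 standard deviations is exceeded with probability of roughly 10^{-4}.

```julia
using Random, Statistics, Test

# Hypothetical stand-in for a package function with random output:
# the mean of n standard normal draws.
noisy_estimate(rng, n) = mean(randn(rng, n))

n = 10_000
# The mean of n standard normals has standard deviation 1/sqrt(n).
# About 3.9 standard deviations gives a two-sided tail probability
# of roughly 1e-4 under a normal distribution.
tol = 3.9 / sqrt(n)

@testset "estimate lies in the expected band" begin
    # The seed only makes a given run repeatable; the test does not
    # depend on the particular numbers the RNG produces.
    rng = MersenneTwister(1234)
    @test abs(noisy_estimate(rng, n)) < tol
end
```

A test written this way should keep passing if the RNG stream changes between versions, at the cost of being less sensitive than a comparison against exact stored values.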

Also, keep in mind that even if you set the RNG seed, results may be subtly different over long calculations on different CPUs, or with different compilation and optimization settings.
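One way to allow for such small floating-point differences is to compare with a tolerance rather than with ==. The following is only an illustration, with an arbitrarily chosen rtol; it compares two algebraically equivalent variance calculations that need not agree to the last bit.

```julia
using Random, Statistics, Test

rng = MersenneTwister(1234)
x = randn(rng, 1_000)

# Two algebraically equivalent ways of computing the sample variance.
v1 = var(x)
v2 = sum(abs2, x .- mean(x)) / (length(x) - 1)

# Tolerant comparison: passes even if the two results differ in the
# last few bits. The rtol value here is an arbitrary choice.
@test v1 ≈ v2 rtol=1e-10
```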

Finally, I think that @testset restores the state of the global RNG once it has finished, so it is very useful.
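A quick way to check this behaviour on your own Julia version is sketched below. The variable names are just for illustration. If @testset restores the global RNG state, the first draw after the test set should equal the first draw straight after seeding, even though the test set itself consumed random numbers.

```julia
using Random, Test

Random.seed!(1234)
expected = rand()            # first draw straight after seeding

Random.seed!(1234)
@testset "consumes random numbers" begin
    rand(100)                # advance the global RNG inside the test set
    @test true
end
after_testset = rand()       # first draw once the test set has finished

# Shows whether the RNG state was restored on this Julia version.
@show after_testset == expected
```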

Thanks for responding. Based on what you and Tamas have said, I think I’ll keep using a seeded RNG and just be a bit more careful in the future.

Thanks for responding.

That’s quite a neat idea to construct the tests such that you can say with high certainty what interval the answer will lie in. I’ll have a think about it, although it may prove more work than using a seeded RNG and just being careful. Point taken about different CPUs.