Package test (based on shuffle) ok locally and on GitHub CI but failing on PkgEval

In my package I have a partition function that randomly partitions a set of arrays into different shares, keeping the order across the various arrays.

I test it using on StableRNGs.jl and I have the test working both locally and on GitHub CI, but when I check the PkgEval report I have:

** Testing partition()...
Test Failed at /home/pkgeval/.julia/packages/BetaML/r7gn5/test/Utils_tests.jl:458
  Expression: out == [[31 1 51 61; 32 2 52 62], [11 41; 12 42]]
   Evaluated: AbstractMatrix{Int64}[[41 51 11 1; 42 52 12 2], [31 61; 32 62]] == [[31 1 51 61; 32 2 52 62], [11 41; 12 42]]

What could make it ending up with a different random stream ?

You call Random.shuffle who’s implementation changed (unify `shuffle` and `randperm` by ctarn · Pull Request #50318 · JuliaLang/julia · GitHub).

1 Like

Thanks… this is going to hurt a lot on my tests… by using StableRNG they did assume a certain output…

In general, tests like the one you are doing aren’t good because they aren’t really testing anything. Why is [[31 1 51 61; 32 2 52 62], [11 41; 12 42]] a good result? All this test is doing is making sure that the function runs without errors, and that no one has changed any of the implementation (rather than that the implementation remains correct). Instead, I would recommend just testing that the no error is thrown (and not checking the result), or testing the statistical properties of the result. To do this, you would (for example) generate 1000 partitions of a length 3 list and test that the number of times you get each partition is approximately the expected number.

2 Likes

Well, the point is that that specific output has some structural characteristics (e.g. picking up whole columns of the data - I am partitioning on the second dimension, getting a complete partition,…) so it is well more than just check it doesn’t error, and conditional to relying that the random part was deterministic, allowed me to summarize my checks on these characteristics (that I manually validated) with a simple equal. Without this guarantee instead I need to explicitly check each characteristic in the test code… sure it is more appropriate … but longer :wink:

It’s a little longer, but it also makes it a lot easier to improve things later on since you can change implementations without your tests breaking (which means that you won’t accidentally change your tests in a bad way). Specifically for partition, the natural things to test are that it returns a Vector{Vector{T}}, and that concatenating and sorting the result yields an identical result to the sorted original list. This test will be ~3 lines rather than 1 line, but it is actually testing a lot more.

3 Likes

This is a bug in StableRNG’s, as it is documented that shuffle is considered stable. We did not vendor the Random implementation when adding that guarantee, but need to do so now as the upstream implementation has changed. It is fixed in Add `shuffle` implementation to preserve stability by ericphanson · Pull Request #17 · JuliaRandom/StableRNGs.jl · GitHub, but needs a JuliaRandom maintainer to take a look.

1 Like