Package test (based on shuffle) ok locally and on GitHub CI but failing on PkgEval

sylvaticus · December 8, 2023, 3:27pm

In my package I have a partition function that randomly partitions a set of arrays into different shares, keeping the order across the various arrays.

I test it using on StableRNGs.jl and I have the test working both locally and on GitHub CI, but when I check the PkgEval report I have:

** Testing partition()...
Test Failed at /home/pkgeval/.julia/packages/BetaML/r7gn5/test/Utils_tests.jl:458
  Expression: out == [[31 1 51 61; 32 2 52 62], [11 41; 12 42]]
   Evaluated: AbstractMatrix{Int64}[[41 51 11 1; 42 52 12 2], [31 61; 32 62]] == [[31 1 51 61; 32 2 52 62], [11 41; 12 42]]

What could make it ending up with a different random stream ?

Oscar_Smith · December 8, 2023, 3:41pm

You call Random.shuffle who’s implementation changed (unify `shuffle` and `randperm` by ctarn · Pull Request #50318 · JuliaLang/julia · GitHub).

sylvaticus · December 8, 2023, 4:01pm

Thanks… this is going to hurt a lot on my tests… by using StableRNG they did assume a certain output…

Oscar_Smith · December 8, 2023, 4:10pm

In general, tests like the one you are doing aren’t good because they aren’t really testing anything. Why is [[31 1 51 61; 32 2 52 62], [11 41; 12 42]] a good result? All this test is doing is making sure that the function runs without errors, and that no one has changed any of the implementation (rather than that the implementation remains correct). Instead, I would recommend just testing that the no error is thrown (and not checking the result), or testing the statistical properties of the result. To do this, you would (for example) generate 1000 partitions of a length 3 list and test that the number of times you get each partition is approximately the expected number.

sylvaticus · December 8, 2023, 4:33pm

Well, the point is that that specific output has some structural characteristics (e.g. picking up whole columns of the data - I am partitioning on the second dimension, getting a complete partition,…) so it is well more than just check it doesn’t error, and conditional to relying that the random part was deterministic, allowed me to summarize my checks on these characteristics (that I manually validated) with a simple equal. Without this guarantee instead I need to explicitly check each characteristic in the test code… sure it is more appropriate … but longer

Oscar_Smith · December 8, 2023, 4:45pm

It’s a little longer, but it also makes it a lot easier to improve things later on since you can change implementations without your tests breaking (which means that you won’t accidentally change your tests in a bad way). Specifically for partition, the natural things to test are that it returns a Vector{Vector{T}}, and that concatenating and sorting the result yields an identical result to the sorted original list. This test will be ~3 lines rather than 1 line, but it is actually testing a lot more.

ericphanson · December 8, 2023, 11:35pm

This is a bug in StableRNG’s, as it is documented that shuffle is considered stable. We did not vendor the Random implementation when adding that guarantee, but need to do so now as the upstream implementation has changed. It is fixed in Add `shuffle` implementation to preserve stability by ericphanson · Pull Request #17 · JuliaRandom/StableRNGs.jl · GitHub, but needs a JuliaRandom maintainer to take a look.

Topic		Replies	Views
Nightly build CI failing because rand(UnitRange) changed (when to care?) New to Julia	28	1307	November 11, 2020
A stable RNG for tests: request for feed-back Package Announcements	24	2082	August 22, 2021
Julia 1.11.1 gives different results from 1.10.5 General Usage	56	2198	December 7, 2024
Random numbers, with MersenneTwister, not stable across major releases General Usage random	8	232	November 7, 2024
Why is `sort(x, by = _ -> rand())` not a good shuffler? General Usage	2	355	February 19, 2022

Package test (based on shuffle) ok locally and on GitHub CI but failing on PkgEval

Related topics