Automation to ensure green CI on master

jar1 · August 8, 2023, 8:50pm

CI on master is sometimes failing, which makes it harder for new contributors to make PRs because they don’t have a good signal about success/failure. Some projects like Rust ensure that the master branch of rust-lang/rust is always in a valid state.

Are devs opposed to such a system, has nobody got around to setting up the automation, or is there a budget problem?

Oscar_Smith · August 8, 2023, 8:58pm

The problem with this is that our test suite is slightly non-deterministic. Sometimes someone merges a PR that was green by chance but contains a bug that is detected in, say, 10% of CI runs. That would then block a lot of other PRs from merging while we identify the PR that broke things and fix it.
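To put rough numbers on the "green by chance" scenario, here is a small Python sketch. It assumes the illustrative 10% failure rate mentioned above and independent CI runs; both are assumptions for the arithmetic, not measurements of the actual test suite.

```python
# Assumed failure rate, taken from the "say 10%" figure above (not a measurement).
p_fail = 0.10

# Chance that the buggy PR's single pre-merge CI run is green by luck.
p_green_once = 1 - p_fail

# Expected number of CI runs until the bug first shows up (geometric distribution).
expected_runs_until_failure = 1 / p_fail


def p_at_least_one_red(n: int, p: float = p_fail) -> float:
    """Chance that at least one of the next n independent runs fails."""
    return 1 - (1 - p) ** n
```

Under these assumptions the PR merges green 90% of the time, and roughly 65% of the time at least one of the next ten runs on other PRs goes red because of it.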

Sukera · August 8, 2023, 9:00pm

How (if at all) does Rust manage to evade spurious failures? Do we know what kinds of non-determinism in our test suite often lead to these failures?

jar1 · August 8, 2023, 10:07pm

Is there a good way to identify tests with nondeterministic outcomes?

jling · August 8, 2023, 11:21pm

example of commits to main branch that failed CI

jar1 · August 8, 2023, 11:50pm

From talking with some Rustaceans, I think that’s not really in master. My link above shows that Rust has a system of trying multiple commits at once to save time. In this case one of the commits in a group failed, but the final “rollup” merge commit for that group was successful: Auto merge of #114565 - matthiaskrgr:rollup-p7cjs3m, r=matthiaskrgr · rust-lang/rust@72c6b8d · GitHub

jar1 · August 9, 2023, 12:11am

I think a good philosophy is described in this issue from a (non-Rust but very good) async library, https://github.com/python-trio/trio/issues/200, namely applying heavy effort to track down and eliminate each flaky test (and how to do it).

e3c6 · August 9, 2023, 12:31am

What aspect of the CI tests are non-deterministic?

Would it be worthwhile to make all tests deterministic?

Oscar_Smith · August 9, 2023, 2:12am

There are a number of non-deterministic aspects. Some are intentional (e.g. math tests using random numbers to increase code coverage over time), some are more inherent/annoying (e.g. file state, internet connection issues). The second set are definitely good to remove but the first possibly should stay.

viralbshah · August 9, 2023, 2:16am

One could imagine two sets of tests - with the fully deterministic set never allowed to fail, with stringent things like CI having to pass before merge is allowed.

woclass · August 9, 2023, 3:06am

There are two ways to divide those test sets:

split into two phases of execution
split into two separate CI tasks

If divided into two execution phases: this is simpler to implement, perhaps just by modifying test/runtests.jl.
The deterministic tests are executed first, and if they fail, the whole run fails. Once the deterministic tests pass, the other tests are executed.
If the CI platform does not support reporting multiple statuses, you may need to check the results of the latter phase manually.

If you choose to split tests: the number of tests to be run doubles, but you can get a better view of the status of each type of test run.
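The two-phase option could look roughly like the sketch below. It is written in Python purely for illustration and is not how test/runtests.jl works; the names `run_phase` and `run_two_phases`, and the idea of passing tests as dictionaries of callables, are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

TestFn = Callable[[], None]  # a test passes if it returns without raising


def run_phase(tests: Dict[str, TestFn]) -> List[str]:
    """Run every test in the phase and return the names of those that failed."""
    failed = []
    for name, test in tests.items():
        try:
            test()
        except Exception:
            failed.append(name)
    return failed


def run_two_phases(deterministic: Dict[str, TestFn],
                   nondeterministic: Dict[str, TestFn]
                   ) -> Tuple[int, List[str], List[str]]:
    """Return (exit_code, deterministic_failures, nondeterministic_failures)."""
    det_failed = run_phase(deterministic)
    if det_failed:
        # Phase 1 failed: the whole run fails and phase 2 is skipped.
        return 1, det_failed, []
    # Phase 2 failures are reported but do not change the exit code,
    # so someone still has to look at them manually.
    nondet_failed = run_phase(nondeterministic)
    return 0, [], nondet_failed
```

With a single exit code, a failure in the second phase is only visible in the returned list (or the log), which is the manual check described above.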

e3c6 · August 9, 2023, 1:25pm

Not sure what you mean?

e3c6 · August 9, 2023, 1:31pm

What is Rust’s stance on non-deterministic tests? A very superficial search seems to indicate they don’t have them, but I couldn’t find any reliable source.

sbuercklin · August 9, 2023, 1:47pm

(e.g. math tests using random numbers to increase code coverage over time)

How does using an RNG which can lead to random failures improve test coverage if the first solution to failures is to run the test suite again and hope the RNG plays nicely this time around?

Oscar_Smith · August 9, 2023, 1:58pm

Theoretically, at least, we only rerun tests after looking at what failed and whether it’s potentially real.

rfourquet · August 9, 2023, 2:00pm

Hopefully in this case the preferred solution is to use the failure as a bug report and try to fix it. This class of failure is generally easy to reproduce by setting the RNG seed.
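As a rough illustration of reproducing a failure from its seed, the Python sketch below picks a fresh seed per run and prints it when the test fails. The helper `run_randomized` and the example property `check_sort_is_ordered` are hypothetical and are not part of any existing test harness.

```python
import random
from typing import Callable, Optional


def run_randomized(test: Callable[[random.Random], None],
                   seed: Optional[int] = None) -> int:
    """Run `test` with a seeded RNG; on failure, report the seed and re-raise."""
    if seed is None:
        # Fresh seed for each run, so coverage grows over time.
        seed = random.SystemRandom().randrange(2**32)
    rng = random.Random(seed)
    try:
        test(rng)
    except Exception:
        print(f"randomized test failed; rerun with seed={seed} to reproduce")
        raise
    return seed


def check_sort_is_ordered(rng: random.Random) -> None:
    # Hypothetical property test: sorting random data yields ordered output.
    data = [rng.randint(-1000, 1000) for _ in range(100)]
    result = sorted(data)
    assert all(a <= b for a, b in zip(result, result[1:]))
```

Passing the printed seed back in, for example `run_randomized(check_sort_is_ordered, seed=123)`, replays exactly the same random inputs, which turns a one-off red run into a reproducible bug report.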

sbuercklin · August 9, 2023, 2:14pm

Is there a list of tests which are RNG-unstable?

gbaraldi · August 9, 2023, 2:21pm

The RNG tests aren’t the ones that cause failures; by far the most common failure is network tests.

jules · August 9, 2023, 2:27pm

Would it help if all network tests had their own CI job, so one could immediately see that failures are unrelated and rerun only that job manually?

Krastanov · August 9, 2023, 2:43pm

This discussion is great and I am learning a lot from it. Some of the questions and suggestions are quite pertinent/insightful. I would urge the folks spearheading this questioning to document what is being learnt and even to make pull requests with their suggestions (even if the pull request is a draft and incomplete) in order to keep the ball rolling; otherwise we will just have a long thread that becomes outdated. The core devs have not done this, not because they disagree that it is valuable, but because their TODO lists of valuable work are 10x longer than what they have time to do.
