Making the review process more pleasant for new contributors

We don’t (deliberately) merge PRs that cause new failures. Most of the changes now causing failures probably passed CI when they were first submitted, but intermittent issues, hardware changes, or subtle interactions with other changes can make them fail some of the time; and if you have enough flaky tests, and the failure of just one is enough to mark the whole CI run as failed, then failing CI runs become common.
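
To make the compounding concrete, here’s a back-of-the-envelope sketch (the flaky-test count and per-test failure rate are made-up numbers, not measurements of Julia’s CI):

```julia
# Hypothetical numbers: 20 flaky tests, each failing independently 2% of the time.
n_flaky = 20
p_fail  = 0.02

# Probability that at least one of them fails, marking the whole run as failed:
p_run_fails = 1 - (1 - p_fail)^n_flaky
println(p_run_fails)   # ≈ 0.33, i.e. roughly one run in three goes red
```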

I’m not saying this is good, but chasing down the causes tends to be hard work and not that many people make this a core mission. If you want to find examples of PRs fixing flaky tests, a subset (probably only a minority) can be found with the “ci” label: Pull requests · JuliaLang/julia · GitHub

Sometimes, though, things get through due to inadequate test coverage and are only discovered later.

While I’m not at all confident this is in any way meaningful, I amused myself by comparing Julia and Python (a rough sketch for reproducing such counts follows the list):

  • Julia, source files ending in .jl under base/: ~130K LOC
  • Julia, files ending in .jl under test/: ~100K LOC
  • Python, source files as discovered by `flspy=$(find . -name "*.py" ! -name "test_*")`: ~400K LOC
  • Python, test files as discovered by `flspy=$(find . -name "test_*.py")`: ~470K LOC
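
For the curious, here is a rough Julia sketch of how such counts could be reproduced; it counts raw lines (blanks and comments included) and assumes you run it from the root of the relevant checkout, so exact totals will differ:

```julia
# Rough sketch: total raw lines of files under `root` whose names satisfy `pred`.
function count_loc(root, pred)
    total = 0
    for (dir, _, files) in walkdir(root), f in files
        pred(f) && (total += countlines(joinpath(dir, f)))
    end
    return total
end

# From the root of a JuliaLang/julia checkout:
julia_code = count_loc("base", f -> endswith(f, ".jl"))
julia_test = count_loc("test", f -> endswith(f, ".jl"))

# From the root of a CPython checkout (mirroring the `find` filters above):
# py_code = count_loc(".", f -> endswith(f, ".py") && !startswith(f, "test_"))
# py_test = count_loc(".", f -> startswith(f, "test_") && endswith(f, ".py"))

@show julia_test / julia_code
```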

So proportionately we seem to be in roughly the same ballpark, though the test-to-code LOC ratio is lower for Julia (~100K ÷ 130K ≈ 0.8) than for Python (~470K ÷ 400K ≈ 1.2). I do think there’s an argument to be made that, because Julia code may be more composable, we might need a higher ratio of tests to code than Python, and we don’t have that.
