PkgEval success rate dropping - a cause for concern?

jishnub · March 10, 2024, 12:12pm

Looking at https://raw.githubusercontent.com/JuliaCI/NanosoldierReports/gh-pages/pkgeval_charts/daily.png, it appears that the PkgEval success rate has dropped substantially in the 1.11-dev period (and into the current 1.12 one). I’ve not looked at the numbers, but a nearly 15% success rate drop between the beginning of the 1.9-dev period and the present sounds a little worrying. It seems the number of packages that successfully pass CI has remained almost constant over the 1.11-dev period, so are the failing packages not being updated anymore?

mkitti · March 10, 2024, 12:36pm

Julia 1.11 introduces some significant changes. It should be Julia’s responsibility to maintain backwards compatibility overall though, right?

cormullion · March 10, 2024, 12:52pm

I think something about IOBuffer has changed - I’m guessing that this might have uncovered/introduced some issues … ?

MilesCranmer · March 10, 2024, 8:41pm

It’s interesting how there’s so many more evals resulting in kill (presumably timeouts?) now than there used to be. I wonder why that is.

jishnub · March 14, 2024, 5:54am

Indeed, and tests for widely used packages such as StaticArrays are being killed.

abraemer · March 14, 2024, 6:40am

Someone posted the graph of package test time some time ago, where you can also see a drastic increase in test times. Perhaps that causes a lot of the timeouts?

The graph of package test time:

Comparison with daily pkgevals (posted by @jishnub already)

First of all, we see two steep increases in test time during 1.11-dev. The first one correlates with the infrastructure change; I don’t know the cause of the second. Looking at the PkgEval numbers during 1.11-dev, it seems to me that the first increase in test time does come with a reduced number of successful evals. The number of skipped packages increased by ~3 percentage points, and there are maybe a few more crashes (hard to tell, but the red line of failed evals separates a bit from the dark reddish line for crashed evals) and maybe some more killed evals, but the biggest change is definitely in the number of skipped packages.

It looks like the second major increase in test time also came with quite a reduction in successful evals, but this time the numbers for kill increase the most. So this might indeed be timeout-related. With the subsequent reduction in test time, the kill numbers go down as well, but I think they stay a bit higher than before? Really hard to tell. Test time also does not quite return to its previous level, so that would make sense. However, it seems like the success number doesn’t recover to where it should be.

So a theory based on this analysis could be: during 1.11 we had a phase in which drastically increased runtime caused many more timeouts. After the runtime issue was fixed, not all packages went back to passing. Since they were timing out before, nobody noticed that there may also have been a change that reduced the success rate of those packages.
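To make the masking idea concrete, here is a toy sketch in Python. Everything in it is hypothetical: the package names, the three phases, and the status labels ("ok", "fail", "kill") are made up for illustration and are not taken from the actual PkgEval reports or its data format. It only shows how a package that passed, then timed out, then failed would be picked out of per-package results.

```python
from collections import Counter

# Hypothetical per-package outcomes in three phases (illustrative only,
# not real PkgEval data): before the slowdown, during it, and after the fix.
PHASES = {
    "before": {"A": "ok", "B": "ok", "C": "ok", "D": "fail"},
    "slow": {"A": "ok", "B": "kill", "C": "kill", "D": "fail"},
    "after": {"A": "ok", "B": "ok", "C": "fail", "D": "fail"},
}


def status_counts(results):
    """Count how many packages ended in each status."""
    return Counter(results.values())


def success_rate(results):
    """Fraction of packages whose status is "ok"."""
    return status_counts(results)["ok"] / len(results)


def masked_regressions(before, slow, after):
    """Packages that passed, then were killed, then failed once the kills stopped."""
    return sorted(
        pkg
        for pkg in before
        if before[pkg] == "ok"
        and slow.get(pkg) == "kill"
        and after.get(pkg) == "fail"
    )


if __name__ == "__main__":
    for name, results in PHASES.items():
        print(name, dict(status_counts(results)), success_rate(results))
    print(masked_regressions(PHASES["before"], PHASES["slow"], PHASES["after"]))
```

In this toy data, package C is the interesting one: its failure was introduced while it was being killed, so it only shows up as a failure after the timeouts go away, and the success rate does not return to its earlier level.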

Disclaimer: I don’t really know what these numbers mean in practice. Someone with a deeper understanding of PkgEvals should judge whether this analysis makes sense.

jishnub · March 14, 2024, 6:56am

The spike just before Nov 2023 was when Memory was introduced and started being used as the backend for Array.

maleadt · March 14, 2024, 7:14am

There have been a couple of significant changes recently. Off the top of my head:

- Memory{T} and moving stdlibs out have caused additional compilation; see e.g. "2x regression in testing time of small package" (Issue #53511, JuliaLang/julia) and "v1.11 Takes longer to compile, generates 50% larger cache files and loads slower" (Issue #53570, JuliaLang/julia). This results in more packages getting killed due to timeouts. I assume that these are bugs that will get fixed during the 1.11 release polishing.
- Several internal APIs that packages rely on have been changed, e.g. https://github.com/JuliaLang/julia/pull/52233 and https://github.com/JuliaLang/julia/pull/53219 (the latter is not part of 1.11, but is one of the large drops on the chart that looks as if it’s part of 1.11). There’s no stability promised here, so packages will have to be fixed. Because this affects certain important packages like GPUCompiler and Zygote, the fallout is significant.
