PkgEval success rate dropping - a cause for concern?

Looking at https://raw.githubusercontent.com/JuliaCI/NanosoldierReports/gh-pages/pkgeval_charts/daily.png, it appears that the PkgEval success rate has dropped substantially in the 1.11-dev period (and into the current 1.12 one). I’ve not looked at the numbers, but a nearly 15% success rate drop between the beginning of the 1.9-dev period and the present sounds a little worrying. It seems the number of packages that successfully pass CI has remained almost constant over the 1.11-dev period, so are the failing packages not being updated anymore?
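To spell out the arithmetic behind that question: a success rate can fall even when the count of passing packages stays flat, as long as the total number of evaluated packages grows. A minimal sketch with made-up numbers (these are not real PkgEval figures, and `success_rate` is just an illustrative helper):

```python
def success_rate(ok: int, total: int) -> float:
    """Fraction of evaluated packages whose tests passed."""
    return ok / total

# Made-up numbers for illustration only, not real PkgEval data.
ok = 6000                               # passing packages, held constant
before = success_rate(ok, 8000)         # smaller set of evaluated packages
after = success_rate(ok, 10000)         # more packages, same number passing
print(f"{before:.1%} -> {after:.1%}")   # 75.0% -> 60.0%
```

So a flat success count together with a falling rate would mean the extra packages are mostly landing in the non-success categories.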

Julia 1.11 introduces some significant changes. It should be Julia’s responsibility to maintain backwards compatibility overall though, right?

I think something about IOBuffer has changed - I’m guessing that this might have uncovered/introduced some issues … ?

It’s interesting how there are so many more evals resulting in kill (presumably timeouts?) now than there used to be. I wonder why that is.

Indeed, and tests for widely used packages such as StaticArrays are being killed.

Someone posted the graph with the package test time some time ago, where you also can see a drastic increase in test times. Perhaps that causes a lot of timeouts?

The graph of package test time:

Comparison with daily pkgevals (posted by @jishnub already)

First of all, we see two steep increases in test time during 1.11-dev. The first one correlates with the infrastructure change; I don’t know what caused the second. Looking at the PkgEval numbers during 1.11-dev, it seems to me that the first increase in test time comes with a reduced number of successful evals. The number of skipped packages increased by ~3 percentage points, and there are maybe a few more crashes (hard to tell, but the red line of failed evals separates a bit from the dark reddish line for crashed evals) and maybe some more killed evals, but the biggest change is definitely in the number of skipped packages.

It looks like the second major increase in test time also came with quite a reduction in successful evals, but this time the numbers for kill increase the most. So this might indeed be timeout-related. With the subsequent reduction in test time, the kill numbers go down as well, but I think they stay a bit higher than before? Really hard to tell. Test time also does not quite return to its previous level, so that would make sense. However, it seems like the success number doesn’t recover to where it should be.

So a theory based on this analysis could be: during 1.11-dev we had a phase in which drastically increased runtime caused many more timeouts. After the runtime issue was fixed, not all packages went back to running successfully. Since they were timing out before, nobody noticed that there may also have been a change that reduced the success rate of packages.
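To make the masking idea concrete, here is a toy sketch. Everything in it is hypothetical: the package runtimes, the 45-minute limit, and the `classify`/`tally` helpers are invented for illustration and are not how PkgEval actually works. The point is only that a package killed for running too long never gets far enough to report a genuine test failure.

```python
from collections import Counter

def classify(runtime_min, has_real_failure, timeout_min):
    """Status for one package: a timeout hides any genuine test failure."""
    if runtime_min > timeout_min:
        return "kill"
    return "fail" if has_real_failure else "ok"

# Hypothetical packages: (normal runtime in minutes, broken by an unrelated change?)
packages = [(10, False), (20, False), (30, True), (35, True), (40, False)]
TIMEOUT = 45  # assumed limit for this toy example, not PkgEval's real setting

def tally(slowdown):
    """Count statuses when every package runs `slowdown` times slower."""
    return Counter(classify(rt * slowdown, broken, TIMEOUT) for rt, broken in packages)

slow = tally(2.0)   # phase with the runtime regression: ok=2, kill=3
fixed = tally(1.0)  # after the runtime fix: ok=3, fail=2
print(dict(slow), dict(fixed))
```

In this toy run the kills disappear once runtime is back to normal, but two packages now show up as plain failures, so the success count does not return to five out of five.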

Disclaimer: I don’t really know what these numbers mean in practice. Someone with a deeper understanding of PkgEvals should judge whether this analysis makes sense.

The spike just before Nov 2023 was when Memory was introduced and started being used as the backend for Array.

There have been a couple of significant changes recently. Off the top of my head:
