Regressions in Julia not getting caught by PkgEval

As has happened before, the Julia 1.11 release has some regressions that break existing code, and not just any code, but registered packages.
Some examples of what I personally encountered today when testing my packages (which pass on 1.10) on the new release: “Type of `@doc x` changes when REPL is loaded” (JuliaLang/julia issue #54664, reported before); “1.11 regression: source file does not contain a module declaration” (JuliaLang/julia issue #56050); and “in 1.11, `push!` doesn't forward to `append!` as it should” (JuliaLang/julia issue #56051).

These specific issues aren’t too difficult to work around in the code, that’s true. But shouldn’t PkgEval catch them before the release is made?
Then the change in Julia could either be modified or, at the very least, package authors could be notified.

Maybe I misunderstand the goals of PkgEval, and it’s not used to avoid package breakages in new Julia releases? Not sure, would be curious to hear about…

Here’s a report that did find that module declaration issue, with VOTables.jl: https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_date/2024-09/14/VOTables.primary.log. I guess once it’s in the “2999 packages failed tests on the previous version too” category, it gets skipped over…

Actually this report is on Julia Version 1.12.0-DEV.1205 but I can’t find any reports on earlier versions.

Interesting… Tests do pass for me (locally + CI) on 1.10.

“Previous version” here means the “comparison” commit it’s testing against. If you look at the PkgEval report, this is a bit clearer. Here’s the one from yesterday: PkgEvalJob JuliaLang/julia@2ae0b7e [2024-10-06]. There are packages whose tests are flaky and packages whose tests are plain old broken and packages whose tests were already broken by something else and packages whose dependencies were broken.

Detangling all of the above takes significant time and energy. To put a finer point on it, PkgEval doesn’t catch it by itself. Someone reads the PkgEval report and uses it to identify what went wrong (if anything) and then works to address it. You can look at all the pull requests involved in building the release candidates to see how many times PkgEval was run and examined. It’s hard, but it is being done.

I see, thank you both for the explanations!

Does it make sense to run PkgEval before the release or RC, comparing against the previous release rather than some nightly state? Such a run would be much rarer, so more effort (including from the community) could go into reading and interpreting the report.

Note that the three linked issues (1st post) break tests of at least three different packages. By “break” I mean a package or its tests working on 1.10 and failing on 1.11.
So it’s not just a single package with randomly failing tests or internals usage.

About this interesting case:

And there is disagreement on whether it “should”: see “in 1.11, `push!` doesn't forward to `append!` as it should” (JuliaLang/julia issue #56051).

Then I’m wondering: is there code that expects this (mostly or only in packages?), and is that code then considered buggy? Either way, it breaks in 1.11, which would seem to mean that 1.11 broke semver? What is the cause of this (I suspect it is related to the newly added `Memory` type)? Will the packages just be fixed, so that this likely never happens again, or can we expect similar problems for vectors or other things where expectations are misaligned?

And even if the “should” is not to be expected, I agree that PkgEval should ideally catch it… but it relies on tests, and those may always be incomplete.

Whenever I read that code A is equivalent to code B, I don’t assume A implements B or vice versa, because they cannot simultaneously implement each other in the equal relationship that “equivalent” implies. I just expect them to have the same results.
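
A minimal sketch of that “same results” reading, using plain `Vector{Int}` values. The variable names are made up for illustration, and the snippet says nothing about how `push!` or `append!` are implemented internally or about custom array types:

```julia
# Illustrative only: two independent copies of the same starting vector.
via_push = [1, 2, 3]
via_append = [1, 2, 3]

push!(via_push, 4, 5)         # add several items in one call
append!(via_append, [4, 5])   # add all elements of another collection

# The observable results match; which function calls which is not tested here.
@assert via_push == via_append
```

For a built-in `Vector` the two calls end in the same state, which is all that “equivalent” promises under this reading.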

We often do so during the release process, see e.g. “Backports for 1.11.0-alpha2” by KristofferC (JuliaLang/julia pull request #53543). Subsequent runs that do not specify `vs=":release-1.10"` will compare against the merge-base, which is typically the previous alpha/beta/release candidate, and not the nightly.

Yep, and you can see that both VOTables and Accessors were flagged as potentially having problems, and both were among the ~1000 packages that were manually rerun to try to detangle the above challenges.

This is also why RCs are announced and available. It’s all about trying to get more Swiss cheese slices on the stack.
