These specific issues aren’t too difficult to work around in the code, that’s true. But shouldn’t PkgEval catch them before the release is made?
Then either modify the change in Julia, or at least notify package authors.
Maybe I misunderstand the goals of PkgEval, and it’s not meant to prevent package breakage in new Julia releases? I’m not sure, and would be curious to hear more…
“Previous version” here means the “comparison” commit it’s testing against. If you look at the PkgEval report, this is a bit clearer. Here’s the one from yesterday: PkgEvalJob JuliaLang/julia@2ae0b7e [2024-10-06]. There are packages whose tests are flaky, packages whose tests are plain old broken, packages whose tests were already broken by something else, and packages whose dependencies were broken.
Detangling all of the above takes significant time and energy. To put a finer point on it, PkgEval doesn’t catch these things by itself. Someone reads the PkgEval report, uses it to identify what went wrong (if anything), and then works to address it. You can look at all the pull requests involved in building the release candidates to see how many times PkgEval was run and examined. It’s hard, but it is being done.
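For concreteness, those runs are triggered by commenting on the relevant julia PR. Going from memory of the Nanosoldier docs (so treat the exact syntax as an assumption rather than gospel), a PkgEval run against a chosen comparison target looks something like:

```
@nanosoldier `runtests(ALL, vs = ":release-1.10")`
```

where the `vs` target can point at a branch or commit, which is also how a comparison against the previous release rather than a nightly would be expressed.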
Does it make sense to run PkgEval before the release or RC, comparing against the previous release rather than some nightly state? Such an event would be much rarer, allowing more effort (including from the community) to be put into reading and interpreting the report.
Note that the three linked issues (1st post) break the tests of at least three different packages. By “break” I mean a package or its tests working on 1.10 and failing on 1.11.
So it’s not some single package with randomly failing tests or internals usage.
Then I’m wondering: is there code that expects this (in packages, mostly or only?), and is that code then considered buggy? Either way it breaks in 1.11, so doesn’t that mean 1.11 broke semver? What is the cause (I suspect it’s related to the new Memory type), and will the affected packages simply be fixed so this particular breakage never happens again, or can we expect similar issues for Vector or other cases of misaligned expectations?
And even if that “should” is not to be expected, I agree that PkgEval should ideally catch this… but it relies on package tests, and those may always be incomplete.
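To make the Memory point above concrete, here’s a minimal sketch of how internals-adjacent code can be gated on the Julia version instead of assuming one layout. The helper name is made up for illustration and isn’t from any of the linked packages:

```julia
# Hypothetical helper: allocate a flat byte buffer.
# `Memory` is the new low-level buffer type in 1.11 (it now backs `Vector`);
# it doesn't exist on 1.10, so the version check keeps both sides working.
function scratch_buffer(n::Integer)
    if VERSION >= v"1.11"
        return Memory{UInt8}(undef, n)   # 1.11+: allocate the backing type directly
    else
        return Vector{UInt8}(undef, n)   # pre-1.11: plain Vector fallback
    end
end
```

Whether the breakage in the linked issues is of this internals-reliance kind, or something the docs actually promised, is exactly the question above.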
Whenever I read that code A is equivalent to code B, I don’t assume A implements B or vice versa, because it’s not possible for them to simultaneously implement each other in the symmetric relationship that “equivalent” implies. I just expect them to produce the same results.
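Here’s a small sketch of that distinction; all names are made up for illustration, not from Base or any package. Suppose the docs say `insert_all!(a, xs)` is “equivalent to” calling `insert_one!(a, x)` for each `x`:

```julia
insert_one!(a, x) = (push!(a, x); a)

# One valid implementation really does go through insert_one!...
insert_all_v1!(a, xs) = (foreach(x -> insert_one!(a, x), xs); a)

# ...and another produces the same result without ever calling it.
insert_all_v2!(a, xs) = append!(a, xs)

a1, a2 = [1, 2], [1, 2]
insert_all_v1!(a1, [3, 4])
insert_all_v2!(a2, [3, 4])
@assert a1 == a2 == [1, 2, 3, 4]   # "equivalent" in the same-results sense
```

Code that adds methods to `insert_one!` for its own types and expects `insert_all!` to pick them up is depending on the first implementation, which the word “equivalent” doesn’t promise.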
Yep, and you can see that both VOTables and Accessors were flagged as potentially having problems, and they were among those ~1000 packages that were manually rerun to try to detangle the challenges described above.
This is also why RCs are announced and available. It’s all about trying to get more Swiss cheese slices on the stack.