Why do packages run continuous integration tests on Julia nightly?

I’m wondering why it is standard practice to run continuous integration tests for packages on Julia nightly. It seems like there isn’t much useful information to be gained from this, as jobs often fail (and seem expected to fail) due to upstream instabilities that are unrelated to the package.

It seems like on certain days many Julia packages have the GitHub :x: next to the latest commit. From a branding perspective I just feel like it sort of makes the Julia ecosystem look poorly maintained if so many important packages show a red :x: at first glance (depending on the day; today’s build looks good), even though upon closer inspection you can see the stable tests are all passing.

I guess maybe this is a GitHub issue rather than an issue with Julia’s standards. However, GitHub does not support explicitly allowed failures when calculating a build status (see “Please support something like "allow-failure" for a given job”, Issue #2347 in actions/runner on GitHub), so maybe it’s better to adjust on our side and either avoid testing on nightly, or configure the nightly CI tests in a different way?

6 Likes

It makes it possible, for example, to detect future problems and gives more time to adapt. For example, libcurl (which has more than 1900 dependents) will cause lots of failures in packages, and in their dependencies, that did not adapt their compat entries to the new major version (it went from 7.xx to 8.xx). See the example in “Merge pull request #1208 from GenericMappingTools/interfacing1” (GenericMappingTools/GMT.jl@e41b287 on GitHub).

This happens only on nightly.
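
To make the compat problem concrete, here is a minimal sketch of a caret-style compat check, written in Python purely for illustration. It assumes the usual rule that an entry with a nonzero major version admits everything up to, but not including, the next major version. The function names and the entry `"7.73"` are hypothetical, not taken from any real package, and this is not Pkg's actual resolver.

```python
def parse_version(text):
    """Parse 'X', 'X.Y' or 'X.Y.Z' into a 3-tuple, padding with zeros."""
    parts = [int(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported version string: {text!r}")
    return tuple(parts + [0] * (3 - len(parts)))


def caret_allows(compat_entry, candidate):
    """Return True if `candidate` lies in [compat_entry, next major version).

    Simplification (assumption): only major versions >= 1 are handled,
    so the special rules for 0.x versions are left out.
    """
    lower = parse_version(compat_entry)
    if lower[0] == 0:
        raise ValueError("this sketch only handles major versions >= 1")
    upper = (lower[0] + 1, 0, 0)
    return lower <= parse_version(candidate) < upper


# A hypothetical entry written against the 7.xx series rejects 8.0.1 ...
old_entries = ["7.73"]
# ... until the maintainer widens it to also admit the 8.xx series.
widened_entries = ["7.73", "8"]

blocked = not any(caret_allows(e, "8.0.1") for e in old_entries)
admitted = any(caret_allows(e, "8.0.1") for e in widened_entries)
```

With the old entry the resolver cannot pick 8.0.1, which is the kind of failure that only shows up where the new major version is already in use.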

3 Likes

I’m beginning to think we should mostly stop running package CI against nightly. Not just because of appearances, but because if package devs are pro-actively adapting to Julia changes, it might hide unintended breakages of backwards compatibility from PkgEval. PkgEval runs start in earnest after the feature freeze, so much of the time they are not being run.

There will be exceptions, of course: packages used to develop Julia (Cthulhu, Revise, Debugger, JET, etc) need to test against nightly. But I have the growing sense that it’s a mistake to have most packages do so.

21 Likes

PkgEval runs more often than that but I kind of agree with the point overall.

I have modified the actions configuration of my packages to no longer run tests against nightly. When a new version is nearing release, I run tests locally to make sure things work.

1 Like

Many (most?) dependents have not yet adapted to Julia 1.10 (by changing their compat entry to include 8.0.1). There will be quite a short time to do so once the release is near.

It certainly does hide breakages; lots of packages would actually be broken on almost every release if not for these adaptations.
I asked some time ago, and got the feeling that breaking packages is considered fine if they use “internals”, even if the majority of packages do (transitively).

If desired, PkgEval could easily test without this adaptation. Just checkout the General registry version as of some time ago, for example when the previous Julia version got released.

Could we have a version tag for whatever is released next? So right now that would point to 1.9.2-rc (I know that such a thing does not exist). Having a regular CI run against this tag seems like it would facilitate detecting any bugs early while not requiring nightly.

1 Like

I think packages that are blacklisted on PkgEval should probably also run CI against nightly, to avoid surprises on release day.

1 Like

I have discovered bugs in Julia master multiple times (example 1, example 2, example 3, example 4) by running tests on nightly. No one had reported them by looking at PkgEval. Example 4 was hanging also on PkgEval but no one seemed to care.

4 Likes

What we do in DataFrames.jl (and plan to keep doing, as I think it is good practice):

  1. Run tests against Julia nightly to catch issues in Julia Base (we detect them indeed, this is not just a theoretical reason)
  2. Set up CI not to report failure on Julia nightly as failure (i.e. it is run, we see the result, but it is not considered :x:)
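
The two points above amount to a small reporting policy, which can be sketched as follows. This is a toy model in Python, not GitHub's actual status calculation; `JobResult`, `commit_status`, and `tolerated_failures` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobResult:
    name: str
    passed: bool
    allowed_to_fail: bool = False  # e.g. set for the nightly job


def commit_status(results):
    """Overall status: only failures of non-tolerated jobs count."""
    blocking = [r.name for r in results if not r.passed and not r.allowed_to_fail]
    return "failure" if blocking else "success"


def tolerated_failures(results):
    """Failures that are still shown to maintainers but do not turn the status red."""
    return [r.name for r in results if not r.passed and r.allowed_to_fail]


results = [
    JobResult("Julia stable", passed=True),
    JobResult("Julia nightly", passed=False, allowed_to_fail=True),
]
status = commit_status(results)          # stable passes, so the commit stays green
to_review = tolerated_failures(results)  # the nightly failure is still visible
```

The nightly job still runs and its result can be inspected, but only the stable jobs decide what a visitor sees next to the commit.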

7 Likes

I have discovered bugs in Julia master multiple times

But isn’t it mostly a question of timing? As long as the bugs are fixed by the time of release, isn’t that OK? I think the PkgEval runs are not looked at as carefully until after feature freeze, but if you can wait that long then wouldn’t it all work out in the end?

Example 4 was hanging also on PkgEval but no one seemed to care.

Test timeouts seemed like a different issue, just the duration of precompilation? It was known that precompilation time went up.

I asked some time ago, and got the feeling that breaking packages is considered fine if they use “internals”, even if the majority of packages do (transitively).

I was thinking about that conversation when I made my recommendation. Yes, if the packages are using internals then it’s really their fault. I guess my thinking, though, is that having the core Julia devs “exposed” to the consequences seems like a good thing. Either they can say “you know, maybe we don’t really need to delete that method” or they can file an issue with the package saying “bad, bad, stop depending on internals” and maybe people will be motivated to reduce their dependence on internals. It just seems to provide more opportunities for conversations about long-term stability, which seems like it can’t be a bad thing.

3 Likes

I don’t know, I believe that proactively reporting errors early is better than waiting for them to be looked at months later. I’d also like to be able to use packages on master. Additionally, bugs can keep accumulating and hiding each other or getting entangled if they are not fixed in a timely manner, making the debugging process needlessly harder.

According to the log, precompilation took 6 seconds, which hardly explains a timeout. And the last output of the tests was at precisely the same point where tests were timing out after 6 hours on GitHub Actions, which was the entire issue.

7 Likes

I think this is a good compromise. If you think about the target audience of CI results, there are two groups:

  1. Package maintainers, who want to know whether their code works. They are interested in testing against nightly to (a) know about any breakages or upcoming deprecations, and (b) contribute bug reports to Base.
  2. Package users, who just glance at the build status to see if a package is being actively maintained. If the CI results are split into groups, they can see whether it’s working for their target OS or software version.

I think “hiding” the nightly test results from group 2 is a good idea: for example, a nightly action that does not run on each commit but rather as a regular cron job, so that it does not affect GitHub’s commit status.

@tim.holy also makes an insightful point about why you would avoid nightly tests, or at least not proactively update your code for them. I’m not sure how this point should be communicated to package maintainers, though. Maybe this could just be a warning on julia-actions/setup-julia for version: nightly that a package maintainer should not aggressively update their code to get it working on nightly?

1 Like

I think “care” is the wrong word here. I think “notice” is more accurate. The PkgEval logs are extremely noisy and it is almost impossible to exhaustively get all the Julia issues from them.

1 Like

I couldn’t agree more. I have in the past criticized the average package testing against nightly as climate-unfriendly, because in the majority of cases (exceptions apply) the typical developer will not “care”, “notice”, or “have time” to find the useful failures (true positives) in the sea of useless failures (false positives) in the nightly tests. Hence, in my eyes, it is wasted energy for the average package to run one additional test suite on every commit.

6 Likes

Sorry, but I don’t get this point. A package developer cannot avoid seeing that their package(s) fail only on nightly, and they will surely go and see why.

I repeat that there are about 1900 packages waiting to find out that they don’t run on 1.10-dev because of the libcurl issue I mentioned above, quite likely because most of them are not running their CI against nightly.

As a developer, I can explain this point: fixing CI failures is a time-consuming process. It takes time to go through the tests, it takes time to find the process that fails, it takes time to find what the bug is and whether it comes from your own package or not, and finally it takes time to actually fix the bug by altering code.

Seeing a lot of failures that do not come from your own package leads to exactly the same situation as the boy who cried wolf: after spending my time several times over only to realize that all these failures were unrelated to my own package or the Julia code it runs (sometimes nightly won’t even build, which also counts as a failure), I simply started ignoring the nightly runs more and more. I know that other developers have done the same thing.

As such, I have concluded that since I won’t be using the nightly runs, and that it is unlikely that my very much front-end packages will be the ones that find the bugs in core Julia instead of Julia’s own test suite, running nightly tests is a waste of energy. Other packages will have different judgement on the topic, especially those using more and more “internal” or “advanced” features of the language.

1 Like

That is all true and I agree, but in the example I keep insisting on, the failures are caused by the package itself, and only the package authors (or third-party PRs) can fix them. Problems like this are not found if nightly builds are not used. It is true that it is not necessary to run CI against nightly all the time, but doing so once in a while is a useful thing.

The tricky part comes when your package unwittingly ends up relying on one of those packages, whether that be directly or as a test dep. If that can be controlled for I agree, but if not there still is value in testing against some kind of pre-release version.

My question there is whether there’s a way to test against the latest alpha/beta/rc if and only if one newer than the current stable release exists. We don’t want to waste CI time, but if there’s no way to do this with the current Julia GHA infrastructure then people are likely to continue testing against nightly.
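
The “if and only if a newer pre-release exists” condition can be sketched like this, assuming the list of published versions is already available as plain strings. It is written in Python for illustration only; the function names and the version lists are hypothetical, and nothing here calls a real Julia or GitHub Actions API.

```python
import re

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(alpha|beta|rc)(\d*))?$")
# Within one X.Y.Z, pre-releases sort before the final release (stage None).
_STAGE_ORDER = {"alpha": 0, "beta": 1, "rc": 2, None: 3}


def sort_key(version):
    """Turn 'X.Y.Z' or 'X.Y.Z-rcN' into a tuple that sorts in release order."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch, stage, number = match.groups()
    return (int(major), int(minor), int(patch), _STAGE_ORDER[stage], int(number or 0))


def is_prerelease(version):
    return _VERSION_RE.match(version).group(4) is not None


def prerelease_to_test(versions):
    """Return the newest pre-release if it is newer than every stable release,
    otherwise None (meaning: no extra CI job is needed)."""
    pre = [v for v in versions if is_prerelease(v)]
    stable = [v for v in versions if not is_prerelease(v)]
    if not pre:
        return None
    newest_pre = max(pre, key=sort_key)
    if stable and sort_key(newest_pre) < sort_key(max(stable, key=sort_key)):
        return None
    return newest_pre


# Hypothetical version lists, for illustration only.
during_release_cycle = prerelease_to_test(["1.9.1", "1.10.0-alpha1", "1.10.0-beta1"])
after_release = prerelease_to_test(["1.9.0-rc3", "1.9.0", "1.9.1"])
```

In the first case the extra job would run against the beta; in the second the function returns `None`, so no CI time is spent on a pre-release that has already been superseded.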

1 Like