Upgrading JuliaLang/Julia buildbot CI

staticfloat · March 19, 2019, 10:05pm

For many years now, we have had buildbots connected to https://build.julialang.org that build the actual Julia binaries that get hosted on the main website/nightlies. For most of this time, however, the state of a build on the buildbots has been somewhat opaque. Travis, Appveyor and Circle CI all report their status to GitHub PRs, but changes to the Julia codebase would cause the buildbots to become nonfunctional for weeks before someone noticed that nightly binaries weren’t being uploaded anymore.

Part of this was because we have a very wide range of platforms that we target with our buildbots, and getting the necessary hardware support to handle the large volume of Julia builds took time. However, I am now pleased to announce that our CI/CD infrastructure is about to take a step forward.

The most visible change is that buildbot statuses are now being reported in PRs. At the moment, there are an awful lot of red X’s (some of those are actual problems in the Julia code base, others are because I’m still ironing out some of the issues in the new buildbot infrastructure) but these issues should get fixed in the next few days. Note that there are package_xxx jobs, which represent the process of building Julia (e.g. running make) and then there are testing_xxx jobs, which represent the process of testing Julia (e.g. running Base.runtests()). We break these up into two steps and run the tests on a different machine to attempt to catch problems where a Julia build is only functional on the machine it was built on. The testing jobs are triggered at the end of the package_xxx runs, so they won’t even appear unless the packaging run is successful.
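
To make the two-stage flow concrete, here is a minimal sketch in plain Python (not actual buildbot configuration). The function name `run_pipeline`, the callables, and the platform names are all hypothetical; the only thing taken from the description above is that a testing job is triggered only after the matching package job succeeds.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class JobResult:
    name: str
    passed: bool


def run_pipeline(
    platforms: List[str],
    package: Callable[[str], bool],
    test: Callable[[str], bool],
) -> List[JobResult]:
    """Run package_<platform>, then testing_<platform> only if packaging passed.

    `package` and `test` are stand-ins for "build Julia" and "run the test
    suite on a different machine"; both are assumptions for illustration.
    """
    results = []
    for platform in platforms:
        packaged = package(platform)
        results.append(JobResult(f"package_{platform}", packaged))
        if packaged:
            # The testing job is triggered at the end of a successful package run.
            results.append(JobResult(f"testing_{platform}", test(platform)))
        # If packaging failed, no testing job is ever created, so none is reported.
    return results
```

With this shape, a platform whose packaging step fails shows a single failed `package_xxx` status and no `testing_xxx` status at all.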

Clicking on the details link next to one of the statuses will link directly to the buildbot page showing a build with all its steps. For those that belong to the JuliaLang org on GitHub, you can restart a build by logging in via the link in the top right of the buildbot web interface, then clicking the blue rebuild button in the top right.

The second big difference is that we now build (and host) two versions of Julia per commit: an assert build and a normal build. We do this firstly because we want CI to use assert builds to catch more bugs, and secondly because we would like to offer Julia package CI an easy way to use assert builds when testing your packages. We don’t have this hooked up to be easily used in Travis etc. yet, but that’s a natural progression once the new binaries are being built reliably.

nalimilan · March 20, 2019, 8:46am

Sorry if this sounds too negative, but do we really need to run all of these builds for each PR? That sounds like wasting a lot of energy, when having a daily build should be enough (e.g. with a message posted to Slack or a new issue opened on GitHub when something fails). Or at least it would be nice to run these only when merging, to avoid building things for every small change to PRs.

staticfloat · March 21, 2019, 7:39am

In short, I actually think we do. Every release cycle we end up doing a bunch of panic-work to try and un-break the various configurations that are not tested on each PR. The only way we will ever have consistently good support for the wide range of platforms that we purportedly support is if we make it as easy and consistent as possible to build for all those platforms.

I think the energy consumption here is not something we need to worry too much about. By integrating things like more BinaryBuilder-sourced tarballs, we can cut down on CI time significantly, so we shouldn’t be doing too much damage. If our buildbots can’t keep up with our pace of development, I consider that a personal challenge to continue to refine our buildbot infrastructure.

Besides, we’re saving the world already by getting people to move from MATLAB to Julia and decreasing their runtimes by factors of days, so we’re net carbon-positive.

nalimilan · March 21, 2019, 10:14am

That’s not so simple unfortunately. People may just use their computing power to estimate more complex models, or even buy new machines because models that could not be estimated before are now possible (see the Jevons paradox).

Tamas_Papp · March 21, 2019, 11:05am

While global warming is an important problem, can we just focus on the Julia part? I feel that the former is a somewhat orthogonal question that we may not solve in this thread, while at the same time derailing the original (and also important) discussion.

andreasnoack · March 21, 2019, 11:47am

Maybe (in some better future) it would make sense to have a staging branch that ran CI for all architectures, while only using a single architecture for CI on normal pushes to the PR branch.

staticfloat · March 21, 2019, 6:44pm

With a staging branch, you can still run into significant problems. Let’s take the example scenario where someone notices that something broke armv7l on master. They look back through the history to find where it broke, then find the person who wrote that PR to help understand why it broke something on armv7l, and with their help they submit a new PR. They won’t necessarily know if this new PR fixed things on armv7l because it’s not being tested in their PR, but we’ll assume that they have a local armv7l box. Unfortunately, their changes, while fixing something on armv7l, ended up breaking something on ppc64le, which they do not have a local development box for, and then the whole cycle must repeat itself until it converges to a commit that lands on master that does not break anything.

That scenario, while somewhat contrived, I hope illustrates the point that I’m trying to make: any platform combination that we don’t test on every PR is inevitably going to become second-class. It’s very easy to see this effect right now; the most unreliable architectures are armv7l and aarch64. In the ancient past, when we didn’t have Windows CI showing up on every PR, the Windows build was broken continuously. Before we had Iblis’ excellent FreeBSD CI showing up on every PR, the same held true for FreeBSD. The developer experience of things being tested on a branch somewhere is significantly worse than the developer experience of your PR being tested on all platforms immediately. Furthermore, we have the compute available to us; it’s not that many machines, especially considering how many machines we use with Travis, Appveyor and Circle CI.

mbauman · March 21, 2019, 6:56pm

I’m imagining something like bors — all PRs could just have one buildbot enabled, but a bors r+ would run the whole gauntlet and only merge if everything worked as expected.

Edit to simply add this is really awesome work and it’s great to see all these platforms becoming much more first-class!
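
The bors-style gating described above could be sketched roughly as follows. This is not bors' actual API or configuration; `checks_required`, `can_merge`, the event names, and the platform names are hypothetical and only illustrate "one buildbot per push, the whole gauntlet on r+".

```python
from typing import Dict, List


def checks_required(event: str, all_platforms: List[str], pr_platform: str) -> List[str]:
    """Pick which platforms to build for a given (hypothetical) event."""
    if event == "push":
        # Ordinary pushes to a PR branch only run a single buildbot.
        return [pr_platform]
    if event == "r+":
        # A merge request runs every platform.
        return list(all_platforms)
    raise ValueError(f"unknown event: {event}")


def can_merge(results: Dict[str, bool], required: List[str]) -> bool:
    """Merge only if every required platform reported success."""
    return all(results.get(platform, False) for platform in required)
```

A platform that never reports is treated as a failure here, so a missing result blocks the merge rather than silently passing.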

kristoffer.carlsson · March 21, 2019, 6:56pm

I must say that seeing 15 CI checks is pretty annoying. Is there any way these could be bunched together, similar to how the Travis check bundles 3 builds?

staticfloat · March 21, 2019, 7:07pm

I was initially annoyed by that as well, but I actually kind of like this system better now, as it immediately allows you to see what has or has not passed, without needing to click-through and see what failed. Additionally, since the tester builds don’t show up unless the package builds pass, it’s very easy to figure out which stage of your build failed at a quick glance.

That being said, it is a lot of checkmarks. Buildbot doesn’t have an easy way to combine them, but we can patch something in if it’s deemed checkmark overload.
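
As a rough idea of what "patching something in" might look like, here is a sketch that folds many per-job states into a single summary status. It is plain Python, not buildbot code; the function name `summarize` and the job names are hypothetical. The state strings follow GitHub's commit status states (`success`, `failure`, `error`, `pending`).

```python
from typing import Dict, Tuple


def summarize(statuses: Dict[str, str]) -> Tuple[str, str]:
    """Collapse per-job states into one (state, description) pair.

    Any failure or error wins, then any pending job, otherwise success.
    """
    failed = sorted(n for n, s in statuses.items() if s in ("failure", "error"))
    pending = sorted(n for n, s in statuses.items() if s == "pending")
    if failed:
        # Name the failing jobs so the combined check still says what broke.
        return "failure", "failed: " + ", ".join(failed)
    if pending:
        return "pending", f"{len(pending)} of {len(statuses)} jobs still running"
    return "success", f"all {len(statuses)} jobs passed"
```

Listing the failing jobs in the description keeps some of the at-a-glance benefit of separate checks while showing a single checkmark.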

StefanKarpinski · March 21, 2019, 8:11pm

I agree that it seems like too many results, especially since most of them are failing right now…

andreasnoack · March 22, 2019, 7:55am

Let’s take the example scenario where someone notices that something broke armv7l on master.

The idea was that this would be caught by CI on the staging branch so hopefully wouldn’t happen. If the test coverage was insufficient, it wouldn’t get caught and the fix wouldn’t be visible from the PR CI but I don’t think that would be a big deal as 1) the author would most likely test the specific fix locally, 2) it would be confirmed by the CI run on the staging branch before the fix is merged.

I think it’s really great that we now test more systematically on the less common platforms, and I agree that it’s necessary to test every PR, but I still think we should try to reduce the number of low-value CI runs.

staticfloat · April 3, 2019, 11:22pm

After some initial bumpiness, the buildbot CI state has mostly evened out. At this point, you should see green checkmarks for all major platforms (Windows, Linux, Mac, FreeBSD, on x86_64 and i686 where applicable) on the buildbots. Each job first does a package run, which shows whether or not Julia built and passed bootstrap properly. If that completes successfully, you should see a tester run which runs the actual test suite (usually on a separate machine, to avoid “it works on my machine” syndrome).

Appveyor was recently fixed (thanks to Jameson) and Travis will soon be fixed once this PR is sorted out. At this point, Appveyor and Travis are duplicating the work of the buildbots, and we may be able to drop one or both eventually.
