For many years now, we have had buildbots connected to https://build.julialang.org that build the actual Julia binaries that get hosted on the main website/nightlies. For most of that time, however, the state of a build on the buildbots has been somewhat opaque. Travis, AppVeyor, and Circle CI all report their status to GitHub PRs, but changes to the Julia codebase could leave the buildbots nonfunctional for weeks before someone noticed that nightly binaries weren't being uploaded anymore.
Part of this was because we have a very wide range of platforms that we target with our buildbots, and getting the necessary hardware support to handle the large volume of Julia builds took time. However, I am now pleased to announce that our CI/CD infrastructure is about to take a step forward.
The most visible change is that buildbot statuses are now being reported on PRs. At the moment there are an awful lot of red X's (some of those are actual problems in the Julia code base; others are because I'm still ironing out some of the issues in the new buildbot infrastructure), but these issues should get fixed in the next few days. Note that there are `package_xxx` jobs, which represent the process of building Julia (e.g. running `make`), and `testing_xxx` jobs, which represent the process of testing Julia (e.g. running `Base.runtests()`). We break these up into two steps and run the tests on a different machine to catch problems where a Julia build is only functional on the machine it was built on. The testing jobs are triggered at the end of the `package_xxx` runs, so they won't even appear unless the packaging run is successful.
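The package → tester hand-off described above maps naturally onto buildbot's trigger mechanism. Here is a rough sketch of what a fragment of the master configuration might look like; the builder and scheduler names (`testing_linux64`, `tester_linux64`) are made up for illustration, not the actual names used by the Julia buildbots:

```python
# Sketch of a buildbot master.cfg fragment: a packaging builder that,
# on success, triggers a separate tester builder (possibly on another worker).
from buildbot.plugins import schedulers, steps, util

# Scheduler the tester builder listens on; fired only by the Trigger step below.
tester_sched = schedulers.Triggerable(
    name="tester_linux64",
    builderNames=["testing_linux64"],
)

package_factory = util.BuildFactory()
package_factory.addStep(steps.ShellCommand(
    name="make",
    command=["make", "-j4"],  # build Julia itself
    haltOnFailure=True,       # a failed build never reaches the Trigger step
))
package_factory.addStep(steps.Trigger(
    schedulerNames=["tester_linux64"],  # kick off the testing_linux64 builder
    waitForFinish=False,                # without blocking the packaging job
))

tester_factory = util.BuildFactory()
tester_factory.addStep(steps.ShellCommand(
    name="runtests",
    command=["./julia", "-e", "Base.runtests()"],  # run the Julia test suite
    haltOnFailure=True,
))
```

Because the `Trigger` step only runs when every preceding step succeeds, the tester job simply never appears for a broken packaging run, which matches the behavior described above.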
Clicking on the `details` link next to one of the statuses will link directly to the buildbot page showing a build with all its steps. For those who belong to the `JuliaLang` org on GitHub, you can restart a build by logging in via the link in the top right of the buildbot web interface, then clicking the blue `rebuild` button in the top right.
The second big difference is that we now build (and host) two versions of Julia per commit: an `assert` build and a normal build. We do this firstly because we want CI to use assert builds to catch more bugs, and secondly because we would like to offer Julia package CI an easy way to use assert builds when testing your packages. We don't have this hooked up for easy use in Travis etc. yet, but that's a natural progression once the new binaries are being built reliably.
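For anyone who wants to try an assert build locally in the meantime, the Julia build system exposes flags for this in `Make.user`; a minimal sketch (check `Make.inc` in the Julia source tree for the authoritative list of options):

```make
# Make.user -- enable assertions in both Julia and its bundled LLVM
FORCE_ASSERTIONS = 1
LLVM_ASSERTIONS = 1
```

Note that enabling `LLVM_ASSERTIONS` forces a from-source LLVM build, so the first build will take considerably longer than a normal one.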
Sorry if this sounds too negative, but do we really need to run all of these builds for each PR? That seems like a lot of wasted energy, when a daily build should be enough (e.g. with a message posted to Slack or a new issue opened on GitHub when something fails). Or at least it would be nice to run these only when merging, to avoid building things for every small change to a PR.
In short, I actually think we do. Every release cycle we end up doing a bunch of panic-work to try and un-break the various configurations that are not tested on each PR. The only way we will ever have consistently good support for the wide range of platforms that we purportedly support is if we make it as easy and consistent as possible to build for all those platforms.
I think the energy consumption here is not something we need to worry too much about. By integrating things like more BinaryBuilder-sourced tarballs, we can cut down on CI time significantly, so we shouldn’t be doing too much damage. If our buildbots can’t keep up with our pace of development, I consider that a personal challenge to continue to refine our buildbot infrastructure.
Besides, we’re saving the world already by getting people to move from MATLAB to Julia and decreasing their runtimes by factors of days, so we’re net carbon-positive.
That’s not so simple unfortunately. People may just use their computing power to estimate more complex models, or even buy new machines because models that could not be estimated before are now possible (see the Jevons paradox).
While global warming is an important problem, can we just focus on the Julia part? I feel that the former is a somewhat orthogonal question that we may not solve in this thread, while at the same time derailing the original (and also important) discussion.
Maybe (in some better future) it would make sense to have a staging branch that runs CI for all architectures, while only using a single architecture for CI on normal pushes to the PR branch.
With a staging branch, you can still run into significant problems. Let's take the example scenario where someone notices that something broke `master`. They look back through the history to find where it broke, then find the person who wrote that PR to help understand why it broke something on `armv7l`, and with their help they submit a new PR. They won't necessarily know if this new PR fixed things on `armv7l` because it's not being tested in their PR, but we'll assume that they have a local `armv7l` box. Unfortunately, their changes, while fixing something on `armv7l`, ended up breaking something on `ppc64le`, which they do not have a local development box for, and then the whole cycle must repeat itself until it converges to a commit that lands on `master` without breaking anything.
That scenario, while somewhat contrived, hopefully illustrates the point I'm trying to make: any platform combination that we don't test on every PR is inevitably going to become second-class. It's very easy to see this effect right now; the most unreliable architectures are armv7l and aarch64. In the ancient past, when we didn't have Windows CI showing up on every PR, the Windows build was broken continuously. Before we had Iblis' excellent FreeBSD CI showing up on every PR, the same held true for FreeBSD. The developer experience of things being tested on a branch somewhere is significantly worse than the developer experience of your PR being tested on all platforms immediately. Furthermore, we have the compute available to us; it's not that many machines, especially considering how many machines we use with Travis, AppVeyor, and Circle CI.
I'm imagining something like bors: all PRs could just have one buildbot enabled, but a `bors r+` would run the whole gauntlet and only merge if everything worked as expected.
Edit: I'll simply add that this is really awesome work, and it's great to see all these platforms becoming much more first-class!
I must say that seeing 15 CI checks is pretty annoying. Is there any way these could be bunched together, similar to how the Travis check bundles 3 builds?
I was initially annoyed by that as well, but I actually kind of like this system better now, as it immediately allows you to see what has or has not passed, without needing to click through and see what failed. Additionally, since the `tester` builds don't show up unless the `package` builds pass, it's very easy to figure out which stage of your build failed at a quick glance.
That being said, it is a lot of checkmarks. Buildbot doesn’t have an easy way to combine them, but we can patch something in if it’s deemed checkmark overload.
I agree that it seems like too many results, especially since most of them are failing right now…
> Let's take the example scenario where someone notices that something broke `master`.
The idea was that this would be caught by CI on the staging branch, so hopefully it wouldn't happen. If the test coverage was insufficient, it wouldn't get caught, and the fix wouldn't be visible from the PR CI, but I don't think that would be a big deal because 1) the author would most likely test the specific fix locally, and 2) it would be confirmed by the CI run on the staging branch before the fix is merged.
I think it's really great that we now test more systematically on the less common platforms, and I agree that it's necessary to test every PR, but I still think we should try to reduce the number of low-value CI runs.
After some initial bumpiness, the buildbot CI state has mostly evened out. At this point, you should see green checkmarks for all major platforms (Windows, Linux, Mac, FreeBSD, on x86_64 and i686 where applicable) on the buildbots. Each job will first run a `package` build, which shows whether or not Julia built and passed bootstrap properly. If that completes successfully, you should see a `tester` run which runs the actual test suite (usually on a separate machine, to avoid "it works on my machine" syndrome).
AppVeyor was recently fixed (thanks to Jameson), and Travis will soon be fixed once this PR is sorted out. At this point, AppVeyor and Travis are duplicating the work of the buildbots, and we may be able to drop one or both eventually.