The basic answer for the delay is that we need to build all the binaries first, then update the website (and we don’t especially care about a small delay). I would probably use the official website as a source of truth. If a user doesn’t get the newest version for 24 hours, that’s not a big deal in my opinion.
People need to chill out. There are actual humans doing a lot of work to get these releases out and update everything and it takes them time to do things. If there’s some change to the order of publishing things that would make your life easier by not breaking tools, that’s all well and good and we can try to do things in that order, but it’s just unreasonable to expect this stuff to happen instantaneously.
As Oscar alluded to, there are a number of steps in the process. The process starts with the tag and ends with the announcement, but there’s plenty of stuff that happens in between (including that the people who work on these things also have day jobs). This one actually came together quicker than usual.
The process is:
1. VERSION gets updated on the release branch via PR.
2. Once that’s merged, a tag is made based on the release branch.
3. The buildbot infrastructure generates and GPG-signs the source tarballs.
4. Those are downloaded and a GitHub release is created based on the tag, with the tarballs as release artifacts.
5. The tag SHA is submitted to the buildbots to build binaries for most platforms.
6. Someone has to manually turn on the musl buildbot, then the job gets submitted there.
7. For Julia 1.7+, someone has to manually build the macOS M1 binary.
8. The macOS binaries are submitted to Apple for notarization.
9. All of the binaries get put into the right places with the right names and permissions in AWS S3.
10. The binaries are downloaded locally to compute the checksums, then the checksums are uploaded to S3.
11. A GitHub Actions CI run is submitted to regenerate the JSON file of versions.
12. The website is updated via PR.
13. An announcement is made on Discourse.
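The checksum step in that list amounts to hashing every downloaded artifact. A minimal sketch in Python of what that involves (this helper is illustrative, not the actual release tooling):

```python
import hashlib

def sha256sum(path: str) -> str:
    """Compute the SHA-256 checksum of a downloaded release artifact,
    reading in 1 MiB chunks so large tarballs don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
```

Since every binary has to be pulled back down from S3 just to be hashed, this step scales with the total size of all the release artifacts, which is part of why it isn’t instant.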
Depending on the current size of the buildbot queue, step 5 in particular can take a pretty long time. Aside from human bandwidth, that’s often the largest limiting factor. None of the others is exactly instantaneous either though.
Regarding the JSON file, as far as I know, the claim that it’s updated every 24 hours is outdated; I believe it is now only updated manually. (Is that right, @SaschaMann?) If I recall, we did that because there were cases where an automated run happened before a particular component was ready, e.g. between when the raw macOS dmg made it into S3 and when a notarized version of it was uploaded, which would then record the wrong checksum in the JSON file. Something like that, anyway.
When I do the release announcements on Discourse, if the CI run that regenerates that JSON file hasn’t completed by the time the post is ready, I link to the in-progress CI log. That’s the link you may occasionally see on the word “soon” when I write “and (soon) GitHub Actions” while talking about the availability of the released version on CI providers.
Perhaps not the answer you’re hoping for, but from my perspective as one who does a non-insignificant number of the aforementioned steps of the process, a release is really only ready for general use once it’s announced on Discourse. That’s the only way to guarantee that the process as laid out has been completed, that binaries are ready for use, and that users who want to grab the new version and go will have an experience that we feel good about.
It seems like for automation purposes the JSON should be definitive and people writing tooling should just ignore what happens before that. If the steps were done in the listed order then the JSON file should have been updated before the announcement was made. However, the last-modified header for the versions.json file is more recent than the discourse post by about half an hour:
- The announcement post was made at 02:31 UTC (not 14:31).
- The issue was opened at 02:39 UTC, eight minutes later.
- The last-modified header for versions.json is 02:57:56 UTC.
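For anyone who wants to verify the gap, the arithmetic is straightforward; the calendar date below is a placeholder, only the UTC clock times come from this thread:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical calendar date; the UTC clock times are the ones quoted above.
announced = datetime(2021, 11, 30, 2, 31, 0, tzinfo=timezone.utc)
# Last-Modified headers use the RFC 7231 date format, parsable from the stdlib:
json_updated = parsedate_to_datetime("Tue, 30 Nov 2021 02:57:56 GMT")
print(json_updated - announced)  # 0:26:56, i.e. about 27 minutes
```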
@ararslan, is there some latency to updating the versions.json file or were things just done slightly out of order? That notwithstanding, I do think everyone can relax about the 27 minutes that the versions file was out of date.
Regarding tone, yes, it’s great that people are excited and want to get v1.7 right away, but it’s really a drag that the first thing that happens after a release is made is people complaining about things like this. Guess I could just stop paying attention.
is there some latency to updating the versions.json file or were things just done slightly out of order?
Yeah, there’s a fair bit of latency there since the script that generates it takes a long time to run. That’s the CI log I was referring to here:
I start it going before announcing and if it hasn’t finished by the time I announce, I just link to the log in the announcement so that people can see whether it’s done at their leisure. If it’s less confusing for people, I can just wait until it finishes. That’d up the 7 hours a bit though!
A bit tangential but there are likely ways to speed up the creation of the JSON file. For example, it currently regenerates the entire thing from scratch every time, but realistically it should be able to take some prior source of truth and add to it. If the checksum for the Linux tarball for Julia v0.3.0-rc1 changes mysteriously, I think we have bigger problems than the JSON file having the wrong checksum.
Indeed. There were also some issues where some binaries were available but not all, which causes CI builds to fail until the file is updated again. CI scripts are often configured to use the latest 1.x version that’s available, so they’d try to fetch binaries that aren’t in the file or on the download servers at that point.
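That “latest 1.x that’s available” lookup is roughly the following (the field names match the real versions.json layout to the best of my knowledge, but treat the structure as an assumption):

```python
def latest_stable(versions: dict) -> str:
    """Given the parsed versions.json mapping of version string -> metadata,
    return the newest version marked stable. Prereleases (not marked stable)
    are filtered out before the numeric comparison."""
    stable = [v for v, info in versions.items() if info.get("stable")]
    return max(stable, key=lambda v: tuple(map(int, v.split("."))))
```

A CI script picks the result of a lookup like this and then tries to download the corresponding file entries; if the JSON was regenerated while only some binaries had been uploaded, that download fails, which is exactly the failure mode described above.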
Also, scheduled workflows are deactivated after 6 (?) months with no activity in the repo, so one would still have to check manually each time a release is made. I think the current way of triggering the build is as automated as is realistically possible with GitHub Actions.
We could probably check the Last-Modified header and compare it to the Last-Modified value of the JSON file before downloading the binary and regenerating the checksum. I haven’t measured it but I assume downloading is the bottleneck, so if we can avoid it for most binaries, that should speed things up quite a bit. For anyone who wants to take a look, the relevant code lives here. That way we would still be able to automatically regenerate checksums if that ever becomes necessary.
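A sketch of that comparison, assuming a plain HEAD request per artifact (the helper names are made up; the real script may work differently):

```python
import urllib.request
from datetime import datetime
from email.utils import parsedate_to_datetime

def artifact_last_modified(url: str):
    """HEAD the artifact and parse its Last-Modified header, if present."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        value = resp.headers.get("Last-Modified")
    return parsedate_to_datetime(value) if value else None

def needs_rechecksum(artifact_modified, json_generated_at: datetime) -> bool:
    """Only re-download and re-hash if the artifact changed after the JSON
    file was last generated, or if we can't tell when it changed."""
    return artifact_modified is None or artifact_modified > json_generated_at
```

One HEAD request per binary is cheap compared to downloading gigabytes of tarballs, so skipping unchanged artifacts should cut the runtime substantially if downloading really is the bottleneck.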
@ararslan Is this writeup a good candidate for the dev docs? Having a canonical answer to “why is this release taking so long” could help resolve @StefanKarpinski’s annoyance without needing to resort to “stop paying attention”.
Personally, I’m not concerned about the delay between the release and the announcement, just curious what’s going on under the hood. Thanks, @ararslan, for the explanation!
Thanks! So long as the JSON never lists updates that are not complete, tools (at least UpdateJulia) shouldn’t break, and I believe this is guaranteed by the presence of SHA hashes of the binaries in the JSON file, so no change is needed here.
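The guarantee such tools rely on is just “hash what you downloaded and compare it to the JSON entry”; a minimal sketch (not UpdateJulia’s actual code):

```python
import hashlib

def verify_download(data: bytes, expected_sha256: str) -> bool:
    """Return True only if the downloaded bytes hash to the checksum
    recorded in versions.json; a mismatch means the JSON entry and the
    binary on the download server are out of sync."""
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()
```

A mismatch makes the tool fail loudly rather than silently install a wrong binary, which is the property being claimed here.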
If someone sees the release announcement and immediately decides to use a tool relying on the JSON file, they will be disappointed. In that sense, the release is not ready until the JSON file is updated. For this reason, I would prefer that you wait until it finishes.
Since the release announcements seem to be quite similar, perhaps they could be automated as well, using a dedicated “JuliaRelease” Discourse bot account? Is that feasible? It would also take some of the manual burden off your hands.
I was less thinking of time saved, but assurance that when a user sees the announcement post, their tool that’s relying on the JSON file will work correctly as mentioned above (I may have misread the earlier messages about this though).
If folks decide that it is worth waiting till the JSON is generated before posting the announcement, then I imagine ararslan can do that quite reliably. I suspect that ararslan’s choice of when to post on discourse is more robust than any automated tool. Thanks, @ararslan!