In the past, we’ve allowed some manual JLLs to be registered in the General registry. However, after some internal discussion, the registry maintainers have decided that we won’t accept manual JLLs in the future. Going forward, all JLLs should come from Yggdrasil, so that we can centralize JLL reviews in a single place.
Existing manual JLLs will not be affected. This only applies going forward.
Do we have a definition of what a JLL is? Is this policy just about package names? Or is it more generally about packages that only exist to distribute binaries through their artifacts?
@dilumaluthge will chime in with the official answer, but I like this because Yggdrasil distributes binaries, and we want to know how the binaries were built in the first place - which means having an Yggdrasil build recipe.
In theory, you can build JLLs using BinaryBuilder and register them directly - and that is being discouraged going forward.
Yes, sorry, I should have been clearer in expressing my huge and fanatical support of the general idea! I’m just surprisingly fuzzy on what a JLL actually is, beyond a tautological answer.
It’s definitely a package that solely exists to distribute some artifact. Is that all it is? Or are JLLs definitionally limited to packages that are generated by BinaryBuilder? Of course, that’d mean that a “manually generated JLL” isn’t a JLL at all, so that’s not very helpful.
I do get the general concept… but what about something like RDatasets.jl? Someday it might make sense to refactor that to use artifacts. At what point does it become a “manual JLL”?
Note that currently, JLLs encompass more than binaries/libraries/executables built by an Yggdrasil worker. There are also downloaded blobs (e.g., CUDA_jll.jl), JavaScript (e.g., D3_jll.jl), and even data (e.g., TermInfoDB_jll.jl).
Having said that: there’s an easy way to tell which packages have a public build script in Yggdrasil: they belong to the JuliaBinaryWrappers org. So if that’s the only reason…?
That’s a good question that I hope someone can answer authoritatively!
In practice, I think this is what it comes down to: a “manual JLL” is a package with the suffix _jll in the name, registered via the normal Registrator workflow instead of through Yggdrasil. Such a registration will be subject to the automerge rules, which will flag the name for containing an underscore, and would need a manual merge. The policy change is that such a registration PR will no longer be merged, going forward.
A potentially separate issue that goes more to the heart of “What is a JLL?” is a normally named package that distributes binary artifacts. I don’t think we have automated checks against this at the moment, but the manual review of the registration PR might result in a blocking comment “Shouldn’t this be a JLL package?”
And then there’s the gray area, where a lack of clarity on “What exactly is a JLL” will lead to some discussion.
I think a clearer way to word this policy for the moment is that any JLL that could technically be in Yggdrasil must be in Yggdrasil, so that we have some traceability into the binary artifacts it distributes.
To me this is a much different statement than previously described. How is that enforceable? Any package can have an Artifacts.toml. Who is “we” and why is this important? (“It makes security easier” isn’t an answer, because your security policy could just be to trust only packages built in Yggdrasil.)
I thought this was just about the special casing of the _jll namespace.
To which the rule is simple: no package may have a name ending in _jll unless it is built and distributed via Yggdrasil.
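As a rough illustration of that rule, a name check could be sketched in Python as below. The function name is hypothetical. Using the JuliaBinaryWrappers GitHub organization (mentioned earlier in the thread) as a proxy for "built and distributed via Yggdrasil" is an assumption made for this sketch, not a description of how the registry's automerge checks actually work.

```python
from urllib.parse import urlparse

# Assumption for this sketch: Yggdrasil-generated JLLs are hosted in this org.
YGGDRASIL_ORG = "JuliaBinaryWrappers"


def is_acceptable_name(package_name: str, repo_url: str) -> bool:
    """Hypothetical check for the proposed rule on the `_jll` namespace.

    Names that do not end in `_jll` are not constrained by the rule.
    Names that do end in `_jll` pass only if the repository lives under
    the JuliaBinaryWrappers organization on GitHub.
    """
    if not package_name.endswith("_jll"):
        return True
    parsed = urlparse(repo_url)
    parts = [p for p in parsed.path.split("/") if p]
    return (
        parsed.netloc.lower() == "github.com"
        and bool(parts)
        and parts[0] == YGGDRASIL_ORG
    )
```

A check like this only looks at the name and the hosting organization. It would reject a JLL built by a separately run Yggdrasil pipeline outside that organization, so it is at best a first filter before manual review.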
I’m curious about the implications for binaries that cannot (mainly due to licensing) be compiled by Yggdrasil’s pipeline. As a hypothetical example for discussion purposes, consider a C shared library generated with Matlab’s compiler SDK. Would tool authors have to provide an alternative mechanism for retrieving these cross-platform binaries on the target?
The most important point here (to me) is to ensure we know the provenance of artifacts to the extent that is possible. There are two pieces to that:
The original source(s) and a way to trust it (URLs and their content hashes)
The compilation/aggregation process and a way to trust it (known trusted infrastructure, signatures, etc)
Packages like Gurobi_jll.jl whose Artifacts.toml points to an authoritative third party source (in this case, https://packages.gurobi.com) directly satisfy this. But a general BinaryBuilder-built package launders this information into a separate GitHub release download.
I think that’s the crucial point. If you’re registering a package that provides self-hosted Artifacts for which external provenance information could be available, it must be done through Yggdrasil’s infrastructure.
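To make the provenance point concrete, here is a minimal Python sketch that lists which hosts the download URLs in an Artifacts.toml point at. It assumes the usual layout, in which each artifact is either a table or (for platform-specific artifacts) an array of tables, each with optional `download` entries. The artifact names, URLs, and hashes in `SAMPLE` are placeholders invented for illustration, and the function name is hypothetical.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:  # older Pythons: assumes the `tomli` backport is installed
    import tomli as tomllib
from urllib.parse import urlparse

# Placeholder Artifacts.toml content; names, URLs, and hashes are made up.
SAMPLE = """
[hypothetical_data]
git-tree-sha1 = "0000000000000000000000000000000000000000"

    [[hypothetical_data.download]]
    url = "https://example.com/data.tar.gz"
    sha256 = "0000000000000000000000000000000000000000000000000000000000000000"

[[hypothetical_lib]]
arch = "x86_64"
os = "linux"
git-tree-sha1 = "1111111111111111111111111111111111111111"

    [[hypothetical_lib.download]]
    url = "https://github.com/example-org/Example.jl/releases/download/v1/lib.tar.gz"
    sha256 = "1111111111111111111111111111111111111111111111111111111111111111"
"""


def download_hosts(artifacts_toml: str) -> dict:
    """Map each artifact name to the set of hosts its download URLs use."""
    data = tomllib.loads(artifacts_toml)
    hosts = {}
    for name, entry in data.items():
        # Platform-specific artifacts are arrays of tables; others are one table.
        variants = entry if isinstance(entry, list) else [entry]
        for variant in variants:
            for download in variant.get("download", []):
                hosts.setdefault(name, set()).add(urlparse(download["url"]).netloc)
    return hosts
```

Calling `download_hosts(SAMPLE)` groups the two placeholder artifacts by host. Deciding whether a given host counts as an authoritative source is exactly the judgment call discussed above, and this sketch does not attempt it.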
One bit of Artifacting that I’ve personally done that would be in conflict with this is in PlotlyKaleido.jl — it builds its own bundled mathjax.js and plotly.js resources in an ad-hoc way. It doesn’t use BinaryBuilder at all, but it’s not dissimilar in end-effect. I think it would be improved if those were actually Yggdrasil-built, even though they are not compiled binaries.
Well, yeah, it’s all just me brainstorming and still quite wishy-washy. There’s no definitive answer for what makes a “self-hosted” asset, or what makes for “provenance information” at all. The RFC2119 definitions for must/should are the least of my worries here.
That post was me thinking out loud as I wrote it; my main goal was to attempt a generalization that applies to all packages (regardless of name) and focuses on the properties of their artifacts.
I generally like the idea, but consider the rabbit holes. For example, a proprietary compiler that Yggdrasil can’t run (well, unless you’re going to license the product), or, even if you do, one that ships pre-compiled runtimes for which you’ll never get the source. And how deep do you go? These could themselves be built on other libraries. Matlab, Gurobi, and I’m sure others may fall into these categories.
The other question you raise with this in my mind is, who is the arbiter of “authoritative?” While, as a community, we’re familiar with these examples, what if someone shows up with commercial tool X we’ve never heard of before? I think the issues would come down to “a way to trust it” (what counts as a “way” and what do we mean by “trust”?), and what to do in the case where external provenance information pragmatically isn’t available, regardless of whether in theory it could be (e.g., Matlab could publish all of the internals of its build process and sources, but I’m guessing it won’t).
You might just draw a hard line on this, and then that’s that. We just need to be clear for these edge cases what will or will not be accepted, and the implications, for example driving a package to leverage alternative binary delivery options: is this ultimately more desirable?
Yggdrasil does not always compile a binary; there are some cases where we just repackage them. For instance, the entire CUDA stack is packaged via Yggdrasil, but is just a repackaging of the NVIDIA-compiled libraries/programs. Same thing for Intel MKL - we just repackage the compiled libraries that they provide through other channels into our system.
So while it is preferred to compile from source (because then we can control the build flags), we can still use Yggdrasil to package software that we have a distribution license for into a JLL package. The key part here being the distribution license - if we do not have a license that allows us to distribute it, then we should not be putting it into Yggdrasil.
We don’t need to go far down rabbit holes here — that’s kinda the point. It’s enough to ensure that something we trust (like Yggdrasil) is reliably pointing us towards the next rabbit hole, if there is one.
If it’s not possible to do that kind of accreditation, then that’s valuable information too.
That’s reasonable. Presuming that the pipeline for building, submitting and delivering JLLs via Yggdrasil enables this, along with some of the other range of JLL edge cases you’ve highlighted, then it’s a good move. The clarity on what is “manual” and what is meant by “built by Yggdrasil” in this thread has been helpful.
Just for some numbers and concrete examples, I ran a script over the entire General registry to gather all the packages with artifacts. Of the 13k packages, 1880 have an Artifacts.toml with download URLs in it. Of those, 1677 have names ending with _jll (the JLLs).
There are only 4 packages named *_jll whose artifacts aren’t pointing to a GitHub release asset within their own repository: Gurobi_jll, ReSHOP_jll, UDPipe_jll, Xpress_jll. There are another 7 *_jlls that aren’t in the JuliaBinaryWrappers org: CCTools_jll, BQPD_jll, libtapi_jll, Hashpipe_jll, Reactant_jll, KNITRO_jll, macOSSDK_jll. These are definitively all “manual” JLLs (and might be all of them?).
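The script behind those numbers isn't shown in the thread. As a hedged sketch of how such a tally might be computed, the Python below classifies packages from records of `(name, repo_url, artifact_urls)`. The record format, function names, and counter keys are all assumptions for illustration. It also assumes a GitHub release asset URL has the form `https://github.com/OWNER/REPO/releases/download/TAG/FILE`.

```python
from collections import Counter
from urllib.parse import urlparse


def repo_slug(url: str):
    """Return 'owner/repo' (lowercased) for a github.com URL, else None."""
    parsed = urlparse(url)
    if parsed.netloc.lower() != "github.com":
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    return f"{parts[0]}/{parts[1].removesuffix('.git')}".lower()


def is_own_release_asset(repo_url: str, artifact_url: str) -> bool:
    """True if artifact_url is a GitHub release asset of the package's own repo."""
    slug = repo_slug(repo_url)
    if slug is None or repo_slug(artifact_url) != slug:
        return False
    parts = [p for p in urlparse(artifact_url).path.split("/") if p]
    return parts[2:4] == ["releases", "download"]


def tally(packages):
    """Count categories over (name, repo_url, artifact_urls) records."""
    counts = Counter()
    for name, repo_url, urls in packages:
        if not urls:
            continue  # no download URLs, so not part of the survey
        counts["with_download_urls"] += 1
        own = [is_own_release_asset(repo_url, u) for u in urls]
        if name.endswith("_jll"):
            counts["jll_named"] += 1
            if not all(own):
                # at least one artifact is hosted outside the package's own releases
                counts["jll_pointing_elsewhere"] += 1
        elif any(own):
            counts["non_jll_own_release"] += 1
    return counts
```

Gathering the records themselves (walking the registry and reading each package's Artifacts.toml) is left out here. Whether the original survey treated "not all URLs" or "no URLs" as the criterion for the first category is not stated, so the `all`/`any` choices above are a guess.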
More varied are the ~200 packages with artifacts not named *_jll. 51 of these packages have an artifact that points to one of their own release assets. Many of these are providing data. A good number, however, look to be doing BinaryBuilder like things. These would be the kinds of packages that could help fill in the varying levels of gray between “JLL” and “not”:
If you want to try and draw a line for “What is a JLL package”, then I would say anything using JLLWrappers is a JLL, and if it doesn’t, then it isn’t a JLL. JLLWrappers sets up the package in a very specific way, handles all the artifact selection logic for the platform, and exposes a known set of functions/variables to query the package. So, saying no more manual registration of JLL packages would in my mind equate to no more manual registration of packages that use JLLWrappers - they must go through Yggdrasil.
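Under that definition, the test for "is this a JLL?" reduces to checking whether the package depends on JLLWrappers. A minimal Python sketch, assuming the dependency is declared in the `[deps]` table of the package's Project.toml (the function name is hypothetical):

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:  # older Pythons: assumes the `tomli` backport is installed
    import tomli as tomllib


def uses_jllwrappers(project_toml: str) -> bool:
    """Return True if the Project.toml text lists JLLWrappers under [deps]."""
    data = tomllib.loads(project_toml)
    return "JLLWrappers" in data.get("deps", {})
```

This keys on the dependency name only and ignores the UUID, so it is a heuristic rather than a strict identity check.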
That said, there are a few in the above list that are done that way for licensing reasons, e.g.,
Gurobi_jll ← Proprietary solver that had to be distributed this way
Xpress_jll ← Proprietary solver that had to be distributed this way
KNITRO_jll ← Proprietary solver that had to be distributed this way
BQPD_jll ← We have permission from the author to distribute the binary under a BSD license, but the source code isn’t open
These I think are due to work on BinaryBuilder2 (so would probably move to a more central location once that is progressed further):
CCTools_jll
libtapi_jll
macOSSDK_jll
Reactant_jll is a bit of an odd one out here, because it did initially start as a Yggdrasil-built JLL. However, because it basically involves building LLVM every single time for every architecture, it was really killing the shared CI infrastructure and slowing down all the other builds done in Yggdrasil. So they managed to get some build resources of their own and are actually running their own Yggdrasil pipeline system on them to build Reactant_jll.
And finally, others are just really old and basically from when this entire system was in its infancy:
ReSHOP_jll ← Last commit 2022
Hashpipe_jll ← Last commit 2021
UDPipe_jll ← Last commit 2020
So actually, “manual JLL packages” aren’t that prevalent in the registry, and the ones that had been merged up until now basically had a reason for being manual.
As for what to do about “JLL-like” or “BB-like” packages, that can be decided separately from what a “JLL package” is IMO. You could decide to make policies about where things must get built, or by whom, but there are always going to be edge cases and things that even Yggdrasil can’t support (e.g., those Java packages really won’t work well in Yggdrasil because we don’t have Java).
If Gurobi_jll were being registered today, what should the maintainer’s response be? Ask the authors to name it GurobiBinary and change nothing else? Or repackage the binaries in Yggdrasil somehow? Or something else?