For what it's worth, some time ago I considered creating a JLL for the open-source bindings of a proprietary vendor SDK. For various reasons this couldn't work in Yggdrasil, but I have a working prototype of the workflow in an external repository. For other reasons I'm not going to pursue this further, but I wanted to show an arguably legitimate case where I believe the JLL couldn't reasonably be produced in Yggdrasil, and I don't understand why such cases should be banned altogether.
It would be important to be explicit about those reasons, so that we can formulate workarounds in the guidelines around this policy. Could you elaborate?
Here are two reasons (that I know of) that may preclude the use of Yggdrasil:
- There may be legal restrictions demanding (or insinuating in a fuzzy way) that we not repackage or redistribute some product. This is the case for Gurobi (the fuzzy flavor, I think).
- There definitely are technical limitations to the BB/Yggdrasil stack. Some compiler toolchains simply aren’t supported.
I don't know Dilum's and Keno's motivations, but I suspect they overlap with mine: I need better provenance and licensing information about artifacts. Centralizing everything under BinaryBuilder+Yggdrasil certainly solves the problem (or at least points to a centralized path towards solving it for everyone at once).
That said, for the purposes of provenance, I’m 100% ok with a “manual JLL” like Gurobi_jll. It’s downloading a tarball built and hosted directly by Gurobi Optimization LLC. The Gurobi EULA, however, is not distributed in the BB-standard fashion (under ./share/licenses/). This could potentially be addressed by other means.
On the other hand, I'm 100% not ok with what I did to PlotlyKaleido.jl. I added a GitHub Action that simply grabs MathJax 2.7.9 and plops it into a tarball as a release asset so it can be used as an Artifact. There are three problems with this:
- There’s now a second build system we need to track and trust (in addition to Yggdrasil). Were any of the GHA extensions compromised when this was built?
- I did not do anything with its license
- And perhaps most importantly, it's unclear where that file came from. While I did put the project name and version into the release name, that is unstructured and ad hoc. It's a whole lot easier to connect a "canonical" URL like https://cdn.jsdelivr.net/npm/mathjax@2.7.9/MathJax.js to a known project and version. This is particularly important, because I would very much like to know whether PlotlyKaleido contains a component that is vulnerable to CVE-2023-39663 (ok, that CVE is junk, but it's a good example of the idea).
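To make the "canonical URL" point concrete: a pinned CDN URL can be mapped mechanically to a project and version, which an ad hoc release asset cannot. Below is a minimal Python sketch, assuming the jsDelivr npm URL layout `https://cdn.jsdelivr.net/npm/<package>@<version>/<file>`; the `parse_jsdelivr_npm` helper and `NpmComponent` type are hypothetical illustrations, not part of any Julia or registry tooling.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class NpmComponent(NamedTuple):
    package: str
    version: str
    path: str


def parse_jsdelivr_npm(url: str) -> Optional[NpmComponent]:
    """Extract (package, version, path) from a pinned jsDelivr npm URL, else None.

    Assumption: URLs follow cdn.jsdelivr.net/npm/<package>@<version>/<file>.
    """
    parsed = urlparse(url)
    if parsed.netloc != "cdn.jsdelivr.net":
        return None
    parts = parsed.path.lstrip("/").split("/")
    if not parts or parts[0] != "npm":
        return None
    rest = parts[1:]
    # Scoped packages look like @scope/name@version; unscoped ones like name@version.
    if rest and rest[0].startswith("@"):
        if len(rest) < 2:
            return None
        spec, file_parts = rest[0] + "/" + rest[1], rest[2:]
    elif rest:
        spec, file_parts = rest[0], rest[1:]
    else:
        return None
    # Split on the LAST "@" so a scope's leading "@" is not taken as the separator.
    name, sep, version = spec.rpartition("@")
    if not sep or not name or not version:
        return None  # no pinned version, so the URL does not identify a release
    return NpmComponent(name, version, "/".join(file_parts))


print(parse_jsdelivr_npm("https://cdn.jsdelivr.net/npm/mathjax@2.7.9/MathJax.js"))
```

The sketch deliberately returns `None` for unpinned URLs (e.g. `.../npm/mathjax/MathJax.js`), since "latest" cannot be matched against a vulnerability database.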
Is "some vendors really, really make life hard and miserable unless you use the single system they bless (and even when you use that, good luck)" a valid answer?
Boiling this down, I think it all revolves around one concrete issue with Artifacts: they're opaque, both in the ends and the means. We require packages to be git-based, for many very good reasons. But Artifacts live outside of that. Unlike package code, there's no human-readable diff between versions. Unlike code, they're often not human-readable at all. Unlike code, there's often a creation process that's just as important to know about as the end result. Unlike code, I can't just fork a git repo to re-host it, or make a change myself and create a new version.
These are exactly the properties I care about for package registration, not unlike the git-hosted requirement for the code itself. Yggdrasil addresses these challenges; perhaps that’s how we should evaluate the workarounds?
I don't get that: nothing stops packages from committing opaque blobs to git. And nothing stops packages from carefully documenting where their artifacts come from.
I suppose in that instance, we could end up making an exception. None of the rules for the General registry are ever 100% hard. But we’d at least want to try to push back in a situation of “third party lawyers have no idea what Yggdrasil is, don’t want to hear about it, and are making demands that make no sense, but aren’t budging”. It would be good to think about how one might best communicate with that third party and have some materials prepared to convince them that the standard JLL process is fine. But at the end of the day, if some lawyer absolutely won’t budge, we’ll have to see if an exception is necessary; or decide it’s not worth it, since they’re probably making your life miserable in other ways, too.
Hopefully, that situation will be extremely rare.
This isn't checked automatically right now (I think), but opaque binary blobs in a package would definitely be something I'd flag.
That's a difficult area. There are packages that deal with binary file formats and legitimately need binary files in their test data. There are also known cases where exactly that has been used to hide malicious payloads in compromised packages (I don't remember which ecosystem; it wasn't Julia).
Yep, that was the infamous XZ Utils backdoor, CVE-2024-3094. It hid the payload in an opaque binary test file, activated it only under certain build systems, and required the malicious actor to spend years building trust and gaining maintainership.
Yggdrasil actually built an XZ with the exploit code, but the exploit wasn't activated on that build system (see the Discourse post "PSA: backdoor in xz-utils and relevance for the Julia ecosystem").
I didn't mean "the repo can't contain any binary files". Of course a package that reads or writes MP3s can have .mp3 files in its tests. I'm talking about packages with .dll files in their src folder (which people have tried to register).
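The distinction drawn here (binary test data is fine, native libraries under `src` are not) could be approximated by a simple heuristic. Below is a minimal Python sketch, assuming a package layout with code under `src/`; the function names, the extension list, and the "contains a NUL byte" test are illustrative assumptions, not an existing General registry check.

```python
from pathlib import Path
from typing import List

# Hypothetical list of native-library extensions to flag outright.
NATIVE_LIB_SUFFIXES = {".dll", ".so", ".dylib", ".a", ".lib"}


def looks_binary(path: Path, sniff_bytes: int = 8192) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain a NUL byte."""
    with path.open("rb") as f:
        return b"\x00" in f.read(sniff_bytes)


def flag_src_binaries(package_root: str) -> List[str]:
    """Return paths under <package_root>/src that look like opaque binaries.

    Files elsewhere (e.g. test/ data such as .mp3 fixtures) are not inspected.
    """
    src = Path(package_root) / "src"
    flagged: List[str] = []
    if not src.is_dir():
        return flagged
    for p in sorted(src.rglob("*")):
        if not p.is_file():
            continue
        if p.suffix.lower() in NATIVE_LIB_SUFFIXES or looks_binary(p):
            flagged.append(p.relative_to(package_root).as_posix())
    return flagged
```

The NUL-byte test will misfire on, say, UTF-16 text files, so its output is a prompt for human review rather than an automatic rejection. It would also not have caught the XZ attack, whose payload sat in test data.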
I don't think there's anything bulletproof for preventing an XZ-style attack. That's just something where people have to be vigilant (and actually, AIs might be getting pretty good at flagging "there's something fishy going on here"). The XZ attack did get caught, so that's actually a success story in my book.
I’ve drafted some documentation to answer the questions in this Discourse thread. See my WIP PR here:
That PR is the result of internal discussion and review by the registry maintainers. However, some maintainers are still working on reviewing it, so I may make edits/updates based on their feedback.
We can continue to have the discussion in this Discourse thread (no need to move the discussion to the PR).