Reproducible builds and "Triangle of Secure Code Delivery"

How does Julia stack up (to other languages)?

  1. Reproducible Builds:
    Given the application’s source code, it should be possible to reproduce the distributed package exactly, down to contents that are known to vary benignly, such as build timestamps.
    This property is important for auditing. A developer can sign both the source code and distributed binary package, but how does the user (or, more likely, a security auditor) know the source code actually represents the binary? […]
  2. Userbase Consistency Verification:
    […] These packages should be available permanently in a public record.
    This is the most important of the three properties. Simply put: Everyone gets the same thing. If you can guarantee that everyone gets an identical copy of the software, then it becomes impossible to hide a targeted attack. If an attacker wants to backdoor one user’s software, they have to backdoor every user’s software. This greatly increases the attacker’s risk of being detected.
  3. Cryptographic Signatures:
    The software package, source code, and patches (changes) should be cryptographically signed by the upstream software source (i.e. the developers).
    […]

I conjecture that these three properties, if implemented correctly, are sufficient to disincentivize both large-scale attacks (i.e. the NSA wants to put a vulnerability in everyone’s copy of Tor) and localized targeted attacks (i.e. the NSA wants to compromise a single user’s software download to take control of their system).

Having just two of these properties is not enough:

I believe Julia packages, together with the Artifacts system, are excellent for reproducible builds (“builds” may be the wrong word here, since packages are usually distributed as source code, but the point may also apply to JLLs). I still don’t know about “[build] timestamps”, or whether that concern applies.

The second, “most important”, property is provided by GitHub (or e.g. GitLab), but I don’t know about the third point, cryptographic signatures. Would that be up to e.g. GitHub? Do they create signatures for you automatically? It seems that if GitHub were compromised, it wouldn’t be enough to have the signatures in the same place, so is it up to the Julia registry? For Julia itself, signatures exist, and you kind of trust the Julia developers…

Not covered by this is your actual main source file(s). That seems to be up to you. Such code is often just internal, but it is sometimes distributed, and not always in a package… so Julia systems (i.e. the registry) can’t protect you then, which might be an argument for registering more code.

This could also get complicated with e.g. Python or R dependencies (via PythonCall or PyCall, and RCall), but I’m mostly thinking about Julia-only code and JLLs. Feel free to comment on other languages (even PHP), independently of Julia, or at least on issues you see specifically when they are used with Julia.

I found the above link at:

  1. We know it’s a solvable problem.

We had briefly introduced our complete solution when we announced that WordPress would cryptographically sign its automatic updates in 2019.

https://reproducible-builds.org/reports/2022-01/

I don’t think that cryptographic signatures are entirely necessary, and indeed we don’t use them. We do use tree hashes to identify code, which makes the code inherently verifiable. Tree verification is done automatically on Julia clients whenever packages and artifacts are installed, so you’d get an error if someone tried to serve you the wrong content. The question is: how do you know your tree hashes are the right ones? This post’s answer is for the tree hashes to be signed by the maintainers of the software. But they’re missing a big implicit requirement from their triangle: in order to verify a signature you need to know the author’s public key, so how do you know the public keys you have are the right ones? If you’re going to verify signatures, you need some way to securely distribute the right keys to end users, but if you have that, why not just securely distribute the correct tree hashes directly and skip the signature step? Indeed, that’s exactly what we do: we use the most widely distributed PKI in the world, the one that browsers use to verify HTTPS websites, to securely distribute source tree hashes directly.
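To make the tree-hash idea concrete, here is a minimal sketch in Python of a git-style tree hash over an in-memory file tree, plus a check against an expected hash. It is an illustration under simplifying assumptions, not Pkg's actual implementation: every file is treated as a regular non-executable file (mode `100644`), symlinks are ignored, and the names `blob_hash`, `tree_hash`, and `verify_tree` are made up for this example.

```python
import hashlib


def blob_hash(data: bytes) -> bytes:
    """Git blob ID: SHA-1 over a 'blob <size>' header, a NUL byte, then the content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).digest()


def tree_hash(tree: dict) -> bytes:
    """Git-style tree ID for a nested dict mapping names to bytes (file) or dict (directory).

    Simplification (assumption): all files use mode 100644; no symlinks or executables.
    """
    entries = []
    for name, node in tree.items():
        if isinstance(node, dict):
            # Git sorts directories as if their name ended with '/'.
            entries.append((name.encode() + b"/", b"40000", name.encode(), tree_hash(node)))
        else:
            entries.append((name.encode(), b"100644", name.encode(), blob_hash(node)))
    entries.sort(key=lambda entry: entry[0])
    body = b"".join(mode + b" " + name + b"\0" + oid for _, mode, name, oid in entries)
    return hashlib.sha1(b"tree %d\0" % len(body) + body).digest()


def verify_tree(tree: dict, expected_hex: str) -> None:
    """Raise ValueError if the content does not match the hash we were told to expect."""
    actual = tree_hash(tree).hex()
    if actual != expected_hex:
        raise ValueError(f"tree hash mismatch: expected {expected_hex}, got {actual}")
```

Because the hash covers every file name and every byte of content, changing a single file changes the tree hash, so a client that obtained the expected hash over a trusted channel can reject any other content no matter which server delivered it.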


BinaryBuilder already takes some steps to make builds reproducible:

We could still do more, but with all of the above, builds are reproducible in most cases. At this point, the only known variation across rebuilds is the log files. We’d like to reorganise how tarballs are constructed and perhaps split the content across different tarballs, but that’s a somewhat large undertaking, likely with many breaking changes for the BinaryBuilder ecosystem (I’m talking about the recipes in Yggdrasil; end users wouldn’t observe any difference apart from less clutter in .julia/artifacts). In Yggdrasil we have a small script to test the reproducibility of a recipe, which has to skip the logs directory for this reason.
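The kind of check described above can be sketched as follows. This is not the Yggdrasil script, only a hypothetical Python illustration of the idea: hash every file in two unpacked build outputs and report the paths that differ, ignoring anything under a `logs/` prefix. The function names `snapshot` and `differing_paths` and the `skip_prefixes` parameter are assumptions made for this example.

```python
import hashlib
import os


def snapshot(root: str) -> dict:
    """Map each file path (relative to root, '/'-separated) to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as fh:
                digests[rel] = hashlib.sha256(fh.read()).hexdigest()
    return digests


def differing_paths(a: dict, b: dict, skip_prefixes=("logs/",)) -> list:
    """Sorted paths whose digests differ or that exist on one side only.

    Paths starting with any prefix in skip_prefixes are ignored (assumed layout:
    build logs live under 'logs/').
    """
    prefixes = tuple(skip_prefixes)
    paths = {p for p in set(a) | set(b) if not p.startswith(prefixes)}
    return sorted(p for p in paths if a.get(p) != b.get(p))
```

Running `differing_paths(snapshot(build1), snapshot(build2))` on two independent builds of the same recipe would return an empty list if everything outside the logs is bit-for-bit identical.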

Also, remember that all the binary dependencies of the official release of Julia come from BinaryBuilder. This means the foundations of Julia (but not the julia binary itself) come from reproducible builds, which can be verified with BinaryBuilder.
