JuliaRegistries setup for air gapped network

I will need to go into some technical details to explain how the package servers relate to the git repositories of the packages.

  1. Every package is uniquely identified by its UUID, found in Project.toml, e.g. Example.jl/Project.toml at master · JuliaLang/Example.jl · GitHub.
  2. This UUID is reflected in its entry in the General registry, e.g. in General/Registry.toml at master · JuliaRegistries/General · GitHub and again in General/Package.toml at master · JuliaRegistries/General · GitHub.
  3. Registered package versions are identified by their git tree hash (a hash of the content but not of the history, in contrast to the more commonly seen commit hash). These can be found in the registry: General/Versions.toml at master · JuliaRegistries/General · GitHub

Let’s now look at what happens when Pkg is asked to install the Example package at version 0.5.3.

  1. First it looks up the UUID in the General registry (or, when applicable, in multiple registries; if it finds more than one Example package, it requires more information to choose between them). In this case it’s 7876af07-990d-54b4-ab0e-23690620f79a.
  2. The git tree hash for version 0.5.3 is found in General/Versions.toml at master · JuliaRegistries/General · GitHub as 46e44e869b4d90b96bd8ed1fdcf32244fddfb6cc.
  3. Now it knows what it wants and asks the package server to deliver UUID 7876af07-990d-54b4-ab0e-23690620f79a and tree hash 46e44e869b4d90b96bd8ed1fdcf32244fddfb6cc. Let’s get back to this later. For now assume that this fails, either because no package server was configured, the package server didn’t answer, or the package server didn’t have that package/version.
  4. Instead it falls back to getting the package from git. It looks up the repo URL in the registry, General/Package.toml at master · JuliaRegistries/General · GitHub, and finds https://github.com/JuliaLang/Example.jl.git. Now it asks GitHub for tree hash 46e44e869b4d90b96bd8ed1fdcf32244fddfb6cc from https://github.com/JuliaLang/Example.jl.git. This is done using a special GitHub API call, but if the URL came from another source, it would instead use LibGit2 to clone the repository and then locally extract the desired tree hash.

So how does step 3 look? It’s just a GET of the URL https://pkg.julialang.org/package/7876af07-990d-54b4-ab0e-23690620f79a/46e44e869b4d90b96bd8ed1fdcf32244fddfb6cc, where pkg.julialang.org is the default package server.
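
As a minimal sketch of that request, the URL can be assembled from the three pieces named above. The function names are hypothetical and only illustrate the URL layout; `download_package` simply saves whatever bytes the server returns and is not called here.

```python
import urllib.request


def package_url(server: str, uuid: str, tree_hash: str) -> str:
    """Build the package-server URL for one package version (hypothetical helper)."""
    return f"{server.rstrip('/')}/package/{uuid}/{tree_hash}"


def download_package(server: str, uuid: str, tree_hash: str, dest: str) -> None:
    """GET the resource and write the raw response body to dest."""
    with urllib.request.urlopen(package_url(server, uuid, tree_hash)) as resp:
        data = resp.read()
    with open(dest, "wb") as f:
        f.write(data)


# Example 0.5.3, using the UUID and tree hash quoted in this post.
url = package_url(
    "https://pkg.julialang.org",
    "7876af07-990d-54b4-ab0e-23690620f79a",
    "46e44e869b4d90b96bd8ed1fdcf32244fddfb6cc",
)
print(url)
```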

Another relevant piece of information is that a Julia project/environment is defined by a pair of Project.toml/Manifest.toml files. The latter contains the UUIDs and tree hashes of all packages/versions in use.

This is how I would implement air gapping:

  1. Collect the Manifest.toml files for all environments you want to be able to use on the inside.
  2. Set up your own web server which accepts the package server URLs, forwards them to pkg.julialang.org, and saves a copy to disk.
  3. Instantiate each manifest with Julia started with the environment variables JULIA_PKG_SERVER pointing to your own web server and JULIA_DEPOT_PATH pointing to an empty temporary directory. This forces Julia to download all necessary resources, which your web server then caches.
  4. Move your web server and its cache to the inside. Set JULIA_PKG_SERVER to point to your web server for all Julia instances running on the inside.
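
The web server in steps 2 and 4 could look roughly like the sketch below. It is untested against a real Pkg client and makes several assumptions: the cache directory name, the port, and the `offline` switch are all hypothetical, only GET is handled, and response headers such as the content type are not reproduced. On the outside it forwards every request upstream and saves a copy; on the inside (`offline=True`) it serves only from disk.

```python
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path

UPSTREAM = "https://pkg.julialang.org"
CACHE_DIR = Path("pkg-cache")  # hypothetical location of the on-disk copy


def cache_file(cache_dir: Path, url_path: str) -> Path:
    """Map a request path to a file below cache_dir, rejecting path escapes."""
    parts = [p for p in url_path.split("?")[0].split("/") if p]
    if not parts or any(p in (".", "..") for p in parts):
        raise ValueError(f"refusing path {url_path!r}")
    return cache_dir.joinpath(*parts)


class Handler(BaseHTTPRequestHandler):
    offline = False  # True on the inside: never contact UPSTREAM

    def do_GET(self):
        try:
            target = cache_file(CACHE_DIR, self.path)
        except ValueError:
            self.send_error(400)
            return
        if not self.offline:
            try:
                with urllib.request.urlopen(UPSTREAM + self.path) as resp:
                    body = resp.read()
            except OSError:  # URLError and HTTPError are OSError subclasses
                self.send_error(502)
                return
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(body)  # keep a copy for the inside
        if not target.is_file():
            self.send_error(404)
            return
        body = target.read_bytes()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000, offline: bool = False) -> None:
    """Run the proxy until interrupted."""
    Handler.offline = offline
    ThreadingHTTPServer(("", port), Handler).serve_forever()
```

Calling `serve()` on the outside and `serve(offline=True)` on the inside, with JULIA_PKG_SERVER set to `http://<host>:8000`, would correspond to steps 3 and 4. Always forwarding on the outside, rather than answering from the cache, keeps resources that change over time, such as the registry listing, current until the cache is moved.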

Why is this better than mirroring the git repositories of all used packages on the inside?

  1. No need to redirect git URLs. The only thing you need to set is JULIA_PKG_SERVER.
  2. You get the registry for free via the /registries and /registry resource types.
  3. Certain packages are not self-contained in the git repository but also use “artifacts”, typically binary dependencies. These will also be captured by the procedure above through the /artifact resource type.

The fallback when artifacts can’t be obtained from a package server is to look up their source download URLs from the package’s Artifacts.toml file, e.g. Git_jll.jl/Artifacts.toml at main · JuliaBinaryWrappers/Git_jll.jl · GitHub, so you would need to mirror and redirect these as well as the git repositories if you go that route. Yes, there are mechanisms to override artifacts to point to system libraries, but that will require quite a lot of extra work.

There are some additional subtleties, such as “lazy artifacts” or supporting artifacts on multiple platforms, which need some extra care, but they wouldn’t change the overall strategy.

Very useful, thanks!

If you already have a Julia environment set up, or at least know what packages your users need, you can build it into a custom sysimage with PackageCompiler.jl. Then it’s just a matter of uploading the Julia source code and the sysimage to the network and setting up the shell environment. I’ve written a tutorial for this:

In some situations it can be helpful to have a commercial agreement that can provide more contractual guarantees than an open source project can provide alone. It’s not something we’ve really advertised yet, but JuliaHub does support air-gapped installs into high security networks.

Oh… interesting!
A commercial solution might indeed be the way to go if the team makes extensive use of Julia (right now they’re more or less playing with Julia while considering adopting it for some stuff).
Thanks!

Thanks.
I had found your project’s repo but it doesn’t really fit with my objective of cloning it all, does it?

I believe it does, at least from the end user’s perspective. They can open a Julia REPL, access the package manager in offline mode, and import whichever packages they need. It doesn’t create a clone of the entire Julia registry, but that might not be needed.

Since new packages have to be added in installments, I would say this approach works best for a small number of distinct teams, which is probably the norm on air-gapped systems. Each team would probably use similar packages and could have a designated sysimage. It would be a poor fit for, say, a university of students who each want to try out various packages in the Julia ecosystem, and that’s where I think an actual clone of the registry would shine. But maintaining that would still involve periodic updates.

Gunnar, thanks for this writeup!

I’m not a fan of your proposed approach because (and correct me if I’m wrong) it requires foresight about what specific packages will be needed, and different projects might require different environments.

We took the mirroring approach (using NativeJuliaMirrors.jl) but are having problems with the two issues you highlight: artifacts, and packages mirrored as GitHub URLs (Plots.jl being the biggest offender). Manually mirroring the major GitHub repositories (e.g. Plots.jl) and the major artifacts gets us most of the way there, but there must be a better way.

Without taking too much of your time, can you point us in the right direction for handling these problems systematically? It seems like it might be possible to force the mirroring of GitHub packages for official Julia registry packages whenever these consist of URLs to GitHub.

Air gapping almost by definition requires a bit of foresight, doesn’t it?

Mirroring a package repo is effectively equivalent to caching all registered versions from the package server, and you can easily get the necessary tree hashes from the registry information. Less space efficient for sure, but also less hassle and in practice you won’t need very many of the historical versions.

Artifact information can be extracted from the Artifacts.toml file of each package (which uses artifacts), so it’s possible to predict which will be needed.

If you know what packages you might be interested in but not in which combinations, add all of those into one big environment and cache anything that is downloaded from the package server when it’s instantiated. Due to compat requirements some of the packages you get that way might not be the latest version. For good measure you can then do the same for single-package environments with each newer version of the outdated packages you got from the large environment.
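
Instantiating such an environment against your own server could be scripted along these lines. This is only a sketch: the directory and server address are hypothetical, `instantiate_command` is an illustrative helper, and it assumes a `julia` executable on the PATH.

```python
import os
import subprocess
import tempfile


def instantiate_command(project_dir: str, server: str, depot: str):
    """Build the command and environment for instantiating one environment."""
    env = dict(os.environ, JULIA_PKG_SERVER=server, JULIA_DEPOT_PATH=depot)
    cmd = ["julia", f"--project={project_dir}", "-e", "using Pkg; Pkg.instantiate()"]
    return cmd, env


def instantiate(project_dir: str, server: str) -> None:
    """Instantiate with a fresh, empty depot so everything is downloaded again."""
    with tempfile.TemporaryDirectory() as depot:
        cmd, env = instantiate_command(project_dir, server, depot)
        subprocess.run(cmd, env=env, check=True)


# Hypothetical values: an environment directory and a local caching server.
cmd, env = instantiate_command("envs/big", "http://localhost:8000", "/tmp/empty-depot")
print(" ".join(cmd))
```

Running `instantiate` once for the large environment and once for each single-package environment would pass every needed resource through the caching server.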
