JuliaRegistries setup for an air-gapped network

I will need to go into some technical details to explain how the package servers relate to the git repositories of the packages.

  1. Every package is uniquely identified by its UUID, found in Project.toml, e.g. Example.jl/Project.toml at master · JuliaLang/Example.jl · GitHub.
  2. This UUID is reflected in its entry in the General registry, e.g. in General/Registry.toml at master · JuliaRegistries/General · GitHub and again in General/Package.toml at master · JuliaRegistries/General · GitHub.
  3. Registered package versions are identified by their git tree hash (a hash of the content but not of the history, in contrast to the more commonly seen commit hash). These can be found in the registry: General/Versions.toml at master · JuliaRegistries/General · GitHub

Let’s now look at what happens when Pkg is asked to install the Example package at version 0.5.3.

  1. First it looks up the UUID in the General registry (or when applicable multiple registries, and if it finds more than one Example package, requires more information to choose between them). In this case it’s 7876af07-990d-54b4-ab0e-23690620f79a.
  2. The git tree hash for version 0.5.3 is found in General/Versions.toml at master · JuliaRegistries/General · GitHub as 46e44e869b4d90b96bd8ed1fdcf32244fddfb6cc.
  3. Now it knows what it wants and asks the package server to deliver UUID 7876af07-990d-54b4-ab0e-23690620f79a and tree hash 46e44e869b4d90b96bd8ed1fdcf32244fddfb6cc. Let’s get back to this later. For now assume that this fails, either because no package server was configured, the package server didn’t answer, or the package server didn’t have that package/version.
  4. Instead it falls back to getting the package from git. It looks up the repo URL in the registry, General/Package.toml at master · JuliaRegistries/General · GitHub, and finds https://github.com/JuliaLang/Example.jl.git. Now it asks GitHub to deliver tree hash 46e44e869b4d90b96bd8ed1fdcf32244fddfb6cc from https://github.com/JuliaLang/Example.jl.git. This is done using a special GitHub API call, but if the URL pointed to another host it would instead use LibGit2 to clone the repository and then extract the desired tree hash locally.

So how does step 3 look? It’s just a GET of the URL https://pkg.julialang.org/package/7876af07-990d-54b4-ab0e-23690620f79a/46e44e869b4d90b96bd8ed1fdcf32244fddfb6cc, where pkg.julialang.org is the default package server.
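As a small illustration, the URL can be assembled like this in Python. The function names are made up for this sketch, and `fetch_package` simply stores whatever bytes the server returns without interpreting them.

```python
import urllib.request

DEFAULT_SERVER = "https://pkg.julialang.org"


def package_url(uuid, tree_hash, server=DEFAULT_SERVER):
    """Build the package server URL for one package version."""
    return f"{server.rstrip('/')}/package/{uuid}/{tree_hash}"


def fetch_package(uuid, tree_hash, dest, server=DEFAULT_SERVER):
    """GET the resource and write the raw response body to dest (needs network)."""
    with urllib.request.urlopen(package_url(uuid, tree_hash, server)) as resp:
        body = resp.read()
    with open(dest, "wb") as fh:
        fh.write(body)


print(package_url("7876af07-990d-54b4-ab0e-23690620f79a",
                  "46e44e869b4d90b96bd8ed1fdcf32244fddfb6cc"))
```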

Another relevant piece of information is that a Julia project/environment is defined by a pair of Project.toml/Manifest.toml files. The latter contains the UUIDs and tree hashes of all packages/versions in use.

This is how I would implement air gapping:

  1. Collect the Manifest.toml files for all environments you want to be able to use on the inside.
  2. Set up your own web server which accepts the package server URLs, forwards them to pkg.julialang.org, and saves a copy to disk.
  3. Instantiate each Manifest with Julia started with the environment variables JULIA_PKG_SERVER pointing to your own web server and JULIA_DEPOT_PATH pointing to an empty temporary directory. This forces Julia to download all necessary resources, and your web server caches them.
  4. Move your web server and its cache to the inside. Set JULIA_PKG_SERVER to point to your web server for all Julia instances running on the inside.
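The steps above can be sketched with the Python standard library alone. Treat this as a rough starting point, not a hardened server: the cache directory, port, and all function and class names are my own choices, every cached resource is kept forever (which is what you want on the inside, but means the registry is frozen at download time), and only GET requests are handled.

```python
import os
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

UPSTREAM = "https://pkg.julialang.org"  # default package server
CACHE_DIR = "pkg_cache"                 # hypothetical cache location


def cache_path(url_path, cache_dir=CACHE_DIR):
    """Map a request path such as /package/<uuid>/<hash> to a file on disk."""
    parts = [p for p in url_path.split("?")[0].split("/") if p]
    if not parts or any(p in (".", "..") for p in parts):
        raise ValueError(f"refusing suspicious path: {url_path!r}")
    return os.path.join(cache_dir, *parts)


class CachingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            target = cache_path(self.path)
        except ValueError:
            self.send_error(400)
            return
        if not os.path.isfile(target):
            # Cache miss: forward to the upstream server and keep a copy.
            # On the inside there is no upstream, so this ends in a 404.
            try:
                with urllib.request.urlopen(UPSTREAM + self.path) as resp:
                    body = resp.read()
            except (urllib.error.URLError, OSError):
                self.send_error(404)
                return
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as fh:
                fh.write(body)
        with open(target, "rb") as fh:
            body = fh.read()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the caching proxy until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), CachingHandler).serve_forever()


def instantiate_command(project_dir, server_url, depot_dir):
    """Command and environment for step 3: instantiate one environment."""
    env = dict(os.environ, JULIA_PKG_SERVER=server_url, JULIA_DEPOT_PATH=depot_dir)
    cmd = ["julia", f"--project={project_dir}", "-e", "using Pkg; Pkg.instantiate()"]
    return cmd, env
```

On the outside you would call `serve()` in one process and, for each environment, pass the result of `instantiate_command(project_dir, "http://127.0.0.1:8000", empty_temp_dir)` to `subprocess.run(cmd, env=env, check=True)`. Afterwards the `pkg_cache` directory is what gets carried to the inside.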

Why is this better than mirroring the git repositories of all used packages on the inside?

  1. No need to redirect git URLs. The only thing you need to set is JULIA_PKG_SERVER.
  2. You get the registry for free via the /registries and /registry resource types.
  3. Certain packages are not self-contained in the git repository but also use “artifacts”, typically binary dependencies. These will also be captured by the procedure above through the /artifact resource type.
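If your web server should only accept these resource types, a classifier like the following could serve as a whitelist. The exact path shapes for /registry and /artifact are my understanding of the package server protocol and not something stated above, so treat the patterns as assumptions and check them against the requests your Julia version actually makes; clients may request other paths that this sketch would reject.

```python
import re

HEX40 = r"[0-9a-f]{40}"
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Assumed path shapes, one pattern per resource type.
RESOURCE_PATTERNS = {
    "registries": re.compile(r"/registries"),
    "registry": re.compile(rf"/registry/{UUID}/{HEX40}"),
    "package": re.compile(rf"/package/{UUID}/{HEX40}"),
    "artifact": re.compile(rf"/artifact/{HEX40}"),
}


def resource_type(path):
    """Return the resource type for a request path, or None if unrecognized."""
    for kind, pattern in RESOURCE_PATTERNS.items():
        if pattern.fullmatch(path):
            return kind
    return None


print(resource_type(
    "/package/7876af07-990d-54b4-ab0e-23690620f79a/"
    "46e44e869b4d90b96bd8ed1fdcf32244fddfb6cc"
))
```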

The fallback when artifacts can’t be obtained from a package server is to look up their source download URLs from the package’s Artifacts.toml file, e.g. Git_jll.jl/Artifacts.toml at main · JuliaBinaryWrappers/Git_jll.jl · GitHub, so you would need to mirror and redirect these as well as the git repositories if you go that route. Yes, there are mechanisms to override artifacts to point to system libraries, but that will require quite a lot of extra work.

There are some additional subtleties, such as “lazy artifacts” or supporting artifacts on multiple platforms, which need some extra care, but they wouldn’t change the overall strategy.
