JuliaRegistries setup for air gapped network

Users need to be able to use Julia software (with or without the source code of the app) without internet access. People may think that’s not possible, but it is: you can compile the app (with very few exceptions; I only know of one problematic package).

For developers, however, the rules are just awful. I realize you likely didn’t make the rules, you’re just following them. Yes, we did develop before the web/internet, and I read books/manuals on paper (still do sometimes).

What you can do is install the packages you need (not sure you really need to pin them), and then you just develop with that set. You should be able to send the .julia folder to developers (assuming the CPU architecture and OS are the same). I don’t think you need Docker or similar. But you have a bit of a problem if you then need to add another package or update one.

I’m just saying I wouldn’t want to work at such a company, with those rules (that do not apply to you…), most wouldn’t so I’m not sure it’s a high priority to make this easier. Though it seems you already found the software to do this (that I didn’t know of, hope it works well for you, it seems well documented).

The file you need (and its full repo, or at least a subset of it, and the linked repos):

How to answer that.
If you don’t get it, I probably didn’t make myself clear.
But I’m still somehow pissed by the way you judge.

About rules that may be applicable to some but not others… you just got that wrong too.
Well, somehow you’re right but not in the way you think: sysadmins do take responsibility for anything imported on that network, not the final user that’s asking for it. Thanks.

That network is just one network we manage, not all of them are this closed.
And many people working on those other networks would really like to work in that closed network, 'cause if it’s stupidly protected there may be a reason like cool stuff. duh.


Regarding Registry.toml
That file is a list of packages and hashes, but it does not mention any other git repository that might be needed. Experience shows that other git repositories are needed, at least JuliaIO/JSON.jl, as I stated earlier when I explained that I modified the url/repo parameter in that file.

I’m sorry, I try to be helpful, especially to new users, and I failed here. I mean I did try in my first answer, and I think it’s ok, but yes, I did judge in the next one, while I tried to make clear I wasn’t trying to judge you. I didn’t or couldn’t put myself in your shoes, rather your developers, and though I could maybe rephrase something in that answer, I’ll let it stand at least for now.

We are all here, at least me, just helping in our free time, and that’s at least what I meant to do.

Then that’s it :wink:
As you may have noticed, I myself have the finesse of a bull. And short memory for things like that.
Thanks for trying !


@palli These are companies which work in the defence and security sectors. Government agencies also. They have requirements for systems to be on air gapped networks.
Staff of course have access to email and web - but on different less secure networks.
If anyone is interested I build systems for companies like this.

You may choose not to work for a company or government agency which uses air gapped secure systems - your choice. However there are engineers using these systems to develop weapons for our defence. In the current climate we should be grateful.


@lateo Maybe we can talk offline. I’m interested in your recipes for conda installations. And Julia! Maybe we could do a writeup of good practice in this area.


After saying what I did about secure networks… someone did comment to me recently that managing completely isolated HPC setups is becoming difficult. I agree: as I said above, a lot of utilities and development flows assume a functioning Internet link. Look at the hoo-ha when web developers could not access a simple maths function which was removed from a repository (*).

Do we have good ideas for answering this problem?

A naive solution would be a shoebox sized portable server which contains the Internet.
Or at least the repositories which you need.
And yes the first person who says how can you check everything you hook up in the shoebox is secure and not tampered with is correct.
The shoebox would have to use the same key mechanisms as the original repositories.

(*) My opinion - why in the heck download a simple maths function every time?
I believe this was a two’s complement function which could have been a single line of code.


Are you thinking of the left-pad debacle?

(side talk about anaconda)

Cannot share recipes, but it’s pretty straightforward :

  • clone your favourite channels with conda-mirror (beware, it’s ok on first run then extra-sloooow, but you won’t ever get blacklisted if you use 1 instance of that tool)
  • after import procedures, push to web server (e.g. nginx),
  • rebuild the conda indexes (to fix the indexes for stuff that’s been destroyed during import procedures)
  • client side : deploy a “.condarc” pointing to your web server (e.g. from GPO if for windows clients, from /etc/skel if used from a linux server). You’ll have to figure out a way to push “.condarc” updates to users on linux systems.

That’s it !

It can absolutely be used but doesn’t really have much advantage over PkgServer for this purpose.

But yes, using the package server machinery is really the way to go for air gapping. There are some pieces of tooling missing to make this convenient, which could be solved by a package. It “just” requires someone with the knowledge and the need to implement it.


Could you explain the PkgServer’s role regarding the git repos it relies on ? I don’t really get it. Some kind of glue ? What’s the plus ?

Considering we’re not planning on hosting our own packages but are only interested in cloning existing Pkg repos.

And we already have a solution to distribute software artefacts, which covers users’ needs for sharing various libs, execs and so on, and which they could use to share some home-made Julia stuff.
My concern here is only about cloning existing repos, for offline use.

I will need to go into some technical details to explain how the package servers relate to the git repositories of the package.

  1. Every package is uniquely identified by its UUID, found in Project.toml, e.g. Example.jl/Project.toml at master · JuliaLang/Example.jl · GitHub
  2. This UUID is reflected in its entry in the General registry, e.g. in General/Registry.toml at master · JuliaRegistries/General · GitHub and again in General/Package.toml at master · JuliaRegistries/General · GitHub.
  3. Registered package versions are identified by their git tree hash (a hash of the content but not of the history, in contrast to the more commonly seen commit hash). These can be found in the registry: General/Versions.toml at master · JuliaRegistries/General · GitHub

Let’s now look at what happens when Pkg is asked to install the Example package at version 0.5.3.

  1. First it looks up the UUID in the General registry (or when applicable multiple registries, and if it finds more than one Example package, requires more information to choose between them). In this case it’s 7876af07-990d-54b4-ab0e-23690620f79a.
  2. The git tree hash for version 0.5.3 is found in General/Versions.toml at master · JuliaRegistries/General · GitHub as 46e44e869b4d90b96bd8ed1fdcf32244fddfb6cc.
  3. Now it knows what it wants and asks the package server to deliver UUID 7876af07-990d-54b4-ab0e-23690620f79a and tree hash 46e44e869b4d90b96bd8ed1fdcf32244fddfb6cc. Let’s get back to this later. For now assume that this fails, either because no package server was configured, the package server didn’t answer, or the package server didn’t have that package/version.
  4. Instead it falls back to getting the package from git. It looks up the repo URL in the registry, General/Package.toml at master · JuliaRegistries/General · GitHub, and finds https://github.com/JuliaLang/Example.jl.git. Now it asks GitHub, please give me tree hash 46e44e869b4d90b96bd8ed1fdcf32244fddfb6cc from https://github.com/JuliaLang/Example.jl.git. This is done using a special GitHub API call but if the URL was from another source it would instead use LibGit2 to clone the repository and then locally extract the desired tree hash.

So how does step 3 look? It’s just a GET of the URL https://pkg.julialang.org/package/7876af07-990d-54b4-ab0e-23690620f79a/46e44e869b4d90b96bd8ed1fdcf32244fddfb6cc, where pkg.julialang.org is the default package server.

Another relevant piece of information is that a Julia project/environment is defined by a pair of Project.toml/Manifest.toml files. The latter contains the UUIDs and tree hashes of all packages/versions in use.

This is how I would implement air gapping:

  1. Collect the Manifest.toml files for all environments you want to be able to use on the inside.
  2. Set up your own web server which accepts the package server URLs, forwards them to pkg.julialang.org, and saves a copy to disk.
  3. For each Manifest, instantiate it with Julia started with the environment variables JULIA_PKG_SERVER pointing to your own web server and JULIA_DEPOT_PATH pointing to an empty temporary directory. This will force Julia to download all necessary resources and your web server caches them.
  4. Move your web server and its cache to the inside. Set up JULIA_PKG_SERVER to point to your webserver for all Julia instances running on the inside.
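The caching web server from step 2 could look roughly like the sketch below. It is a minimal illustration using only the Python standard library, not a tested package server: the cache directory name, the port, and all function names are assumptions, it only handles plain GET requests, and it has not been checked against every request a real Pkg client makes.

```python
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path

UPSTREAM = "https://pkg.julialang.org"  # default package server named in the text
CACHE_DIR = Path("pkg-cache")           # hypothetical cache location


def cache_file(cache_dir: Path, url_path: str) -> Path:
    """Map a request path such as /package/<uuid>/<hash> to a file in the cache."""
    parts = [p for p in url_path.split("?")[0].split("/") if p]
    if not parts or any(p in (".", "..") for p in parts):
        raise ValueError(f"refusing path {url_path!r}")
    return cache_dir.joinpath(*parts)


def fetch(cache_dir: Path, url_path: str, download) -> bytes:
    """Serve from the cache, calling download(url_path) only on a cache miss."""
    target = cache_file(cache_dir, url_path)
    if not target.exists():
        data = download(url_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return target.read_bytes()


def download_from_upstream(url_path: str) -> bytes:
    """Forward the request to the upstream package server."""
    with urllib.request.urlopen(UPSTREAM + url_path) as response:
        return response.read()


class CachingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            body = fetch(CACHE_DIR, self.path, download_from_upstream)
        except Exception:
            self.send_error(404, "resource not available")
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the caching server until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), CachingHandler).serve_forever()
```

On the outside, one would call `serve()` and instantiate each Manifest with JULIA_PKG_SERVER set to `http://127.0.0.1:8000` and JULIA_DEPOT_PATH set to an empty temporary directory, as in step 3. On the inside the same code serves purely from the cache, because the upstream is unreachable and every miss simply returns 404. Note that the registry listing is cached on first request too, so the inside sees the registry as it was at collection time.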

Why is this better than mirroring the git repositories of all used packages on the inside?

  1. No need to redirect git URLs. The only thing you need to set is JULIA_PKG_SERVER.
  2. You get the registry for free via the /registries and /registry resource types.
  3. Certain packages are not self-contained in the git repository but also use “artifacts”, typically binary dependencies. These will also be captured by the procedure above through the /artifact resource type.

The fallback when artifacts can’t be obtained from a package server is to look up their source download URLs from the package’s Artifacts.toml file, e.g. Git_jll.jl/Artifacts.toml at main · JuliaBinaryWrappers/Git_jll.jl · GitHub, so you would need to mirror and redirect these as well as the git repositories if you go that route. Yes, there are mechanisms to override artifacts to point to system libraries, but that will require quite a lot of extra work.

There are some additional subtleties, such as “lazy artifacts”, or supporting artifacts on multiple platforms, which need some extra care, but it wouldn’t change the overall strategy.


Very useful, thanks !


If you already have a Julia environment set up, or at least know what packages your users need, you can build it into a custom sysimage with PackageCompiler.jl. Then it’s just a matter of uploading the Julia source code and the sysimage to the network and setting up the shell environment. I’ve written a tutorial for this:


In some situations it can be helpful to have a commercial agreement that can provide more contractual guarantees than an open source project can provide alone. It’s not something we’ve really advertised yet, but JuliaHub does support air-gapped installs into high security networks.


Oh… interesting !
A commercial solution might indeed be the way to go if the team makes extensive use of Julia (right now they’re more or less playing with Julia while considering adopting it for some stuff).
Thanks !


Thanks.
I had found your project’s repo but it doesn’t really fit with my objective of cloning it all, does it ?

I believe it does, at least from the end user’s perspective. They can open a Julia REPL, access the package manager in offline mode, and import whichever packages they need. It doesn’t create a clone of the entire Julia registry, but that might not be needed.

Since new packages have to be added in installments, I would say this approach works best for a small number of distinct teams, which is probably the norm on air-gapped systems. Each team would probably use similar packages and could have a designated sysimage. It would be a poor fit for, say, a university of students who each want to try out various packages in the Julia ecosystem, and that’s where I think an actual clone of the registry would shine. But maintaining that would still involve periodic updates.

Gunnar, thanks for this writeup!

I’m not a fan of your proposed approach because (and correct me if I’m wrong) it requires foresight about what specific packages will be needed, and different projects might require different environments.

We took the mirroring approach (using NativeJuliaMirrors.jl) but are having problems with the two issues you highlight – artifacts, and packages mirrored as github URLs (Plots.jl being the biggest offender). Manually mirroring the major github repositories (e.g. Plots.jl) and the major artifacts gets us most of the way there, but there must be a better way.

Without taking too much of your time, can you point us in the right direction for handling these problems systematically? It seems like it might be possible to force the mirroring of github packages for official Julia registry packages whenever these consist of URLs to github.


Air gapping almost by definition requires a bit of foresight, doesn’t it?

Mirroring a package repo is effectively equivalent to caching all registered versions from the package server, and you can easily get the necessary tree hashes from the registry information. Less space efficient for sure, but also less hassle and in practice you won’t need very many of the historical versions.

Artifact information can be extracted from the Artifacts.toml file of each package (which uses artifacts), so it’s possible to predict which will be needed.

If you know what packages you might be interested in but not in which combinations, add all of those into one big environment and cache anything that is downloaded from the package server when it’s instantiated. Due to compat requirements some of the packages you get that way might not be the latest version. For good measure you can then do the same for single-package environments with each newer version of the outdated packages you got from the large environment.
