JuliaRegistries setup for air gapped network

Hi everyone.

I’m no Julia enthusiast, and I don’t know much about Julia.
I’m a sysadmin looking for ways to offer a clone of JuliaRegistries to users on a completely “offline” network, aka an air gapped network. Those users have zero internet access: they work in a completely disconnected environment.

Everybody needs good tools, so we’re already duplicating a lot of stuff (various linux distros, anaconda channels, git repos…) online before importing it all to this offline network, so the idea of adding JuliaRegistries/General to the mix came up quite fast.

But reading the docs didn’t really make it clear to me how to use alternative repositories when the official one cannot possibly be reached, as if my use case had simply never been a use case for anyone. Offline isn’t offline, that kind of funny stuff.

After looking around, I thought it was quite clear that the Julia Pkg system wasn’t designed to work the way we hope to use it at work, but there was also no apparent major reason why it shouldn’t be possible.

Exploring, I found various Julia tools from the community that may or may not help with part(s) of the job I need done, but they were not meant for my use case and would need an unknown amount of time and experiments.

So I thought I would rather go the low level way: since Pkg management is git based, I should be able to host a copy anyway and somehow make it work! Or maybe not?

Basically, this is what I tried:

  • cloned the git repo, imported it to the air gapped network, pushed it to a new origin (a dedicated air gapped repo)
  • modified the url in Registry.toml to match the git server’s url, in case it might matter
  • client side (a jupyterhub session), played with env variables until I figured out the ones needed (I had little success setting them in startup.jl, but I managed to set them with the ENV syntax; good enough for a try)
  • entered Pkg mode (“]”), added the repo url: sync OK :star_struck:
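
For readers following along, the first bullet can be sketched as a pair of git invocations. This is only a rough sketch, not the exact commands used above: the internal URL is a made-up placeholder, and the helper names `export_commands` and `import_commands` are hypothetical. The functions only build the argument lists, so nothing runs until you hand them to `subprocess.run`.

```python
from typing import List

UPSTREAM = "https://github.com/JuliaRegistries/General.git"
# Hypothetical internal git server; replace with your own.
INTERNAL = "https://git.airgap.example/julia/General.git"


def export_commands(upstream_url: str, bare_dir: str) -> List[List[str]]:
    """Commands for the internet-connected side: take a full mirror clone."""
    return [["git", "clone", "--mirror", upstream_url, bare_dir]]


def import_commands(bare_dir: str, internal_url: str) -> List[List[str]]:
    """Commands for the air gapped side: repoint origin, then push every ref."""
    return [
        ["git", "-C", bare_dir, "remote", "set-url", "origin", internal_url],
        ["git", "-C", bare_dir, "push", "--mirror", "origin"],
    ]


if __name__ == "__main__":
    steps = export_commands(UPSTREAM, "General.git") + import_commands("General.git", INTERNAL)
    for argv in steps:
        # To execute for real: subprocess.run(argv, check=True)
        print(" ".join(argv))
```

On the client, the last bullet presumably corresponds to `registry add <url>` typed in Pkg mode, with `<url>` being the internal repository.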

There I was thinking that was it. But it wasn’t :frowning:
When I tried to install a package (IJulia, if it matters), it complained that it could not connect to another repo (it was looking for JSON.jl from the JuliaIO github repo).
So maybe I should have cloned that one too… or maybe after I clone it a third one will be needed, then a fourth and so on?

Has anyone had some success in not using the official repos?
Are some of those URIs hard coded, or did I miss some important configuration file somewhere?
Did I miss some awesome air gapped Julia info? :upside_down_face:

Any help appreciated.

Maybe LocalPackageServer.jl (GitHub: GunnarFarneback/LocalPackageServer.jl), a Julia storage and package server for local packages, is worth looking at.

Some random musings from me…
You could futz with your internal DNS and point github to an internal server

JuliaPro was set up to answer use cases like this - but is discontinued AFAIK

Has anyone ever used Artifactory (JFrog’s universal artifact repository manager) with Julia?

BTW I build HPC clusters on air gapped networks

Another musing - with modern Linux distros it is becoming more and more difficult to have a system which is completely air gapped. Modern practices simply assume that you can reach out to the Internet for packages.

LocalPackageServer might help to set up a PkgServer, but as far as I understand you can do without a PkgServer and rely directly on git repositories. So it might help to make things shiny, once it works.

But I don’t even know which git repos I should clone, nor do I know how many of them are needed to be able to use JuliaRegistries/General in an air gapped network.

The url parameter from Registry.toml seems to be enough, since julia could contact the air gapped git repo and synchronise; no need for a hacky DNS setup :sweat_smile:
My problem is that it needed some other unexpected git repo, and I’m not planning on trying until I find all the needed repos one by one in the dark: I might get old long before I’m done with the list of needed git repos.
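
One way to avoid hunting for repos one by one might be to compute the list up front from the registry clone itself. The sketch below assumes the usual layout of a General clone: a `[packages]` table in Registry.toml mapping each UUID to a name and a path, and a Deps.toml per package listing dependency names. The function names are hypothetical, and the regex parsing is a shortcut; a real TOML parser would be more robust. It deliberately over-approximates by merging dependencies across all versions, and it says nothing about binary artifacts.

```python
import re
from pathlib import Path
from typing import Dict, Set

# Registry.toml entries look like:
#   <uuid> = { name = "JSON", path = "J/JSON" }
_ENTRY = re.compile(
    r'^\s*[0-9a-fA-F-]{36}\s*=\s*\{\s*name\s*=\s*"([^"]+)"\s*,\s*path\s*=\s*"([^"]+)"\s*\}'
)
# Deps.toml entries look like:
#   Parsers = "<uuid>"
_DEP = re.compile(r'^\s*(\w+)\s*=\s*"[0-9a-fA-F-]{36}"')


def package_paths(registry_dir: Path) -> Dict[str, str]:
    """Map package name -> directory inside the registry clone."""
    paths: Dict[str, str] = {}
    text = (registry_dir / "Registry.toml").read_text(encoding="utf-8")
    for line in text.splitlines():
        match = _ENTRY.match(line)
        if match:
            paths[match.group(1)] = match.group(2)
    return paths


def direct_deps(package_dir: Path) -> Set[str]:
    """Dependency names from Deps.toml, merged over all version ranges."""
    deps_file = package_dir / "Deps.toml"
    if not deps_file.exists():
        return set()
    deps: Set[str] = set()
    for line in deps_file.read_text(encoding="utf-8").splitlines():
        match = _DEP.match(line)
        if match:
            deps.add(match.group(1))
    return deps


def dependency_closure(registry_dir: Path, root: str) -> Set[str]:
    """Registered packages reachable from `root`, including `root` itself."""
    paths = package_paths(registry_dir)
    found: Set[str] = set()
    todo = [root]
    while todo:
        name = todo.pop()
        if name in found or name not in paths:
            continue  # already visited, or not in this registry (e.g. a stdlib)
        found.add(name)
        todo.extend(direct_deps(registry_dir / paths[name]))
    return found
```

Calling `dependency_closure(Path("General"), "IJulia")` on a local clone would then give the set of registered package names to mirror for that one package.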

Some other projects related to the issue :

But you can clone most stuff no sweat and replicate them in the air gapped environment.
Our users use their stuff “as if” they were connected, e.g. for anaconda they just conda install -c coolchannel coolstuff and don’t have to care about how it works : it just works for them.

The only thing really buggin’ us is microsoft apps stuff.
Well, Julia is too now :sweat_smile:
(edit: well, regarding linux we certainly don’t go for flatpak or snap in an air gapped env; package manager ftw! much easier to clone and set up than flatpaks)

I don’t have a solution except:

  1. Isn’t it easier to just allow access to JuliaRegistries, as an exception? It’s on github, so that would mean access to all of github, or maybe it’s possible to allow only a subset of it? Developers really want to have access, and not just to an (outdated) mirror. I wouldn’t want to work as a developer without access to that (and more, e.g. Discourse here, and some would insist on e.g. Stackoverflow).
  2. If this is for a specific app, then you make it work, and then distribute the (source) code you have, or you compile it first with PackageCompiler.jl.

Well I guess I have to give some context to help you guys to get it.

  1. No exception. Complete isolation. Strict rules : you get them or you get out.
    We sysadmins get useful stuff into that network through funky import procedures. Some TB each month.

  2. The goal here is to provide some cosy environment for those devs.
    It ain’t no fun to dev without internet, especially nowadays, so we add some sugar for them guys.
    But we sysadmins can’t know what they’ll need beforehand (I may be root but that’s still a bit much to ask of me… :grin:)
    Import procedures take some time too, and some imported stuff may (will) not survive the trip.
    So when applicable we go for the big toolbox, then point our users to this toolbox, and that’s it for that technology unless strongly justified.

Users need to be able to use Julia software (with or without the source code of the app) without internet access. People may think that’s not possible, but it totally is, e.g. by compiling it (with very few exceptions; I only know of one problematic package).

For developers however, I realize you likely didn’t make the rules, you’re just following them, but the rules are just awful for them. Yes, we did develop before the web/internet, and I read books/manuals on paper (still do sometimes).

What you can do is install the packages you need (not sure you really need to pin them), and then you just develop with that set. You should be able to send the .julia folder to developers (assuming the CPU architecture and OS are the same). I don’t think you need Docker or similar. But you have a bit of a problem if you then need to add another package or update one.

I’m just saying I wouldn’t want to work at such a company, with those rules (that do not apply to you…), most wouldn’t so I’m not sure it’s a high priority to make this easier. Though it seems you already found the software to do this (that I didn’t know of, hope it works well for you, it seems well documented).

The file you need (and its full repo, or at least a subset of it, and the linked repos):

How to answer that.
If you don’t get it, I probably didn’t make myself clear.
But I’m still somehow pissed by the way you judge.

About rules that may be applicable to some but not others… you just got that wrong too.
Well, somehow you’re right, but not in the way you think: sysadmins do take responsibility for anything imported on that network, not the final user that’s asking for it. Thanks.

That network is just one network we manage, not all of them are this closed.
And many people working on those other networks would really like to work in that closed network, 'cause if it’s stupidly protected there may be a reason like cool stuff. duh.


Regarding Registry.toml
That file is a list of packages and hashes, but it does not mention any other git repository that might be needed. Yet experience shows there is a need for other git repositories, at least JuliaIO/JSON.jl. As stated earlier, I modified the url/repo parameter in that file.
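
For what it’s worth, in a clone of General the per-package git URLs appear to live one level down: each package directory listed in Registry.toml holds a Package.toml with a `repo` field, which would explain the attempt to reach JuliaIO/JSON.jl. Below is a minimal sketch that collects those URLs, assuming that layout. The function names and the internal host are hypothetical, and the regex parsing is a shortcut for a proper TOML parser.

```python
import re
from pathlib import Path
from typing import Dict

_NAME = re.compile(r'^\s*name\s*=\s*"([^"]+)"')
_REPO = re.compile(r'^\s*repo\s*=\s*"([^"]+)"')


def package_repos(registry_dir: Path) -> Dict[str, str]:
    """Map package name -> upstream git URL, read from each Package.toml."""
    repos: Dict[str, str] = {}
    for package_toml in sorted(registry_dir.rglob("Package.toml")):
        name = repo = None
        for line in package_toml.read_text(encoding="utf-8").splitlines():
            name_match = _NAME.match(line)
            repo_match = _REPO.match(line)
            if name is None and name_match:
                name = name_match.group(1)
            elif repo is None and repo_match:
                repo = repo_match.group(1)
        if name and repo:
            repos[name] = repo
    return repos


def to_internal(url: str, upstream_host: str, internal_host: str) -> str:
    """Rewrite an upstream https URL so it points at an internal mirror host."""
    return url.replace(f"https://{upstream_host}/", f"https://{internal_host}/", 1)
```

The resulting dictionary is the list of repositories to mirror. `to_internal` shows the kind of rewrite that the `repo` fields would presumably need as well if clients are to clone from the internal server.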

I’m sorry, I try to be helpful, especially to new users, and I failed here. I mean I did try in my first answer, and I think it’s ok, but yes, I did judge in the next one, while I tried to make clear I wasn’t trying to judge you. I didn’t or couldn’t put myself in your shoes, rather your developers, and though I could maybe rephrase something in that answer, I’ll let it stand at least for now.

We are all here, at least me, just helping in our free time, and that’s at least what I meant to do.

Then that’s it :wink:
As you may have noticed, I myself have the finesse of a bull. And short memory for things like that.
Thanks for trying !

@palli These are companies which work in the defence and security sectors. Government agencies also. They have requirements for systems to be on air gapped networks.
Staff of course have access to email and web - but on different less secure networks.
If anyone is interested I build systems for companies like this.

You may choose not to work for a company or government agency which uses air gapped secure systems - your choice. However there are engineers using these systems to develop weapons for our defence. In the current climate we should be grateful.

@lateo Maybe we can talk offline. I’m interested in your recipes for conda installations. And Julia! Maybe we could do a writeup of good practice in this area.

After saying what I did about secure networks… someone did comment to me recently that managing completely isolated HPC setups is becoming difficult. I agree - as I said above, a lot of utilities and development flows assume a functioning Internet link. Look at the hoo-ha when web developers could not access a simple maths function which was removed from a repository (*)

Do we have good ideas for answering this problem?

A naive solution would be a shoebox sized portable server which contains the Internet.
Or at least the repositories which you need.
And yes, the first person who asks how you can check that everything you hook up in the shoebox is secure and not tampered with is correct.
The shoebox would have to use the same key mechanisms as the original repositories.

(*) My opinion - why in the heck download a simple maths function every time?
I believe this was a twos complement function which could have been a single line of code.

Are you thinking of the left-pad debacle?

(side talk about anaconda)

Cannot share recipes, but it’s pretty straightforward :

  • clone your favourite channels with conda-mirror (beware, it’s ok on first run then extra-sloooow, but you won’t ever get blacklisted if you use 1 instance of that tool)
  • after import procedures, push to web server (e.g. nginx),
  • rebuild the conda indexes (to fix the indexes for stuff that’s been destroyed during import procedures)
  • client side : deploy a “.condarc” pointing to your web server (e.g. from GPO if for windows clients, from /etc/skel if used from a linux server). You’ll have to figure out a way to push “.condarc” updates to users on linux systems.

That’s it !
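
A rough sketch of the last bullet, for anyone replicating this: a small helper that renders a “.condarc” pointing at the internal web server. The mirror URL and channel names are made-up placeholders and `render_condarc` is a hypothetical helper; it relies on conda’s `channel_alias` setting so that short channel names resolve against the mirror, which assumes the mirror keeps one directory per channel.

```python
from typing import List


def render_condarc(mirror_base: str, channels: List[str]) -> str:
    """Return .condarc text that resolves channel names against the mirror."""
    lines = [f"channel_alias: {mirror_base}", "channels:"]
    lines += [f"  - {name}" for name in channels]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical mirror URL and channel names.
    print(render_condarc("https://mirror.airgap.example/conda", ["main", "conda-forge"]), end="")
```

With such a file in place, a command like conda install -c coolchannel coolstuff should look for coolchannel under the mirror URL instead of the public servers.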

It can absolutely be used but doesn’t really have much advantage over PkgServer for this purpose.

But yes, using the package server machinery is really the way to go for air gapping. There are some pieces of tooling missing to make this convenient, which could be solved by a package. It “just” requires someone with the knowledge and the need to implement it.
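
To make “package server machinery” a little more concrete: as far as the Pkg documentation describes the protocol, a client talks to a single HTTP host, asks it which registries it serves, and then downloads registry, package and artifact tarballs addressed by tree hash, instead of cloning one git repo per package. The sketch below only builds those request URLs and the `JULIA_PKG_SERVER` environment variable that points a client at a server. The server address and all hashes are placeholders, and both function names are hypothetical.

```python
from typing import Dict


def pkg_server_urls(server: str, registry_uuid: str, registry_hash: str,
                    package_uuid: str, package_hash: str,
                    artifact_hash: str) -> Dict[str, str]:
    """Build the resource URLs a Pkg client would request from `server`."""
    base = server.rstrip("/")
    return {
        "registries": f"{base}/registries",
        "registry": f"{base}/registry/{registry_uuid}/{registry_hash}",
        "package": f"{base}/package/{package_uuid}/{package_hash}",
        "artifact": f"{base}/artifact/{artifact_hash}",
    }


def client_env(server: str) -> Dict[str, str]:
    """Environment variable a Julia client reads to choose its package server."""
    return {"JULIA_PKG_SERVER": server}


if __name__ == "__main__":
    # Placeholder server and hashes, for illustration only.
    urls = pkg_server_urls("https://pkg.airgap.example/", "<registry-uuid>",
                           "<registry-tree-hash>", "<package-uuid>",
                           "<package-tree-hash>", "<artifact-tree-hash>")
    for kind, url in urls.items():
        print(kind, url)
```

The practical difference from the git-only approach is that the client needs to reach just this one host, whatever repositories the packages originally came from.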

Could you explain the PkgServer’s role regarding the git repos it relies on? I don’t really get it. Some kind of glue? What’s the plus?

Keep in mind that we’re not planning on hosting our own packages; we are only interested in cloning existing Pkg repos.

And we already have a solution to distribute software artefacts to cover users’ needs for sharing various libs, execs and so on, which they could use to share some home-made Julia stuff.
My concern here is only about cloning existing repos, for offline use.