Teaching Julia prerequisites

Can’t say I have ever seen container software in the modules of an HPC cluster. I’ve just seen horrifyingly complex build systems and 20 versions of GCC.

But Julia fixes this by just making your Julia package work. No GCC versions, no this or that. Binaries should just work because BinaryBuilder ships pre-built ones. The only things that really cause issues are packages which use BinDeps or which have Python dependencies (that’s a Python problem, so we can’t really fix their package manager). So IMO the right thing to do is just to purge those, and then your manifest works as a “container”. And if your package doesn’t work when a user does ]add Package, then fix it. If you just keep things on the latest release and have constant CI going to update anything downstream, things should be fine.

Containers are getting used more and more in HPC these days, though. The obvious advantage is that they allow users to bring a complete and tested software stack to the computing site, making life easier for both users and site support/admin staff. True, Julia is very good at lessening the software installation pain, but there are often still unwieldy application-specific binary dependencies, plus stuff like CUDA versions, etc. But there are also performance advantages: HPC-compatible container runtimes like Shifter and Singularity make the container available as a kind of read-only networked block storage. So file-system metadata operations and reads of small files (typically many of them) become very efficient, since each compute node handles this locally.

We always use containers for teaching, accessible as a Jupyter server. We do not allow students to submit work any other way. The big advantages of this system are that (a) we reduce maintenance requests from students and (b) we can use nbgrader without difficulties.

https://github.com/jameskermode/iatl-jupyter-launcher

This system is getting adopted across at least 4 departments for scientific computing modules using either Python or Julia.

Interesting, didn’t know about this. Do you have some links about this? I’m still on the “don’t use small files on a cluster” workflow.

Both Shifter (NERSC-only) and Singularity (getting very popular) use SquashFS images for containers. So in contrast to Docker, there isn’t a layered file system stored as separate files on disk: it’s just a compressed and deduplicated Linux root fs image stored as a single file. That container image is typically put on the site’s cluster file system (Lustre, GPFS, CephFS, etc.), which is already highly tuned to deliver large files to many nodes in parallel. The individual compute nodes loop-mount that SquashFS file, and all metadata operations become local. Also, since Linux caches the file in a block-wise fashion, small-file access also becomes local once the relevant blocks of the container image have been pulled (typically in larger chunks) from the cluster file system.
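
For anyone who wants to see what the loop-mount step amounts to, here is a rough manual sketch. It is not what Shifter or Singularity actually run internally; the paths and the image name are hypothetical placeholders, and by default the script only prints the commands (the real thing needs root and squashfs-tools).

```shell
#!/bin/sh
# Sketch only: all three paths are hypothetical placeholders.
ROOTFS_DIR="${ROOTFS_DIR:-./rootfs}"          # unpacked root fs with the software stack
IMAGE="${IMAGE:-./julia-stack.sqsh}"          # single-file image for the cluster file system
MOUNTPOINT="${MOUNTPOINT:-/mnt/julia-stack}"  # where a compute node would mount it

# Print commands by default; set DRY_RUN=0 to really execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

squashfs_demo() {
    # Pack the whole tree into one compressed, read-only image file.
    run mksquashfs "$ROOTFS_DIR" "$IMAGE" -noappend
    # On each node: loop-mount the image read-only, so lookups hit the local mount.
    run mkdir -p "$MOUNTPOINT"
    run mount -t squashfs -o loop,ro "$IMAGE" "$MOUNTPOINT"
}

squashfs_demo
```

The point of the single image file is that the cluster file system only ever sees one large file, whatever the number of small files inside it.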

If students see “the reality” of Julia too early, they will never use Julia. It is too fragile of an environment. With Docker you can tweak PackageCompiler for plots, give them a coherent set of packages that work (as some package is usually broken if you get the latest), and eventually we can integrate MKL once we get our act together. The whole thing is very slick and usable, but…

However, there is a big issue: Docker cannot be used on Windows Home edition at this point. Even my work laptop doesn’t support it unless I reinstall everything. This problem is insurmountable in practice until WSL 2 (see “WSL 2 is now available in Windows Insiders” on the Windows Command Line blog) is integrated into Windows. We will turn off Docker support in the QuantEcon lectures because of that setup issue.

So I think the best approach is to do what @Tamas_Papp suggests and get a JupyterHub for a class, where you do a manifest instantiation for a coherent set of packages, and tell people to try to avoid upgrading them unless they really need to. You can even package-compile plots into the image.

See our Dockerfile at https://github.com/QuantEcon/syzygy-jupyterhub?files=1 to copy some of it for your JupyterHub.
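
The manifest instantiation step itself is small. A minimal sketch, assuming the course's Project.toml and Manifest.toml sit in a directory I am calling ./course-env (a made-up name, not something from the repository above):

```shell
#!/bin/sh
# Hypothetical directory holding the course's Project.toml and Manifest.toml.
PROJECT_DIR="${PROJECT_DIR:-./course-env}"

instantiate_course_env() {
    # Skip quietly when Julia or the manifest is missing, so the sketch is safe to run anywhere.
    if ! command -v julia >/dev/null 2>&1; then
        echo "julia not found on PATH; skipping" >&2
        return 0
    fi
    if [ ! -f "$PROJECT_DIR/Manifest.toml" ]; then
        echo "no Manifest.toml in $PROJECT_DIR; skipping" >&2
        return 0
    fi
    # Install exactly the package versions recorded in the manifest.
    julia --project="$PROJECT_DIR" -e 'using Pkg; Pkg.instantiate()'
}

instantiate_course_env
```

Because Pkg.instantiate() installs the versions pinned in the manifest rather than the latest releases, everybody in the class ends up with the same set, as long as nobody upgrades afterwards.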

If you have a choice of operating system for the machines the students will learn on, I suggest Fedora/CentOS/Red Hat. On these operating systems the installation of Julia is literally this simple:

sudo dnf install julia

I don’t have much else to contribute.

You are correct. As soon as you try to progress to a more sophisticated setup, things get more painful.

Fortunately there is an active project that recognizes these realities:

Unfortunately, Julia is not yet in a base stack nor one of the community stacks.

https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#community-stacks

That said, the Jupyter project would seem to be a natural place to house such an effort.

At the very least you’ll see the subtle issues that stand between a docker “Hello World” and something realistic.

Update:

Note that this Dockerfile builds on top of the Jupyter stacks (specifically jupyter/scipy-notebook:latest); this would seem to be the most natural candidate to start a Julia base image from.

Our Windows computers have Jupyter already installed, so I just had them download and install the latest Julia, ]add IJulia, quit Julia, and launch Jupyter. This worked fine. You can give them a manifest if you want for packages. I was able to use the QuantEcon lectures with no problem, except instantiating does take a few minutes.

Docker is not installed, so not an option. I considered JuliaPro so they could have Juno, but the registration is kind of annoying. Also, installing JuliaPro takes longer than installing Julia and adding IJulia.

I considered distributing a Julia install and a pre-made .julia folder with all the packages, but I’m not sure if this would work reliably across different computers. Also, the .julia folder has to be manually placed in the home directory as far as I know, which is another step. It would be great if I could run Julia entirely off of a USB drive, but I don’t know if that’s possible.
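
One thing that might help with the home-directory step: Julia reads the JULIA_DEPOT_PATH environment variable to decide where the depot (the .julia folder) lives, so it does not have to sit in the home directory. Here is a rough launcher sketch for a Unix-like machine. The mount point and folder layout are my own assumptions, a Windows batch file would need to set the same variable in its own syntax, and I have not checked how reliably this carries across different computers.

```shell
#!/bin/sh
# Hypothetical layout: the drive is mounted at $USB_ROOT and holds
# an unpacked Julia under julia/ and a pre-made depot under julia-depot/.
USB_ROOT="${USB_ROOT:-/media/usb}"

# Point Julia at the depot on the drive instead of ~/.julia.
export JULIA_DEPOT_PATH="$USB_ROOT/julia-depot"
JULIA_BIN="$USB_ROOT/julia/bin/julia"

echo "depot: $JULIA_DEPOT_PATH"
if [ -x "$JULIA_BIN" ]; then
    # Pass any launcher arguments straight through to Julia.
    "$JULIA_BIN" "$@"
else
    echo "no Julia binary at $JULIA_BIN" >&2
fi
```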

It is: https://github.com/jupyter/docker-stacks/tree/master/datascience-notebook

The issue there is that the datascience-notebook has a bunch of R stuff in it, and the Julia versions can occasionally get out of date. But the size doesn’t matter in many cases.

That said, the biggest issue really is the deployment on Windows desktops. It just doesn’t work if that is a target audience.

… and has been since 2015. :blush: Must be getting old :wink: