Lessons learnt and thoughts on deploying Julia and sysimages on AWS

I have spent a portion of my time, on and off over the past two weeks, trying to build Docker containers with sysimages to deploy on AWS. Along the way I ran into various issues and thought it worthwhile to write up my lessons learnt. I hope this is useful to someone else out there.

This was my first time working with Docker beyond trivial examples or running a simple container. It was also my first time having to be aware of what AWS is and how it works. Although I’ve learnt a lot from this experience, I remain a solid amateur.

Many thanks to those who have contributed to other posts and discussions, they were helpful in working out what I was doing wrong.

This post in particular was a starting point from which, following breadcrumbs and googling similar terms, I was finally able to resolve most of my issues:

Lesson 1

Ideally, Docker containers should be as small as possible.
But currently Julia itself is heavy: up to a gigabyte without any additional packages installed. An unused package dependency can add several hundred megabytes, or even gigabytes. On top of that, it seems that increasing the amount of code that has to be compiled into a sysimage adds significantly to the total time and memory required to complete the compilation.

I found it worthwhile using Aqua.jl to identify packages that could be removed from the project/environment.

using MyPackage  # replace MyPackage with your own package
using Test
using Aqua

@testset "Aqua" begin
    # Flags dependencies listed in Project.toml that are never actually loaded
    Aqua.test_stale_deps(MyPackage)
end

Check and remove the ones Aqua complains about (careful, it’s not always correct).
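Removing a flagged dependency is then a one-liner. The package name below is a stand-in for whatever Aqua reports; verify it really is unused before removing it.

using Pkg
Pkg.activate(".")        # the project whose dependencies are being trimmed
Pkg.rm("SomeUnusedDep")  # hypothetical stale dependency flagged by Aqua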

In the end, the docker image I ended up with was ~3.5GB, down from ~4.5GB, and the sysimage ended up being ~827MB.

Lesson 2

Still on reducing overall container size: there are alternative base images you could look into, but in my (limited) experience it’s not really worth the effort.

The official images have Julia pre-installed. The one based on Alpine Linux is much smaller, but comes with the caveat that identical Linux packages may not be available for it (or at least they use different names). I tried it in an attempt to reduce container size, but several Julia packages failed to precompile, so in the end I gave up.
A more *nix-savvy person may have a better experience.

I also tried alternatives such as bitnami/minideb, installing Julia via juliaup. In my specific case, I did not see a large enough reduction in overall container size (Julia and its packages take up most of the space) to justify the additional setup complexity and decrease in maintainability, so I went back to the official base image.

Lesson 3

Julia can compile (or not compile) code for different CPU architectures (the physical hardware the code runs on). This is controlled by the JULIA_CPU_TARGET environment variable (see this discussion). Code can be compiled generically at the cost of performance, so choosing appropriate CPU targets is important for maximising speed.

The issue is that we can’t be sure which CPU architectures our code will run on in AWS.
And no, just because it runs on your computer does not mean it will run fine when deployed on AWS.

I eventually settled on the following, which is roughly in line with the generic Julia binaries:

ENV JULIA_CPU_TARGET="generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1);x86-64-v4,-rdrnd,base(1);znver4,-rdrnd,base(1)"

I don’t think these are the most appropriate targets to use; they are simply the ones that worked for me.
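As a sanity check, you can compare the microarchitecture Julia detects on your build machine with what it detects inside the deployed container; if the deployed CPU is not covered by one of your targets, Julia falls back to the generic code path. A quick diagnostic, run both locally and in the container:

# Print the CPU microarchitecture and ISA Julia detects on this host.
println(Sys.CPU_NAME)  # e.g. "haswell", "skylake-avx512", "znver3"
println(Sys.ARCH)      # e.g. :x86_64 or :aarch64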

Lesson 4

Sysimages don’t package everything up.

If your dependencies download and install Artifacts as part of their setup, then you will likely run into issues with Julia crashing when running with a sysimage.

For me, OpenSSL_jll and MKL_jll caused the most headaches, leading to errors similar to the one below:

InitError(mod=:IntelOpenMP_jll, error=ErrorException("Artifact \"IntelOpenMP\" was not installed correctly. ..."))

In my case, MKL_jll was being installed by a redundant package, so I could simply remove that dependency. But OpenSSL_jll is a requirement of AWS.jl, so it could not be fixed as easily.

It turns out that the environment using the sysimage needs a near-identical project to the one that created the sysimage; otherwise it will not work.
I found this out through the discussion on this PackageCompiler issue.

In my particular case, our resident Cloud Infra expert (shout out to Peydar) suggested splitting the Dockerfile into two: one for creating the sysimage, another for generating the worker image. The sysimage file could then be stored on AWS EFS and shared across worker containers.

Below are example snippets of the Dockerfile commands used to get things working.

# Copy target project to image
# The script that creates the sysimage is inside `src`
COPY Project.toml Manifest*.toml ./
COPY src/ src/

# Set up the project environment (the sysimage itself is built further below)
# This command goes in the Dockerfile that creates the sysimage
RUN julia --project=@app -e \
    'using Pkg; \
    Pkg.add("PackageCompiler"); \
    Pkg.develop(PackageSpec(path=pwd())); \
    Pkg.instantiate(); \
    Pkg.precompile();'

# This command goes in the Dockerfile that creates the deployment image.
# Note that it is identical to the above, except that PackageCompiler is not added.
RUN julia --project=@app -e \
    'using Pkg; \
    Pkg.develop(PackageSpec(path=pwd())); \
    Pkg.instantiate(); \
    Pkg.precompile();'

# These lines go only in the Dockerfile responsible for generating the sysimage.
# `sysimage.jl` is the script used to create the sysimage.
RUN julia --project=@app -e 'include("src/sysimage.jl")'
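The contents of `sysimage.jl` are not shown above. For reference, a minimal sketch using PackageCompiler might look like the following; the package name, output path, and precompile script are placeholders, and the EFS path assumes the shared-volume setup described earlier.

using PackageCompiler

# Minimal sketch of a sysimage build script; names and paths are placeholders.
create_sysimage(
    [:MyPackage];                                     # packages baked into the sysimage
    sysimage_path = "/mnt/efs/sysimage.so",           # e.g. a path on the shared EFS volume
    precompile_execution_file = "src/precompile.jl",  # script exercising typical workloads
    cpu_target = ENV["JULIA_CPU_TARGET"],             # reuse the targets from Lesson 3
)

Worker containers then start Julia with `julia --sysimage /mnt/efs/sysimage.so`, pointing at the shared file.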

Final Thoughts (and a question)

As of 2025, Julia in the cloud has a long way to go. There must be several tricks, not openly documented, that allow the likes of JuliaHub to run efficiently.

On the size of the Docker images and sysimage: I knew Julia binaries can get quite big, so I was expecting this. What I didn’t expect is how much RAM the compilation process needs. In my case, upwards of 28GB can be used while compiling the sysimage, and while I can compile it in a somewhat reasonable timeframe on my laptop (~9-12 mins), it can take several hours on GitHub Actions. I can definitely appreciate the view of certain cloud engineers who recommend against using Julia for these reasons.

The deployment story could definitely be improved: the above effectively means we’re relegated to manual, local processes for deploying updates, which is not ideal.

I am keen to hear how others have resolved this.

Hopefully Julia v1.12 / v1.13 will go some way towards resolving all of the above.

Again, I’m immensely grateful to those who have written up their experiences and contributed to discussions on sysimages and deploying Julia in the cloud (both those credited and uncredited in this write-up).
