Creating a Docker Base Image for Faster Deployments

In my experience, deploying Julia to the cloud can be frustrating because building the container used to run your app can be slow. I don’t seem to be the only one. A major gripe is that AWS CodeBuild can’t do layer caching in a VPC, so strategies for rearranging your Dockerfile to benefit from the cache don’t help.

I finally got a solution that works for me: I build a base image with a sysimage of my dependencies and then copy in my current src directory on deployment. Posting it here in case it is helpful for others. It comes in four parts:

1. A Julia script for building a system image within the Docker container
using Pkg
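# Activate the project in this directory and install its dependencies
Pkg.activate(@__DIR__)
Pkg.instantiate()
# Add PackageCompiler to the active project (this also modifies Project.toml)
Pkg.add("PackageCompiler")
Pkg.build("PackageCompiler")
using PackageCompiler

# Build the sysimage from the active project; with no package list given,
# PackageCompiler includes every package in the project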
PackageCompiler.create_sysimage(;
    sysimage_path="/usr/local/julia/bin/julia_base.so",
    cpu_target="generic",
    sysimage_build_args=`-O3`
)
2. A Dockerfile (Dockerfile.base) to build the base image
FROM public.ecr.aws/amazonlinux/amazonlinux:latest

ENV JULIA_CPU_TARGET=generic

# Download and install Julia
WORKDIR /usr/local
RUN yum -y groupinstall "Development Tools"
RUN yum install -y tar gzip 
RUN curl -LO https://julialang-s3.julialang.org/bin/linux/x64/1.10/julia-1.10.5-linux-x86_64.tar.gz 
RUN tar xf julia-1.10.5-linux-x86_64.tar.gz 
RUN rm julia-1.10.5-linux-x86_64.tar.gz 
RUN ln -s julia-1.10.5 julia

# Use a special depot path to store precompiled binaries
ENV JULIA_DEPOT_PATH=./.julia

COPY Project.toml .
COPY Manifest.toml .
COPY create_sys_image.jl .
# Copy src only because PackageCompiler expects a
# well-formed package for the active project.
ADD src  src/.

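# Julia options must come before the script name; anything after it is passed to the script as ARGS
RUN /usr/local/julia/bin/julia -t auto -O3 --startup-file=no --heap-size-hint=6G create_sys_image.jl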
3. A bash script to orchestrate building the base image when you change your dependencies

Note that for this script to work, you need to have Docker Engine running and 8GB of RAM allocated to it. You also need to be logged into AWS.

I don’t use other image hosting services (e.g. Docker Hub), but I’m sure this script could be amended pretty easily to work with whatever solution you use.

docker build -f Dockerfile.base --tag julia_base:latest . --shm-size 8gb

aws ecr get-login-password --region <YOUR REGION> --profile <YOUR AWS PROFILE NAME> | \
    docker login --username AWS --password-stdin <YOUR ECR URL>

docker tag julia_base:latest <YOUR ECR URL>/<YOUR ECR REPO>:julia_base
docker push <YOUR ECR URL>/<YOUR ECR REPO>:julia_base
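
The same image names appear several times in this script, and a mismatch between any two of them makes the tag or push step fail. A minimal sketch of one way to avoid that is to define each name once and derive the remote reference from it. The account ID, region, and repo name below are hypothetical placeholders, and the registry host assumes the usual private ECR form of account.dkr.ecr.region.amazonaws.com. The sketch only prints the docker commands rather than running them.

```shell
#!/bin/sh
set -eu

# Hypothetical placeholder values; override them from the environment.
AWS_ACCOUNT_ID="${AWS_ACCOUNT_ID:-123456789012}"
AWS_REGION="${AWS_REGION:-us-east-1}"
ECR_REPO="${ECR_REPO:-my-repo}"

# Defined once so the build, tag, and push steps cannot drift apart.
LOCAL_IMAGE="julia_base:latest"
REMOTE_TAG="julia_base"

# Registry host for a private ECR registry: <account>.dkr.ecr.<region>.amazonaws.com
ecr_registry() {
    printf '%s.dkr.ecr.%s.amazonaws.com' "$1" "$2"
}

# Full remote reference: <registry>/<repo>:<tag>
remote_image() {
    printf '%s/%s:%s' "$(ecr_registry "$1" "$2")" "$3" "$4"
}

REMOTE_IMAGE="$(remote_image "$AWS_ACCOUNT_ID" "$AWS_REGION" "$ECR_REPO" "$REMOTE_TAG")"

# Print the commands instead of running them, so the sketch is safe to try.
echo "docker build -f Dockerfile.base --shm-size 8gb --tag $LOCAL_IMAGE ."
echo "docker tag $LOCAL_IMAGE $REMOTE_IMAGE"
echo "docker push $REMOTE_IMAGE"
```

Once the printed commands look right, the echo lines can be replaced by the commands themselves, with the ECR login step from the script above placed before the push.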
4. The Dockerfile to use in deployment

I have the heap size hint set to 12GB since my Fargate task is allocated 16GB of RAM. I have noticed that Julia sometimes treats the host machine's total RAM as available, rather than what is allocated to the container. Set it accordingly for your use case.
FROM <YOUR ECR URL>/<YOUR ECR REPO>:julia_base
ADD src  src/.
COPY entry_point_script.jl .
RUN /usr/local/julia/bin/julia \
    --sysimage /usr/local/julia/bin/julia_base.so \
    --sysimage-native-code=yes \
    -t auto \
    --project=. \
    -O3 \
    --startup-file=no \
    -e 'using Pkg;Pkg.instantiate()'
ENTRYPOINT /usr/local/julia/bin/julia \
    --sysimage /usr/local/julia/bin/julia_base.so \
    --sysimage-native-code=yes \
    -t auto \
    --project=. \
    -O3 \
    --startup-file=no \
    --heap-size-hint=12G \
    entry_point_script.jl
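
Because the right heap size hint depends on the memory given to the container, one option is to derive it at start-up instead of hard-coding it. The sketch below assumes the container's limit is visible in the standard cgroup files (memory.max for cgroup v2, memory.limit_in_bytes for cgroup v1); whether that holds on Fargate is an assumption to check. The function names are hypothetical, and the 75% ratio simply mirrors the 12GB-of-16GB choice above.

```shell
#!/bin/sh

# Print the container memory limit in bytes, checking cgroup v2 first, then v1.
# Prints nothing when no numeric limit is found (cgroup v2 reports "max" for unlimited).
# Note: cgroup v1 reports a very large number when unlimited, which this sketch does not filter.
container_mem_bytes() {
    for f in /sys/fs/cgroup/memory.max /sys/fs/cgroup/memory/memory.limit_in_bytes; do
        if [ -r "$f" ]; then
            v=$(cat "$f")
            case "$v" in
                ''|*[!0-9]*) ;;            # empty or not a plain number, e.g. "max"
                *) echo "$v"; return 0 ;;
            esac
        fi
    done
    return 0
}

# Turn a byte count and a percentage into a value in MiB with an M suffix.
heap_hint() {
    bytes=$1
    percent=$2
    echo "$(( bytes / 1024 / 1024 * percent / 100 ))M"
}

limit=$(container_mem_bytes)
if [ -n "$limit" ]; then
    echo "--heap-size-hint=$(heap_hint "$limit" 75)"
else
    echo "no container memory limit found; keep a fixed hint" >&2
fi
```

The printed option could then be passed to julia in place of the fixed 12G, for example from a small wrapper script used as the entry point.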

Note that all these files assume they are at the root of your project directory.

Thanks to @oxinabox and others for your help along the way!

UPDATE: I discovered that even if you copy in a new src/ directory, loading your project package with using MyPackage picks up the version that PackageCompiler.jl baked into the sysimage. So it’s best to have your entry_point_script.jl call include("src/MyPackage.jl"), followed by using .MyPackage if you rely on its exported names, so that you run the fresh version of your package.

My Debian-based (x86_64) Dockerfile fragment:

IMHO, the key difference might be that I tried to install base Julia and the Julia packages in one step, to keep the final Docker image as small as possible.

FROM debian:bookworm-backports

.....

# Since Julia 1.9.0, the CPU target is set to "native" by default.
# This setting avoids the need to compile the Julia packages for the specific CPU architecture of the host machine.
# Make sure the image can be used on any x86_64 machine by setting JULIA_CPU_TARGET
# to the same value used by the generic julia binaries, see
# https://github.com/JuliaCI/julia-buildkite/blob/4b6932992f7985af71fc3f73af77abf4d25bd146/utilities/build_envs.sh#L23-L31
ENV JULIA_CPU_TARGET="generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1);x86-64-v4,-rdrnd,base(1);znver4,-rdrnd,base(1)"

ENV JULIA_MAJOR=1.11
ENV JULIA_VERSION=1.11.0
ENV JULIA_SHA256=bcf815553fda2ed7910524c8caa189c8e8191a40a799dd8b5fbed0d9dd6b882c
ENV JULIA_DIR=/usr/local/julia
ENV JULIA_PATH=${JULIA_DIR}
ENV JULIA_DEPOT_PATH=${JULIA_PATH}/local/share/julia

RUN set -eux \
    && mkdir ${JULIA_DIR} \
    && cd /tmp  \
    && wget -q https://julialang-s3.julialang.org/bin/linux/x64/${JULIA_MAJOR}/julia-${JULIA_VERSION}-linux-x86_64.tar.gz \
    && echo "$JULIA_SHA256 julia-${JULIA_VERSION}-linux-x86_64.tar.gz" | sha256sum -c - \
    && tar xzf julia-${JULIA_VERSION}-linux-x86_64.tar.gz -C ${JULIA_DIR} --strip-components=1 \
    && rm /tmp/julia-${JULIA_VERSION}-linux-x86_64.tar.gz \
    && ln -fs ${JULIA_DIR}/bin/julia /usr/local/bin/julia \
    \
    && julia -e 'using Pkg; Pkg.add(["PackageCompiler","Arrow","ClickHouse","CpuId","CSV","DataFrames","DuckDB","JSON3","LibPQ","Parquet2","PyCall","SQLite","XLSX"]);Pkg.precompile()' \
    && julia -e 'using CpuId, Arrow, ClickHouse, CSV, DataFrames, DuckDB, JSON3, LibPQ, Parquet2, PyCall, SQLite, XLSX;' \
    && julia -e 'using InteractiveUtils; versioninfo()'

....