Proposed Julia Docker workflow to use a "persistent" depot

When developing a Julia library or application I typically find myself incrementally adding or modifying my package dependencies. If the particular Julia application also needs to be included in a Docker image, this iterative workflow can be quite annoying, as any change to the dependencies results in all of them having to be installed and precompiled from scratch. It seemed like a better approach was possible, so after some research, and after discovering the RUN --mount=type=cache feature in Docker, I came up with the following Dockerfile:

# syntax=docker/dockerfile:1
ARG JULIA_VERSION=1.8.5
FROM julia:${JULIA_VERSION}

# Switch the Julia depot to use the shared cache storage. As `.ji` files reference
# absolute paths to their included source files care needs to be taken to ensure the depot
# path used during package precompilation matches the final depot path used in the image.
# If a source file no longer resides at the expected location the `.ji` is deemed stale and
# will be recreated.
RUN ln -s /tmp/julia-cache ~/.julia

# Install Julia package registries.
RUN --mount=type=cache,sharing=locked,target=/tmp/julia-cache \
    julia -e 'using Pkg; Pkg.Registry.add("General")'

# Disable automatic package precompilation. We'll control when packages are precompiled.
ENV JULIA_PKG_PRECOMPILE_AUTO "0"

# Instantiate the Julia project environment and precompile dependencies.
ENV JULIA_PROJECT /project
COPY Project.toml Manifest.toml ${JULIA_PROJECT}/
RUN --mount=type=cache,sharing=locked,target=/tmp/julia-cache \
    julia -e 'using Pkg; Pkg.instantiate(); Pkg.precompile(strict=true)'

# Copy the shared ephemeral Julia depot into the image and remove any installed packages
# not used by our Manifest.toml.
RUN --mount=type=cache,target=/tmp/julia-cache \
    rm ~/.julia && \
    mkdir ~/.julia && \
    cp -rp /tmp/julia-cache/* ~/.julia && \
    julia -e 'using Pkg, Dates; Pkg.gc(collect_delay=Day(0))'

Using this approach, the Julia registries, packages, and precompilation files are stored in a Docker cache which persists between image builds. The end result is that iterative package development produces much faster image builds, as only the missing packages need to be added and precompiled, just like in local development.

Additionally, this cache is shared between all Docker image builds, so it can also help accelerate workflows where multiple Dockerfiles and Julia Docker images need to be built. That said, through some experimentation I found that concurrent Docker image builds can result in file access collisions, so I decided to use sharing=locked to avoid running into these problems even though they seem rare in practice. The downside of sharing=locked is that concurrent builds will be slower than with sharing=shared, but they should still be faster than building all dependencies from scratch.
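For reference, the difference between the two modes is just the sharing option on the cache mount. A minimal sketch (using the same /tmp/julia-cache target as the Dockerfile above):

```dockerfile
# sharing=locked serializes access: concurrent builds wait for the lock,
# avoiding file access collisions at the cost of build parallelism.
RUN --mount=type=cache,sharing=locked,target=/tmp/julia-cache \
    julia -e 'using Pkg; Pkg.instantiate()'

# sharing=shared (the default) lets concurrent builds use the cache
# simultaneously, which is faster but risks the collisions noted above.
RUN --mount=type=cache,sharing=shared,target=/tmp/julia-cache \
    julia -e 'using Pkg; Pkg.instantiate()'
```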

Let me know if this approach to building Docker applications is useful for your workflow. Maybe I’ll try to add this as documentation in docker-library/julia if this is useful.


Hi, thanks for the Dockerfile. It seems to work well, and I struggled for a while with this. I have a case where I instantiate the project but then add a line to start Julia with a sysimage generated with PackageCompiler. The sysimage is around 800 MB, but the Docker image turns out to be 4 GB; that seems too big in my opinion. The sysimage needs the artifacts in .julia/, but still, I have the impression that I can reduce the size. Did you observe similar blow-ups, or do you have another Dockerfile setup for deploying a Julia image with pre-compiled sysimages?

I have a similar use case. I’d like the compilation cache to be preserved across different docker run calls, so I am persisting the ~/.julia folder in the container with the following argument to docker run:

--mount type=bind,source=~/.julia_docker,target=/home/user_in_container/.julia

This has the added benefit of also preserving the Julia REPL history across docker runs.

However, it breaks a lot of the hermetic and reproducibility benefits of Docker, because it also persists other state, such as package installs.

What is the recommended way to preserve the REPL history and compilation cache without any other state? Could I persist only DEPOT/logs and DEPOT/compiled?


My experience with PackageCompiler and making Julia sysimages is similar in that they really increase the image size. I haven’t been using sysimages as much recently but I tended only to use them when making a final image. It may be worth doing some performance testing to ensure that your image is seeing an actual benefit with the sysimage as package precompilation has gotten much better.

I also looked into this and ultimately didn’t go with this approach as reproducible image builds were important to me.

Could I persist only DEPOT/logs and DEPOT/compiled?

You’d also want to persist DEPOT/artifacts. Note that logs contains a “manifest_usage.toml” file which can result in Pkg.gc not cleaning up packages, which is probably important, as this single Julia depot could be shared across multiple image builds.
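To persist only those pieces, a sketch of the docker run flags (the host paths, image name, and the container depot location /root/.julia are illustrative assumptions):

```shell
# Persist REPL history/log data plus the compilation and artifact caches,
# while leaving package installs and environment manifests ephemeral.
docker run \
  --mount type=bind,source="$HOME/.julia_docker/logs",target=/root/.julia/logs \
  --mount type=bind,source="$HOME/.julia_docker/compiled",target=/root/.julia/compiled \
  --mount type=bind,source="$HOME/.julia_docker/artifacts",target=/root/.julia/artifacts \
  my-julia-image
```

Keep in mind the caveat about logs/manifest_usage.toml affecting Pkg.gc when the same host directory is reused across images.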

I’ve iterated on my original design here to utilize separate Docker caches for each Dockerfile. Doing this allows Julia images to use stacked depots and utilize COPY --from to build off of parent images. I can post an update here if there is interest.

Additionally, I have yet to experiment with Julia’s 1.11 change which addresses precompile file relocatability. That change should simplify the Dockerfile considerably.
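For anyone who wants to experiment, here is a rough, untested sketch of what that simplification might look like, assuming Julia 1.11’s relocatable precompile caches remove the need for the symlink trick in the original Dockerfile:

```dockerfile
# syntax=docker/dockerfile:1
FROM julia:1.11
ENV JULIA_PKG_PRECOMPILE_AUTO=0
ENV JULIA_PROJECT=/project
COPY Project.toml Manifest.toml ${JULIA_PROJECT}/
# Precompile into the cache mount, then copy the depot to its final location.
# With relocatable caches the depot path no longer needs to match between
# precompilation time and runtime.
RUN --mount=type=cache,sharing=locked,target=/tmp/julia-cache \
    JULIA_DEPOT_PATH=/tmp/julia-cache \
    julia -e 'using Pkg; Pkg.Registry.add("General"); Pkg.instantiate(); Pkg.precompile(strict=true)' && \
    cp -rp /tmp/julia-cache /root/.julia
```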


The approach I’m taking now is with these docker run flags

    julia_volumes = (
        # We want to persist some but not all of the julia depot across `docker run`.
        # We do not want to persist new package installations.
        # We do want to persist compilation cache, initializing it with the contents in the docker container
        # to take advantage of the compilation work done at `docker build` time.
        "--mount type=volume,source=foocontainer_julia_cache_artifacts,target=/opt/.julia/artifacts "
        "--mount type=volume,source=foocontainer_julia_cache_compiled,target=/opt/.julia/compiled "
        "--mount type=volume,source=foocontainer_julia_cache_packages,target=/opt/.julia/packages "
        "--mount type=volume,source=foocontainer_julia_cache_registries,target=/opt/.julia/registries "
        # Persist the Julia REPL history
        "--mount type=bind,source=~/dev_docker_persistence/.julia/logs,target=/opt/.julia/logs "
    )

(I also have the depot directory not be in a user directory, because the docker user matches the host user.)

I haven’t run into issues yet, but I realize this is probably not orthodox.

@omus, could you please elaborate on

I also looked into this and ultimately didn’t go with this approach as reproducible image builds were important to me.

Use the depot stack? See the DEPOT_PATH variable in Julia or use the JULIA_DEPOT_PATH environment variable.

https://docs.julialang.org/en/v1/manual/environment-variables/#JULIA_DEPOT_PATH

https://docs.julialang.org/en/v1/base/constants/#Base.DEPOT_PATH

Only the first depot is writable. The others should be read only.
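As a concrete sketch of stacking depots (paths are illustrative): the first entry is the writable user depot, and later entries are read-only shared depots, e.g. one baked into a parent image:

```shell
# First entry: writable per-user depot; second entry: read-only depot
# baked into the image. Julia searches the entries in order.
export JULIA_DEPOT_PATH="$HOME/.julia:/opt/shared_depot"
echo "$JULIA_DEPOT_PATH"
```

Inside Julia, Base.DEPOT_PATH will then list both entries in the same order.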

@omus, could you please elaborate on

I also looked into this and ultimately didn’t go with this approach as reproducible image builds were important to me.

Using a bind mount to re-use the Julia depot between the host and Docker containers can be problematic, as the depot’s environment manifests (e.g. $DEPOT/environments/v1.9/Manifest.toml) can result in unneeded packages being left in the Docker image even after running Pkg.gc(). If your Docker image builds target multiple architectures, Julia will end up removing .ji files for other platforms, which can result in more pre-compilation churn. Finally, there is a time cost to transferring the Docker context, and a large Julia depot shared across multiple build containers can be quite slow.

Utilizing a volume mount like yours can work for sharing a Julia depot across running containers. However, I believe volume mounts aren’t a supported option when building a container. Additionally, the volume approach requires a pre-build step, which I wanted to avoid; that is why I ended up utilizing cache mounts.

My specific use case was focused on baking a Julia depot into a built image while keeping the image size and build time to a minimum.
