Issues with GAP.jl in a Julia 1.12.2 Docker image

Hello, I’m relatively new to Docker and Julia, so I would appreciate any advice.

I am trying to create a Docker container for Julia 1.12.2 with CUDA.jl, Oscar.jl (and hence GAP.jl), and some other libraries. Specifically, I want to run this Julia package in a Docker container on a cluster.

I am experiencing issues trying to get GAP.jl working. Here is my current Dockerfile:

ARG IMAGE=nvidia/cuda:12.1.1-devel-ubuntu20.04
FROM $IMAGE

RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive \
    apt-get install --yes --no-install-recommends \
                    # basic stuff
                    curl ca-certificates vim git gap gap-extra gap-doc && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

ARG JULIA_RELEASE=1.12
ARG JULIA_VERSION=1.12.2
RUN curl -s -L https://julialang-s3.julialang.org/bin/linux/x64/${JULIA_RELEASE}/julia-${JULIA_VERSION}-linux-x86_64.tar.gz | \
    tar -C /usr/local -x -z --strip-components=1 -f -

RUN mkdir -p /usr/local/share/julia/environments/v${JULIA_RELEASE}
COPY Project.toml /usr/local/share/julia/environments/v${JULIA_RELEASE}/Project.toml
COPY Manifest.toml /usr/local/share/julia/environments/v${JULIA_RELEASE}/Manifest.toml
COPY src /usr/local/share/julia/environments/v${JULIA_RELEASE}/src
COPY --from=gf . /usr/local/share/julia/dev/GPUFiniteFieldMatrices

RUN JULIA_DEPOT_PATH=/usr/local/share/julia julia -e 'using Pkg; \
    Pkg.develop(path="/usr/local/share/julia/dev/GPUFiniteFieldMatrices"); \
    Pkg.instantiate(); \
    Pkg.precompile();'

ENV JULIA_DEPOT_PATH=/usr/local/share/julia
COPY startup.jl /usr/local/share/julia/config/

RUN mkdir -m 0777 /data
ENV JULIA_HISTORY=/data/logs/repl_history.jl

That is, it is based on an image with CUDA available. On top of that I install Julia, copy in the packages that are not in the registry, precompile everything, and create a world-writable /data directory. The startup.jl file then copies the default environment to /data, so that users without admin privileges (e.g. on the Kubernetes cluster) can still read and write the necessary Julia files.

Here is the startup.jl file:

if !isdir("/data/environments/v$(VERSION.major).$(VERSION.minor)")
    mkpath("/data/environments")
    cp("/usr/local/share/julia/environments/v$(VERSION.major).$(VERSION.minor)",
       "/data/environments/v$(VERSION.major).$(VERSION.minor)")
end
pushfirst!(DEPOT_PATH, "/data")
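For reference, the same startup logic can be wrapped in a function, which makes it easy to test outside the container and safe to call more than once. This is a minimal sketch: the name seed_environment! and its two path parameters are hypothetical, and the defaults simply mirror the paths used in the startup.jl above.

```julia
# Hypothetical helper mirroring startup.jl: copy the default environment baked
# into the image into a writable depot, then make that depot the first entry
# of DEPOT_PATH. Calling it again neither copies nor pushes a second time.
function seed_environment!(shared_depot::AbstractString = "/usr/local/share/julia",
                           user_depot::AbstractString = "/data")
    envname = "v$(VERSION.major).$(VERSION.minor)"
    src = joinpath(shared_depot, "environments", envname)
    dst = joinpath(user_depot, "environments", envname)
    if !isdir(dst)
        mkpath(dirname(dst))   # make sure <user_depot>/environments exists
        cp(src, dst)           # cp copies directories recursively
    end
    if user_depot ∉ DEPOT_PATH
        pushfirst!(DEPOT_PATH, String(user_depot))
    end
    return dst
end
```

In startup.jl this would reduce to defining the function and calling seed_environment!().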

Starting Julia with this setup is where I run into problems. When I launch the REPL with julia and enter using Oscar, I get:

Error, GAP is not bound in Julia at /usr/local/share/julia/packages/GAP/UXSq3/pkg/JuliaInterface/gap/JuliaInterface.gi:48 called from
Scratch := julia.GAP.eval( ValueGlobal( "JuliaEvalString" )( ":(import Scratch; Scratch)" ) ); at /usr/local/share/julia/artifacts/eac0e07d9389a9a61d6e38f612b9394d26e87cd5/atlasrep-2.1.9/gap/userpref.g:118 called from
record.default(  ) at /usr/local/share/julia/artifacts/8db0b2a1c30ae8f18d09d8f93b7cb5b1b359722d/share/gap/lib/userpref.g:285 called from
<function "DeclareUserPreference">( <arguments> )
 called from read-eval loop at /usr/local/share/julia/artifacts/eac0e07d9389a9a61d6e38f612b9394d26e87cd5/atlasrep-2.1.9/gap/userpref.g:166
ERROR: InitError: GAP variable _JULIAINTERFACE_ERROR_BUFFER not bound
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:44
  [2] getproperty(::GAP.GlobalsType, name::Symbol)
    @ GAP /usr/local/share/julia/packages/GAP/UXSq3/src/globals.jl:53
  [3] copy_gap_error_to_julia()
    @ GAP /usr/local/share/julia/packages/GAP/UXSq3/src/GAP.jl:65
  [4] initialize(argv::Vector{String})
    @ GAP /usr/local/share/julia/packages/GAP/UXSq3/src/GAP.jl:147
  [5] __init__()
    @ GAP /usr/local/share/julia/packages/GAP/UXSq3/src/GAP.jl:275
  [6] run_module_init(mod::Module, i::Int64)
    @ Base ./loading.jl:1440
  [7] register_restored_modules(sv::Core.SimpleVector, pkg::Base.PkgId, path::String)
    @ Base ./loading.jl:1428
  [8] _include_from_serialized(pkg::Base.PkgId, path::String, ocachepath::String, depmods::Vector{Any}; register::Bool)
    @ Base ./loading.jl:1316
  [9] _include_from_serialized
    @ ./loading.jl:1271 [inlined]
 [10] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt128, stalecheck::Bool; reasons::Dict{String, Int64}, DEPOT_PATH::Vector{String})
    @ Base ./loading.jl:2099
 [11] _require_search_from_serialized
    @ ./loading.jl:2006 [inlined]
 [12] __require_prelocked(pkg::Base.PkgId, env::String)
    @ Base ./loading.jl:2624
 [13] _require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base ./loading.jl:2490
 [14] macro expansion
    @ ./loading.jl:2418 [inlined]
 [15] macro expansion
    @ ./lock.jl:376 [inlined]
 [16] __require(into::Module, mod::Symbol)
    @ Base ./loading.jl:2383
 [17] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:2359
 [18] top-level scope
    @ REPL[1]:1
during initialization of module GAP

julia> 

GAP is the only package that has this issue; CUDA.jl, for example, works perfectly fine.
I’m at a loss as to a) how to debug this further and b) what the best practices are here.

I’ve tried manually moving all the files from the shared depot to /data and precompiling GAP.jl on its own without the other packages, but I have not made any meaningful progress: GAP.jl alone produces the above error, while no other package I tested does.
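One low-effort first debugging step is to check, inside the running container and as the same non-root user, which depots Julia is using and whether each one is writable. The stack trace above shows GAP's AtlasRep package running import Scratch during initialization, and Scratch.jl keeps its data in the first depot on DEPOT_PATH by default. The helper below is a hypothetical sketch using only Base functions; it does not diagnose the GAP error itself, and run as root it will report every existing depot as writable.

```julia
# Hypothetical diagnostic: for every depot, report whether it exists and
# whether the current user can create a file in it.
function depot_report(depots::AbstractVector{<:AbstractString} = DEPOT_PATH)
    report = NamedTuple{(:depot, :exists, :writable), Tuple{String, Bool, Bool}}[]
    for d in depots
        exists = isdir(d)
        writable = false
        if exists
            try
                # mktemp with a do-block creates a file in `d` and removes it afterwards
                mktemp(d) do path, io
                end
                writable = true
            catch
                writable = false
            end
        end
        push!(report, (depot = String(d), exists = exists, writable = writable))
    end
    return report
end

for r in depot_report()
    println(rpad(r.depot, 40), " exists=", r.exists, " writable=", r.writable)
end
```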

I would really appreciate any advice on this topic!

EDIT: I may have answered too quickly; I now understand that loading GAP on its own already causes the problem. The Dockerfile below loads GAP successfully, but your error may depend on further details of your Docker build.
I have now added parts of your apt-get command, but I couldn’t find a gap-extra package. Is that the right name?

Here’s a Dockerfile that works. It’s a slightly adapted version of the Dockerfile I use for hosting Genie apps, which is why the user is called ‘genie’.
A couple of lines are mainly useful for Genie applications or for apps that use PythonCall; I left them in because people reading this might be interested.
There are several caching layers to improve build times.

ARG APP="GAPDocker"
ARG IMAGE=nvidia/cuda:12.1.1-devel-ubuntu20.04

#################################################################
# 1. Builder Stage – Installation and Compilation of Dependencies
#################################################################
FROM julia:1.12 AS builder

ARG APP
ENV APP=${APP}

RUN useradd --create-home --shell /bin/bash genie

# setting up the app's directory
RUN mkdir /home/genie/${APP} && chown genie:genie /home/genie/${APP}
WORKDIR /home/genie/${APP}

USER genie
ENV JULIA_DEPOT_PATH="/home/genie/.julia"

ENV JULIA_PKG_PRECOMPILE_AUTO="0"

# cache installation of Pkg
RUN julia --threads=auto -e "using Pkg"

# copy minimal setup and write a mocking module for better caching
# I tend to not copy the Manifest.toml because I like to test various julia versions
# If you always work with the same julia version on your dev computer and in docker,
# you can also copy the Manifest.toml here for better reproducibility
COPY --chown=genie:genie Project.toml ./
RUN mkdir src && echo "module ${APP} end" > src/${APP}.jl

# instantiate and precompile dependencies
RUN julia --threads=auto --project -e 'using Pkg; Pkg.instantiate(); Pkg.precompile()'

# if you are using PythonCall, instantiate CondaPkg by `using PythonCall`
# RUN julia --threads=auto --project -e 'using PythonCall'

# now copy the full source → changes here do not invalidate the cached instantiate
COPY src src
# COPY data data
# COPY public public
# COPY app.jl app.jl

#################################################################
# 2. Runtime Stage – Final Image for Running the App
#################################################################
FROM ${IMAGE} AS runtime

ARG APP
ENV APP=${APP}

# create dedicated user
RUN useradd --create-home --shell /bin/bash genie

# copy the julia installation
COPY --from=builder /usr/local/julia /usr/local/julia
RUN ln -s /usr/local/julia/bin/julia /usr/local/bin/julia


# copy the app directory
COPY --from=builder --chown=genie:genie /home/genie /home/genie
    
#  ----------------------  app settings -------------------------------

# use "prod" for final deployment and "dev", "qa", "test" during testing/development
ENV GENIE_ENV="dev"
ENV JULIA_REVISE="on"

# set up app environment
ENV PORT="8000"
EXPOSE ${PORT}

# ----------------------- custom dependencies ---------------------------

# install other dependencies here, e.g. for image processing or databases
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive \
    apt-get install --yes --no-install-recommends \
                    # basic stuff
                    curl ca-certificates vim git gap gap-doc && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# ----------------------------  setup  ----------------------------------

WORKDIR /home/genie/${APP}
USER genie

ENV JULIA_DEPOT_PATH="/home/genie/.julia"
ENV GENIE_HOST="0.0.0.0"

# if you use PythonCall or other Python packages
ENV PYTHONUNBUFFERED="1"

# instantiate Julia packages with only config files copied for better caching

# set CondaPkg mode to offline to avoid rebuilds of Conda
ENV JULIA_CONDAPKG_OFFLINE="true"

# precompile without serving to reduce memory usage during build
RUN julia --threads=auto --project -e 'using Pkg; Pkg.precompile()'

# set an environment variable to indicate that we are running inside Docker
ENV DOCKER="true"

# run app
ENTRYPOINT ["julia"]
CMD ["--project", "--threads=auto", "-e", "using GAPDocker; wait()"]

This Dockerfile supports a package-like structure, so the app can be started with using GAPDocker, provided that Project.toml has a name and a uuid, e.g.

name = "GAPDocker"
uuid = "83b44333-0fdd-49ce-8951-e1a22757abc1"
authors = ["hhaensel"]
version = "0.1.0"

[deps]
GAP = "c863536a-3901-11e9-33e7-d5cd0df7b904"
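Because using GAPDocker only works when the active project is itself a package (a name, a valid uuid, and a matching src/GAPDocker.jl), a quick check before building can save a failed image. This is a minimal sketch using the TOML standard library; the helper name is_package_project is hypothetical.

```julia
using TOML

# Hypothetical check: does `project_dir` look like a package that `using <name>`
# can load from the active project (name, valid uuid, src/<name>.jl)?
function is_package_project(project_dir::AbstractString)
    toml_path = joinpath(project_dir, "Project.toml")
    isfile(toml_path) || return false
    project = TOML.parsefile(toml_path)
    name = get(project, "name", nothing)
    uuid = get(project, "uuid", nothing)
    (name isa String && uuid isa String) || return false
    # Base.UUID throws an ArgumentError for malformed UUID strings
    try
        Base.UUID(uuid)
    catch
        return false
    end
    return isfile(joinpath(project_dir, "src", name * ".jl"))
end
```

Running is_package_project(".") in the app directory should return true once both Project.toml and src/GAPDocker.jl are in place.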

file ‘src/GAPDocker.jl’

module GAPDocker

using GAP

function __init__()
    @info "Initializing GAPDocker.jl"
end

end

Build command:

docker build -t gapdocker .

Run command:

docker run -t gapdocker

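For debugging, start an interactive REPL inside the container (the --project argument replaces the default CMD, so julia opens a REPL):

docker run -it gapdocker --project

and then enter using GAPDocker at the REPL.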
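Before entering using GAPDocker in that session, it can also help to print which project, load path and depots the session is actually using, and where a package name resolves to. This hypothetical helper is only a sketch: it relies on Base.active_project, Base.load_path, Base.identify_package and Base.find_package, which exist in current Julia releases but are not all part of the documented public API.

```julia
# Hypothetical helper: show how the current session would resolve `name`.
function resolution_report(name::AbstractString)
    pkg = String(name)
    return (
        active_project = Base.active_project(),   # Project.toml in use (or nothing)
        load_path = Base.load_path(),             # expanded entries of LOAD_PATH
        depot_path = copy(DEPOT_PATH),
        pkgid = Base.identify_package(pkg),       # PkgId, or nothing if not found
        entry_file = Base.find_package(pkg),      # path to the package's entry file, or nothing
    )
end

# Example: print one line per field for GAP
for (k, v) in pairs(resolution_report("GAP"))
    println(k, " => ", v)
end
```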

P.S.: You have a COPY --from=gf line. Is gf another build stage, and if so, what does it contain?