I asked a question previously about managing incremental builds of Julia Docker containers but didn’t hear much back. Since then, I’ve been sketching out some ideas that I’d appreciate feedback on.
The Problem
Deploying Julia to the cloud (AWS in my case) poses some challenges.
Base images with facilities to interact with the cloud provider don’t have Julia installed. Julia images don’t have the facilities installed. Creating a cached provider image + Julia image requires nontrivial infrastructure work.
In a similar vein, you don’t really want to precompile your project’s whole dependency chain on every deploy. It is very time consuming and usually redundant. But caching that result is, again, nontrivial.
Right now, redeploying my project takes a very long time because it downloads Julia (fortunately, I think this download gets cached), downloads my project dependencies, and precompiles the whole thing. My team uses Serverless Framework, but it isn’t easy to declare via CloudFormation “Please rebuild a base image if my dependencies are updated, and then use that to build my new project image.”
The (Potential) Solution
I am considering a package that has the following facilities:
1. Build a <base image of your choice> + <your current julia version> image.
2. Build a PackageCompiler.jl compiled image of your current project’s dependencies built off of #1 iff your Manifest has changed.
3. Build a current project state image which copies your project code into #2, runs any precompile commands you have, and adds an entry point.
4. Functions which facilitate storing and retrieving these images from the image registry of your choice.
5. Potentially, some CLI facilities so that this could be managed by CodeBuild/Github Actions/etc.
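One way the "iff your Manifest has changed" check could work is to derive the dependency image's tag from a digest of Manifest.toml, so that an unchanged manifest maps to a tag that already exists in the registry. This is only a minimal sketch: `manifest_tag`, the `deps` prefix, and the 16-character truncation are hypothetical choices for illustration, not part of any existing package.

```julia
using SHA  # standard library

# Hypothetical helper: build a Docker-safe tag from the contents of Manifest.toml.
# The same manifest always yields the same tag; any change yields a new one.
function manifest_tag(project_dir::AbstractString; prefix::AbstractString="deps")
    manifest_path = joinpath(project_dir, "Manifest.toml")
    isfile(manifest_path) || error("no Manifest.toml found in $project_dir")
    digest = bytes2hex(sha256(read(manifest_path)))
    # Keep the tag short; 16 hex characters is an arbitrary (assumed) truncation.
    return "$prefix:$(first(digest, 16))"
end
```

The resulting tag could then be passed to something like `image_exists(registry, tag)` to decide whether the PackageCompiler.jl step needs to run at all.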
Questions
Does this already exist somewhere that I’m missing?
Do you foresee any pitfalls that I might be missing? (I would call my DevOps skills intermediate, so I’m sure I have some blind spots.)
I’ve only really worked in AWS, and for my purposes, would be focused on ECR and CodeBuild, but I’d love for this to be extensible to whatever environment people might use. Are there any quirks I should pay attention to in order to make sure the API is cloud agnostic?
FYI, we use CodeCatalyst and CDK (wrapper around CloudFormation). CodeCatalyst invokes CDK which creates a Lambda function via awslambda.NewDockerImageFunction, where the Dockerfile does all the building. I see that there is a CodeCatalyst action to invoke CodeBuild, but I’m not sure that we’d want to add that service. It’s more likely that we’d migrate toward Github Actions in the future. So to take advantage of your package I expect I’d probably still invoke it from the Dockerfile, or would that negate the benefits? Maybe I’d need a custom CodeCatalyst action.
As I’m currently imagining it, the pseudocode would look something like this:
function create_docker_file(docker_file_path::String, instructions::Vector{String})
    # one Dockerfile instruction per line
    docker_file_body = join(instructions, "\n")
    open(docker_file_path, "w") do f
        write(f, docker_file_body)
    end
    return docker_file_path
end
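function create_image(docker_file_path::String, image_id::String)
    # `docker build` takes the build context as its positional argument;
    # the Dockerfile itself is passed with --file
    build_context = dirname(docker_file_path)
    run(`docker build --file $docker_file_path --tag $image_id $build_context`)
    return image_id
end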
# this basic logic would apply to the other steps as well
function build_julia_base(cloud_base_image, docker_file_dir, registry; julia_version=VERSION)
    image_id = "base_image:$(hash(cloud_base_image))_$julia_version"
    # skip the build entirely if the registry already has this image
    if image_exists(registry, image_id)
        return image_id
    end
    instructions = [
        "FROM $cloud_base_image",
        # shell commands for downloading and installing julia
    ]
    docker_file_name = "Dockerfile.$(replace(image_id, ":" => "_"))"
    docker_file_path = joinpath(docker_file_dir, docker_file_name)
    create_docker_file(docker_file_path, instructions)
    create_image(docker_file_path, image_id)
    put_image(registry, image_id)
    return image_id
end
# we could move registry to the type domain so that users could
# dispatch these functions for their environment
function put_image(registry, image_id) end
function image_exists(registry, image_id) end
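Moving the registry into the type domain might look roughly like the sketch below. All names here (`AbstractRegistry`, `InMemoryRegistry`, `ensure_image`) are hypothetical and do not come from an existing package; the in-memory registry stands in for a real backend such as ECR, which would add its own methods for `image_exists` and `put_image`.

```julia
# Hypothetical interface: each backend subtypes AbstractRegistry and adds methods.
abstract type AbstractRegistry end

# Fallbacks that fail loudly when a backend has not implemented the interface.
image_exists(r::AbstractRegistry, image_id) =
    error("image_exists is not implemented for $(typeof(r))")
put_image(r::AbstractRegistry, image_id) =
    error("put_image is not implemented for $(typeof(r))")

# Toy backend that only remembers tags; handy for testing the build logic locally.
struct InMemoryRegistry <: AbstractRegistry
    images::Set{String}
end
InMemoryRegistry() = InMemoryRegistry(Set{String}())

image_exists(r::InMemoryRegistry, image_id::AbstractString) = image_id in r.images

function put_image(r::InMemoryRegistry, image_id::AbstractString)
    push!(r.images, image_id)
    return image_id
end

# Build-if-missing logic shared by every step: run `build` only when the
# registry does not already hold `image_id`, then store the result.
function ensure_image(build::Function, registry::AbstractRegistry, image_id::AbstractString)
    image_exists(registry, image_id) && return image_id
    build(image_id)
    return put_image(registry, image_id)
end
```

With this shape, each build step reduces to a call like `ensure_image(registry, image_id) do id ... end`, and supporting another cloud only requires a new subtype with its two methods.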
With the caveat that I’ve only played around with it locally and not used it for anything in production, it seems like it may address some of the same functionality.
A lot of tooling for it got pushed into AWS.jl and other JuliaCloud packages.
It seems that when the company shut down, we did dump a lot (but not all) of the related configuration into open source repos: