Sharing precompiled code across Docker images

I’m struggling with precompilation time for Julia inside a Docker image.
We have some Julia projects in Docker and change them quite often, but the packages that take the longest to precompile (Flux etc.) only rarely change version.
I’m wondering whether there is a way to store the precompilation result so that, when a new Docker image is built, it can check whether the heavy packages have already been precompiled.
I know the PackageCompiler docs state that a sysimage only works on the machine it was created on, but would it work in the same Docker image? Then I could build and store the sysimage the first time, and load the stored copy as part of the Docker build afterwards.
If I produce sysimages with PackageCompiler and store them for various combinations of Julia version and package versions, is there a way to determine whether I already have a sysimage ready for my current versions, so I can load it instead of building it again?
Would keying the sysimage on the Julia version plus the versions of all packages compiled into it be enough to decide it’s safe to load, because it was created with the same versions and should therefore be identical?
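Something like this untested sketch is what I have in mind for the cache key — it hashes the Manifest.toml (which pins every package version) together with the Julia version and CPU architecture; sysimage_cache_key and the sysimage-<key>.so name are just placeholders of my own:

using SHA  # stdlib

# Hash the Manifest.toml (exact package versions) together with the Julia
# version and CPU architecture to get a name for the cached sysimage.
function sysimage_cache_key(manifest_path = "Manifest.toml")
    meta = string(VERSION, '-', Sys.ARCH)
    return bytes2hex(sha256(vcat(read(manifest_path), Vector{UInt8}(meta))))
end

sysimage_name = "sysimage-$(sysimage_cache_key()).so"
# if this file already exists in the cache, download/copy it; otherwise build it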

To make this work, the sysimage would of course need to be built on the same (or a similar) OS and CPU (and GPU, if any) supporting the same instructions, right?

Is PackageCompiler capable of producing a sysimage with multiple targets, so that it contains code for several microarchitectures and the most performant one is selected at runtime?

As low-hanging fruit I could just cache the whole ~/.julia, but I’m hoping there is a handier way to do this.
I also suspect ~/.julia is too big; in the end I only need ~/.julia/compiled/<my version of julia>, right?
Each package in e.g. ~/.julia/compiled/v1.5 has several *.ji files. Is there any way to determine, per file/package, what the name of the resulting .ji file would be, so I could fetch it if it has already been computed and only precompile the package if not?
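For context, this minimal sketch only lists what is currently in the per-version compile cache (it doesn’t predict the .ji names ahead of time); only the compiled/v<major>.<minor> subfolder of the depot holds these files:

compiled_dir = joinpath(DEPOT_PATH[1],
                        "compiled", "v$(VERSION.major).$(VERSION.minor)")
for pkg in readdir(compiled_dir)
    pkgdir = joinpath(compiled_dir, pkg)
    isdir(pkgdir) || continue
    for ji in filter(endswith(".ji"), readdir(pkgdir))
        println(joinpath(pkg, ji))   # <Package>/<slug>.ji
    end
end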

We’ve been doing this for a while now, although with a slightly different use case. We build sysimages on a cloud Linux machine (in Docker) and download them to our laptops (a mix of Intel Macs and Windows). We then build new Docker images on our laptops, copying in the downloaded sysimages, and it has all worked well.

At the time, I thought we only managed to get it to work by setting cpu_target to x86_64, because otherwise we had errors when starting Julia. That is not necessarily true, and may have been down to inexperience with PackageCompiler.jl. Unfortunately I haven’t had time to test multiple targets / generic.

More recently I’ve been trying to use these same images on an M1 Mac and now have to modify the docker build command that uses the downloaded sysimage to specify the platform (--platform=linux/amd64). This uses Docker’s emulation and seems to work, but I’m not 100% sure it’s error-free. It will also probably hurt performance, as the Docker Dashboard itself warns:

[Screenshot of the Docker Dashboard warning about degraded performance under emulation]

Another possibility is multi-platform builds with Docker: when you build your sysimage, you could create sysimages for the various architectures you need. I haven’t tested this, but this blog post has more info. I think you can also do some nice things with the docker manifest so that users automatically get the image matching their architecture.

IMHO:

  • use the latest Julia (>= 1.6)
    • FROM julia:1.6
  • use (or extend) default_app_cpu_target() so it will work on any x86 CPU:
julia> PackageCompiler.default_app_cpu_target()
"generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)"

I hate to revive an old post, but I have the exact same problem: a Julia project that is built and run inside Docker with rarely changing dependencies, yet sysimage creation takes 20+ minutes in each build because we keep recompiling the same (unchanged) packages.

Did you find answers to your questions?

Does caching the ~/.julia folder work? Do you need to cache the whole thing or just the compiled sub-folder?


I also hate to revive old posts, but I’m encountering the same issues. Have there been any improvements or solutions in the last year?

I’m not up to speed on this and have never used Docker with Julia, but I seem to recall some relevant change in 1.11 (or was it 1.12-DEV?), so I would at least try out the release candidate. It seems 1.11 will be released very soon.

I tried to look up what I might have seen. Everything mentioning Docker that I can find in code (comments) is years old, but there are still some recently merged PRs related to Docker.

I found several PRs (and issues) mentioning Docker, e.g. this one with a backport label for 1.11: