I have an internal package that is particularly slow to load on 1.9-rc3, in an AWS Docker image (built locally, then deployed). Basically it exists just to bring in all my dependencies, plus some coordinating functions, and I use it for my Pluto notebooks generally.
When I run using MyPackage alone, it takes ~9 min to complete.
However, if I call import Pkg; Pkg.precompile(); using MyPackage, the whole thing takes ~3 min to complete.
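For concreteness, here's roughly what the two runs look like (fresh Julia session each time; MyPackage just stands in for my real internal package):

# Variant 1: plain using; any missing precompile caches for the
# dependencies are built implicitly as part of loading.
using MyPackage                # ~9 min for me on 1.9-rc3

# Variant 2 (in a fresh session): precompile explicitly first, then load.
import Pkg
Pkg.precompile()               # builds the caches for the whole environment
using MyPackage                # ~3 min in total for me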
Is the difference that the former isn’t doing anything in parallel? And if it isn’t parallel, why not?
Yeah, that’s right: when you just do using, precompilation isn’t parallel. On the master branch of Julia this has been changed, though, and both will now run precompilation in parallel.
So why wasn’t it like that from the start? I don’t know. Perhaps it felt weird to spawn multiple Julia processes from Julia itself to handle precompilation.
Unfortunately, this only helps in the specific case where loading a single package triggers precompilation of many dependencies. If instead of using MyPackage there is
using PkgA
using PkgB
using PkgC
...
then it’ll still be slower than running ]precompile.
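A sketch of the workaround in that case (PkgA, PkgB, PkgC are just placeholders): trigger the parallel precompilation explicitly once, so the subsequent using calls only find ready-made caches:

import Pkg
Pkg.precompile()   # precompiles the whole active environment in parallel

using PkgA
using PkgB
using PkgC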
Parallel precompilation is handled by Pkg, whereas using is part of code loading, which shouldn’t require Pkg at all. It has since been implemented via a Pkg hook, but that coupling between code loading and Pkg is still not considered ideal.
I took the liberty of editing the title, since the question isn’t really about loading but more about how precompilation behaves implicitly under using vs explicit precompile.
I think the word “loading” should be reserved for how long a package takes to load once it is already precompiled. I was quite alarmed by your figure of 9 min.
Fair! Though from a naive perspective, it all looks like it’s part of loading because there are no indicators of other activity.
One related confusion I had is why I couldn’t speed this up by precompiling during the Docker container setup; it still needed this extra time when the container was actually run. I suppose that’s related to the CPU architecture of my machine differing from that of the Amazon machine?
Yes, we don’t support cross-compilation. It’s high on many people’s wish list but very nontrivial, so don’t hold your breath for it.
And as usual you raise an excellent point about there being no indication of what’s really happening. Assuming 1.10 retains the transition to parallel precompilation even for using, a fix is already in place. But if that gets reverted, we should probably add some visual indicator.
Can I ask you here for clarification? I'm confused as to whether you are effectively saying that the --cpu-target flags are being ignored by precompilation.
Cross-compilation here means compiling for a different architecture. With julia -C you can choose a different microarchitecture within the same architecture (i.e. the same ISA, Instruction Set Architecture); that’s not cross-compilation (or not too much, at least). For example, targeting haswell instead of skylake-avx512 is just a different x86_64 microarchitecture, whereas producing aarch64 code from an x86_64 build would be actual cross-compilation.
Oh ok, that's good to hear. Most often in cloud deployment we encounter different generations of x86_64, so that should work then? We are using CPU target flags and it seems to work so far.
We simply build a Docker image in which we precompile with flags that may be more generic than those of the machine on which the image is built.
The idea is that if we build with skylake-avx512, we still have code cached for skylake without AVX-512, and we can always fall back to generic x86_64. There may also occasionally be older haswell CPUs.
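A minimal sketch of what such a multi-target setup could look like, assuming the semicolon-separated syntax that JULIA_CPU_TARGET / --cpu-target accept; the exact string here is illustrative, not necessarily what we actually ship:

# Illustrative only: each entry adds a cached code variant, so the
# resulting package images contain code for generic x86_64, haswell,
# skylake, and skylake-avx512, and Julia picks the best match at run time.
#
#   JULIA_CPU_TARGET="generic;haswell,clone_all;skylake,clone_all;skylake-avx512,clone_all"
#
# (Set before precompiling in the image build, e.g. as an ENV line in the Dockerfile.)

# At run time you can check which CPU the host actually reports:
println(Sys.CPU_NAME)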
The reason we haven’t been using system images so far is that for our app the time to first response was higher compared to using package images & PrecompileTools. I haven’t yet understood why.
But TBH I don’t understand much of it, and I tend to always copy-paste the CPU target specification string mentioned there, in the hope that whatever has been chosen to build the official releases of Julia will be sufficiently portable for my use cases: