I have an internal package that is particularly slow to load on 1.9-rc3, in an AWS Docker image (built locally, then deployed). Basically it exists just to bring in all my dependencies, plus some coordinating functions, and I use it for my Pluto notebooks generally.
When I run using MyPackage alone, it takes ~9 min to complete.
However, if I call import Pkg; Pkg.precompile(); using MyPackage, the whole thing takes ~3 min to complete.
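For concreteness, here's roughly what the two runs look like (fresh Julia session each time; MyPackage just stands in for my real internal package):

# Variant 1: plain using; any missing precompile caches for the
# dependencies are built implicitly as part of loading.
using MyPackage                # ~9 min for me on 1.9-rc3

# Variant 2 (in a fresh session): precompile explicitly first, then load.
import Pkg
Pkg.precompile()               # builds the caches for the whole environment
using MyPackage                # ~3 min in total for me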
Is the difference that the former isn’t doing anything in parallel? And if it isn’t parallel, why not?
Yeah, that’s right: when you just do using, precompilation isn’t parallel. On the master branch of Julia this has been changed, though, and both will now run precompilation in parallel.
So why wasn’t it like that from the start? I don’t know. Perhaps it felt weird to spawn multiple Julia processes from Julia itself to handle precompilation.
Unfortunately, this only helps in the specific case where loading a single package triggers precompilation of many dependencies. If instead of using MyPackage there is
using PkgA
using PkgB
using PkgC
...
then it’ll still be slower than running ]precompile.
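A sketch of the workaround in that case (PkgA, PkgB, PkgC are just placeholders): trigger the parallel precompilation explicitly once, so the subsequent using calls only find ready-made caches:

import Pkg
Pkg.precompile()   # precompiles the whole active environment in parallel

using PkgA
using PkgB
using PkgC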
Parallel precompilation is handled by Pkg, whereas using is part of code loading, which shouldn’t require Pkg at all. It has since been implemented via a Pkg hook, but that coupling between code loading and Pkg is still not considered ideal.
I took the liberty of editing the title, since the question isn’t really about loading but more about how precompilation behaves implicitly under using vs explicit precompile.
I think the word “loading” should be reserved for how long a package takes to load once it is already precompiled. I was quite alarmed by your figure of 9 min.
Fair! Though from a naive perspective, it all looks like it’s part of loading because there are no indicators of other activity.
One related confusion I had is why I couldn’t speed this up by precompiling during the Docker container setup; it still needed this extra time when the container was actually run. I suppose that’s related to the CPU architecture of my machine differing from that of the Amazon machine?
Yes, we don’t support cross-compilation. It’s high on many people’s wish list but very nontrivial, so don’t hold your breath for it.
And as usual you raise an excellent point about there being no indication of what’s really happening. Assuming 1.10 retains the transition to parallel precompilation even for using, a fix is already in place. But if that gets reverted, we should probably add some visual indicator.
Can I ask you here for clarification? I'm confused as to whether you are effectively saying that the --cpu-target flags are being ignored by precompilation.
Cross-compilation here means compiling for a different architecture. With julia -C you can choose a different microarchitecture within the same architecture (i.e. the same ISA, Instruction Set Architecture); that’s not cross-compilation (or not too much, at least). For example, targeting haswell instead of skylake-avx512 is just a different x86_64 microarchitecture, whereas producing aarch64 code from an x86_64 build would be actual cross-compilation.
Oh ok, that's good to hear. Most often in cloud deployment we encounter different generations of x86_64, so that should work then? We are using CPU target flags and it seems to work so far.
We simply build a Docker image in which we precompile with flags that may be more generic than those of the machine on which the image is built.
The idea is that if we build with skylake-avx512, we still have code cached for skylake without AVX-512, and we can always fall back to generic x86_64. There may also occasionally be older haswell CPUs.
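A minimal sketch of what such a multi-target setup could look like, assuming the semicolon-separated syntax that JULIA_CPU_TARGET / --cpu-target accept; the exact string here is illustrative, not necessarily what we actually ship:

# Illustrative only: each entry adds a cached code variant, so the
# resulting package images contain code for generic x86_64, haswell,
# skylake, and skylake-avx512, and Julia picks the best match at run time.
#
#   JULIA_CPU_TARGET="generic;haswell,clone_all;skylake,clone_all;skylake-avx512,clone_all"
#
# (Set before precompiling in the image build, e.g. as an ENV line in the Dockerfile.)

# At run time you can check which CPU the host actually reports:
println(Sys.CPU_NAME)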
The reason we haven’t been using system images so far is that for our app the time to first response was higher compared to using package images & PrecompileTools. I haven’t yet understood why.
But TBH I don’t understand much of it, and I tend to always copy-paste the CPU target specification string mentioned there, in the hope that whatever has been chosen to build the official releases of Julia will be sufficiently portable for my use cases: