Understanding Module Loading: System Image vs. Precompilation Cache

Janis_Erdmanis · December 11, 2023, 11:52am

Hi everyone

I’m curious about the module loading process. Recently, I’ve noticed a shift in recommendations regarding package loading. In the past, there was a focus on incorporating packages like Makie and DifferentialEquations into the system image. Nowadays, it seems there’s more reliance on the precompilation cache.

In my personal experience, I’ve found that creating a custom system image can significantly speed up module loading times. This observation has naturally led me to wonder: why is using a custom system image faster than simply relying on the precompilation cache? To take this to an extreme, why does Julia need a system image at all if the precompilation cache exists?

While I might be missing something obvious here, I haven’t been able to find a clear explanation for this. (In particular, I asked ChatGPT about this and the answer was rather dogmatic.) Are there any detailed write-ups or documentation that delve into these differences? I’m generally curious about when to prefer one method over the other.

mbauman · December 11, 2023, 2:13pm

The world is indeed shifting beneath your feet. System images have always included native machine code; before Julia 1.9, precompile caches did not. That changed with Julia 1.9.

mkitti · December 11, 2023, 2:52pm

A package’s precompilation cache, a pkgimage, needs to be loaded, or integrated into the existing state. The system image is the initial deserialized state. Both now contain native code.

https://docs.julialang.org/en/v1/devdocs/sysimg/

The native code in pkgimages can also be used to build a system image as of Julia 1.10 [citation needed].
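One way to see the native code mentioned above is to look at the cache directory itself. The sketch below assumes the default layout, where caches live under `compiled/vMAJOR.MINOR` in the first depot; `cache_files` is a helper made up for this illustration, not part of Julia.

```julia
using Libdl  # Libdl.dlext is the platform's shared-library extension ("so", "dylib" or "dll")

# Hypothetical helper: count the files under `dir` by file extension.
function cache_files(dir::AbstractString)
    counts = Dict{String,Int}()
    isdir(dir) || return counts
    for (root, _, files) in walkdir(dir)
        for f in files
            ext = splitext(f)[2]
            counts[ext] = get(counts, ext, 0) + 1
        end
    end
    return counts
end

# Assumption: the default depot layout, <depot>/compiled/vMAJOR.MINOR
if !isempty(DEPOT_PATH)
    dir = joinpath(first(DEPOT_PATH), "compiled", "v$(VERSION.major).$(VERSION.minor)")
    counts = cache_files(dir)
    println(".ji cache files:      ", get(counts, ".ji", 0))
    println("native code files (.", Libdl.dlext, "): ", get(counts, "." * Libdl.dlext, 0))
end
```

On Julia 1.9 and later, with pkgimages enabled, the second count should be non-zero once some packages have been precompiled.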

Janis_Erdmanis · December 11, 2023, 4:27pm

Ok, now I see why images are faster, as they are a snapshot of Julia’s system state. I would like to know a little bit more about what Julia does when it creates a system state from pkgimages. Additionally, what exactly is stored in the system state? How is it organized within Julia?

mbauman · December 11, 2023, 5:45pm

Your mental model about system state is just about perfect — it’s really everything that Julia knows about: types, modules/bindings and their (serializable) values, and of course methods with their specializations (typed, IR, native code, etc.).

As far as how it’s organized, it depends on the category and how deep into internals you want to go. At the Julia level, types are `subtypes(Any)`, the names of bindings are `names`, values are accessed from the bindings, and methods and their specializations are in `MethodTable`s accessed from `methods`.
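The Julia-level view described above can be poked at directly. This is only a small sketch of the public reflection functions, not a tour of the internal data structures; the variable names are made up for the example.

```julia
using InteractiveUtils  # provides `subtypes` outside the REPL

# Types: the direct subtypes of Any currently known to the session
top_types = subtypes(Any)
println("direct subtypes of Any: ", length(top_types))

# Bindings: the exported names of a module
exported = names(Base)
println("exported names in Base: ", length(exported))

# Methods: everything currently defined for a function
plus_methods = methods(+)
println("methods of +: ", length(plus_methods))
println("signature of one of them: ", first(plus_methods).sig)
```

Loading a package changes these numbers, which is one way to see that loading adds to the existing state.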

Incrementally loading a package’s precompile cache needs to track through changes as they happen, e.g., doing subtyping to appropriately insert methods, checking for invalidations, etc.

vchuravy · December 11, 2023, 8:26pm

This really is the biggest difference between system images and package images.
The system image state is always valid: it is the first thing that exists, so it doesn’t need to perform any work to make sure that things are consistent. By definition, they are.

Package images (née precompilation caches) have the challenge that they are incremental: they are a partial serialization of state, so when we load them we need to perform work to ensure that loading them leads to a consistent state.

We now have decent tooling with Tracy to inspect what is happening during the package loading process.
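Tracy, as far as I know, needs a Julia build with Tracy support enabled. A much coarser look at load times, which is not the Tracy workflow, is the `@time_imports` macro from InteractiveUtils (Julia 1.8 and later). `Statistics` is just a placeholder here; substitute any package in your environment.

```julia
using InteractiveUtils  # provides @time_imports (already loaded in the REPL)

# Reports the load time of each package pulled in by the `using` statement.
# Packages that are already loaded, or baked into the system image,
# produce no report.
@time_imports using Statistics
```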

Janis_Erdmanis · December 11, 2023, 9:42pm

Package images (née precompilation caches) have the challenge that they are incremental: they are a partial serialization of state, so when we load them we need to perform work to ensure that loading them leads to a consistent state.

Can the construction of a system image state be made simpler if one assumes that the end state derived from given pkgimages is consistent? In such a situation, what operations would still be necessary?

mbauman · December 11, 2023, 9:59pm

I mean, I suppose if you wanted to re-introduce julia#265, you could theoretically try ignoring invalidations and world ages. But that’d make things work very strangely (and probably crashy) once loaded.

Even without that, you’d always need to do the full subtyping to search for where to insert methods since method tables are sorted from most specific to least, and the location of a particular method depends upon what’s already there.

Those are just two aspects of the loading that I’m vaguely familiar with. There are surely more considerations like the above.
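Both points can be seen in miniature in a plain script, without any packages. This is only a toy sketch with made-up functions `f` and `g`; package loading does the same kind of bookkeeping for every method a pkgimage brings in.

```julia
f(x) = "generic"
g(x) = f(x)

before = g(1)              # compiles g(::Int) against the only method of f

f(x::Int) = "specialized"  # a more specific method; code compiled for g(::Int) is now stale

after = g(1)               # g is recompiled and dispatches to the new method

println(before, " -> ", after)  # prints "generic -> specialized"

# Dispatch picks the most specific applicable method:
println(which(f, (Int,)))     # the f(x::Int) method
println(which(f, (String,)))  # the generic f(x) method
```

If stale code were not invalidated (the julia#265 behaviour), the second call to `g(1)` would still return "generic".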

StevenSiew · December 11, 2023, 11:40pm

Can the construction of sysimage be made very very simple like this?

using Plot,Pkg,Revise

println("Hello World")
savecursysimage("mysysimage001")

This would make startup very quick, since with a customized sysimage we would know exactly what needs to be in Julia’s memory at startup time.

mkitti · December 12, 2023, 4:35am

You can do the following.

julia> using PackageCompiler

julia> PackageCompiler.create_sysimage(; sysimage_path = "mysysimage001.so");
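To use the resulting image, start Julia with `julia --sysimage mysysimage001.so`. A small sketch for confirming, from inside a session, which system image it was started from:

```julia
# Path of the system image the running session was started from
image = unsafe_string(Base.JLOptions().image_file)
println("Running from system image: ", image)
```

With the custom image this should print the path to `mysysimage001.so`; otherwise it prints the default image shipped with Julia.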
