I’m curious about the module loading process. Recently, I’ve noticed a shift in recommendations regarding package loading. In the past, there was a focus on incorporating packages like Makie and DifferentialEquations into the system image. Nowadays, it seems there’s more reliance on the precompilation cache.
In my personal experience, I’ve found that creating a custom system image can significantly speed up module loading times. This observation has naturally led me to wonder: why is using a custom system image faster than simply relying on the precompilation cache? To take this to an extreme, why does Julia need a system image at all if the precompilation cache exists?
While I might be missing something obvious here, I haven’t been able to find a clear explanation for this. (In particular, I asked ChatGPT about this and the answer was rather dogmatic.) Are there any detailed write-ups or documentation that delve into these differences? I’m generally curious about when to prefer one method over the other.
The world is indeed shifting beneath your feet: system images have always included native machine code. Before Julia 1.9, precompile caches did not. That changed with Julia 1.9:
A package’s precompilation cache, a pkgimage, needs to be loaded, or integrated into the existing state. The system image is the initial deserialized state. Both now contain native code.
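To see the two mechanisms side by side, here is a minimal sketch that prints which system image the current session started from and times one package load. It assumes the Dates standard library is available; depending on the Julia version, Dates may already be part of the system image (making the load nearly free) or may be loaded from its own cache, so the timing is illustrative only.

```julia
# Path of the system image that this Julia session was started from.
sysimage_path = unsafe_string(Base.JLOptions().image_file)
println("system image: ", sysimage_path)

# Time a package load. If Dates is baked into the system image this is
# nearly instant; otherwise it is loaded from its precompilation cache.
load_seconds = @elapsed @eval using Dates
println("loading Dates took ", round(load_seconds; digits = 4), " s")
```

Running the same timing with and without a custom system image (started via `julia --sysimage path/to/image`) is one way to compare the two approaches for a heavier package.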
Ok, now I see why images are faster, as they are a snapshot of Julia’s system state. I would like to know a little bit more about what Julia does when it creates a system state from pkgimages. Additionally, what exactly is stored in the system state? How is it organized within Julia?
Your mental model about system state is just about perfect — it’s really everything that Julia knows about: types, modules/bindings and their (serializable) values, and of course methods with their specializations (typed, IR, native code, etc.).
As far as how it’s organized, it depends on the category and how deep into internals you want to go. At the Julia level, types are reachable from subtypes(Any), the names of bindings come from names, values are accessed through the bindings, and methods and their specializations live in MethodTables accessed from methods.
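The Julia-level entry points mentioned above can be poked at directly. This is a small sketch that only uses reflection functions (subtypes, names, getfield, methods); the variable names are just for illustration.

```julia
using InteractiveUtils  # provides subtypes; loaded automatically in the REPL

# Types: the direct subtypes of Any known to this session.
top_level_types = subtypes(Any)

# Bindings: the exported names of a module, and the value behind one of them.
base_names = names(Base)
sum_function = getfield(Base, :sum)

# Methods: the methods of one function, each carrying its own signature.
sum_methods = methods(sum)
first_signature = first(sum_methods).sig

println(length(top_level_types), " direct subtypes of Any")
println(length(base_names), " exported names in Base")
println(length(sum_methods), " methods of sum, e.g. ", first_signature)
```

The counts depend on the Julia version and on which packages are loaded, which is the point: loading a package adds to exactly these collections.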
Incrementally loading a package’s precompile cache needs to track through changes as they happen, e.g., doing subtyping to appropriately insert methods, checking for invalidations, etc.
This really is the biggest difference between system images and package images.
The system image state is always valid. It is the first thing that exists, so it doesn’t need to perform any work to make sure that things are consistent; by definition they are.
Package images (née precompilation caches) have the challenge of being incremental: they are a partial serialization of state, so when we load them we need to do work to ensure that the result is a consistent state.
We now have decent tooling with Tracy to inspect what is happening during the package loading process.
Can the construction of a system image state be made simpler if one assumes that the end state derived from given pkgimages is consistent? In such a situation, what operations would still be necessary?
I mean, I suppose if you wanted to re-introduce julia#265, you could theoretically try ignoring invalidations and world ages. But that’d make things work very strangely (and probably crashy) once loaded.
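For anyone unfamiliar with julia#265, this is roughly the behavior that invalidation protects. It is a toy sketch with made-up function names (f and g), run as top-level statements:

```julia
f(x) = 1
g(x) = f(x)          # g is compiled against the only f that exists so far

before = g(0)        # forces compilation of g(::Int)

f(x::Int) = 2        # a more specific method arrives later, e.g. from a package

after = g(0)         # the compiled g(::Int) was invalidated and recompiled

println("before = ", before, ", after = ", after)
```

If invalidations were skipped, the stale compiled code for g could keep returning the old answer, which is the bug that issue was about.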
Even without that, you’d always need to do the full subtyping to search for where to insert methods since method tables are sorted from most specific to least, and the location of a particular method depends upon what’s already there.
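The effect of specificity is easy to observe from the outside. In this sketch (describe is a made-up function), the methods are deliberately defined out of order, and dispatch still picks the most specific applicable one. It only shows the observable result, not how the method table is laid out internally.

```julia
# Defined in a deliberately shuffled order.
describe(x) = "Any"
describe(x::Int) = "Int"
describe(x::Number) = "Number"

# Dispatch picks the most specific applicable method for each argument.
results = (describe("a"), describe(1.5), describe(1))

# which reports the method that would be called for a given argument type.
chosen = which(describe, (Int,))

println(results)
println(chosen.sig)
```

Deciding that the Int method beats the Number method requires subtyping queries against what is already defined, and that work has to happen whenever new methods are merged in.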
Those are just two aspects of the loading that I’m vaguely familiar with. There are surely more considerations like the above.
Can the construction of a sysimage be made very simple, like this?
using Plots, Pkg, Revise
println("Hello World")
savecursysimage("mysysimage001")  # hypothetical function: snapshot the current session as a sysimage
This would make startup very quick, since the customized sysimage loaded at startup would already contain exactly the state we want in Julia’s memory.