The compilation process actually continues all the way down to native machine code. On the first call:
@code_lowered
: Code is first lowered from Julia AST into Julia SSA-form IR.
@code_typed
: Type inference figures out the types of everything inside the function (or gives up, if your code is type-unstable). This helps Julia deduce the return type and decide which specific methods to call when your function body calls other functions.
@code_llvm
: Julia prepares the LLVM IR for your method.
@code_native
: LLVM compiles code down to native machine code, and this is cached within your Julia session so that code is fast on the second run.
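As a rough illustration, each of the four stages above can be inspected directly. The function add_one is a hypothetical example that is not part of the original discussion. The function forms code_llvm and code_native from the InteractiveUtils standard library are combined with sprint here so that the output is captured as a string rather than printed.

```julia
using InteractiveUtils  # loaded automatically in the REPL, needed in scripts

# Hypothetical example function, used only for illustration.
add_one(x) = x + 1

# Stage 1: lowered SSA-form IR (a Core.CodeInfo object).
lowered = @code_lowered add_one(1)

# Stage 2: type-inferred IR, returned as a pair `CodeInfo => return type`.
typed = @code_typed add_one(1)
inferred_return_type = last(typed)

# Stage 3: LLVM IR for the method specialized on Int.
llvm_ir = sprint(code_llvm, add_one, (Int,))

# Stage 4: native assembly for the same specialization.
native_asm = sprint(code_native, add_one, (Int,))

println(lowered)
println(inferred_return_type)
```

In the REPL, the macro forms @code_llvm add_one(1) and @code_native add_one(1) print the same information directly.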
Precompilation covers only the first two stages, which is why it doesn’t completely eliminate latency on the first call either. When a precompiled method is invoked for the first time, the compiler picks up the work from the typed code and continues down to machine code. Theoretically we could cache machine code as well and achieve zero latency, but there are subtle technical points to consider.
One, type stability is still vital for precompilation to work well. Consider a hypothetical function:
function foo(x)
    temp = Any[x]
    bar(temp[1])
end
Here type inference cannot figure out what is passed into bar, so the compiled function still has to wait until runtime and fall back on dynamic dispatch. If the package author decides to precompile foo(::Int), Julia won’t precompile bar(::Int), because it doesn’t know that dynamic dispatch will eventually resolve to that specific method at runtime. On the first call of foo(2), Julia will then still have to compile bar(::Int), and if bar(::Int) is sufficiently complicated, compiling it will dominate the compilation cost and precompiling foo won’t help. The fix is to write code that is as type stable as possible or, if dynamic dispatch is necessary, to precompile bar(::Int) separately so that Julia doesn’t have to start from scratch once the runtime type is known.
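The sketch below makes the point concrete under a few assumptions. The body of bar is a hypothetical stand-in (the original leaves it unspecified), foo_stable is an illustrative type-stable variant that is not from the original, and Base.return_types is an unexported reflection helper used here only to peek at what inference concludes.

```julia
# Hypothetical stand-in for the `bar` of the example; its body is an assumption.
bar(y) = y + 1

# The type-unstable version from the text: the element type of `temp` is Any,
# so inference cannot tell which `bar` method will be called.
function foo(x)
    temp = Any[x]
    bar(temp[1])
end

# Illustrative type-stable variant: `[x]` is a Vector{Int} when x is an Int,
# so the call to `bar` can be resolved at compile time.
function foo_stable(x)
    temp = [x]
    bar(temp[1])
end

# What inference concludes about the return type of each version.
inferred_unstable = Base.return_types(foo, (Int,))
inferred_stable = Base.return_types(foo_stable, (Int,))
println("foo:        ", inferred_unstable)
println("foo_stable: ", inferred_stable)

# If dynamic dispatch is unavoidable, the callee can be requested explicitly.
# `precompile` returns true when the signature could be compiled.
ok = precompile(bar, (Int,))
println("precompiled bar(::Int): ", ok)
```

In a package, a statement like precompile(bar, (Int,)) would sit at the top level of the module so that it runs while the package itself is being precompiled.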
Two, Julia’s flexibility is itself an issue. Since method tables are mutable global state, loading modules and adding methods can invalidate other methods and force their recompilation. In that case, the precompilation work for those methods is thrown out, and the first call to an invalidated method triggers the entire compilation process again. While package developers can now diagnose and patch invalidations, that doesn’t change the fact that even if we eventually cache native code during precompilation, the performance benefits are only there if the programmer writes their code carefully.
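A minimal sketch of how an invalidation arises, using two hypothetical functions f and g that are not from the original. The recompilation itself is not directly visible here; what can be observed is that g picks up the new method, which is only possible because its previously compiled code was discarded.

```julia
# Hypothetical functions, for illustration only.
f(x) = 1
g(x) = f(x)

# The first call compiles g(::Int) against the only existing method of f.
before = g(1)

# Adding a more specific method changes which f the call inside g resolves to,
# so the compiled code for g(::Int) is invalidated.
f(x::Int) = 2

# This call has to recompile g(::Int) against the updated method table.
after = g(1)

println((before, after))
```

The same mechanism applies when the new method comes from loading another package, which is how code that was precompiled in one package can be invalidated by a package loaded later.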
Finally, there’s also the interesting technical question of where precompiled code should be saved when package composability comes into the picture. Currently each package has its own *.ji file to save precompiled code, but if the function bar in the example above belongs to another package, where should we save our precompiled work? If we cache it in our package’s file, what happens when another package comes along and caches the same method, and how do we resolve the conflict when both are loaded together? If bar is instead cached in the package it comes from, then loading that package will always load your specific bar method, even if it’s only useful together with your module. Perhaps the precompilation cache should work on a per-environment basis, but then a lot of shared precompilation work wouldn’t be reused across environments.
To be clear, none of these problems makes improving precompilation infeasible. Type stability is emphasized a lot in the Manual, and tools are available to help prevent invalidations. The technicalities of precompilation caching could be handled if there are enough developer hours to invest in it. But someone needs to take the initiative, and the core devs already have a lot on their plate.
PackageCompiler is already able to save your session into a sysimage so you don’t have to pay the cost of first-call compilation, and it can cache method compilation from precompile files and the package’s test suite. But for the process to be worry-free and automatic just by entering ] precompile into the REPL, we’ll just have to wait and see how our community rises to the task.