Potential performance regressions in Julia 1.8 for special un-precompiled type dispatches and how to fix them

There are two facts that collide to make this matter more now. One is:

This means that if a precompiled method is missing in package X but is needed to precompile a call in package Y, it will now be precompiled under the ownership of package Y. This has two effects: first, more precompilation will happen; second, if package Z also needs the missing precompile, then package Y and package Z will each precompile their own copy of the call from package X.

The extra precompilation will increase load times but normally decrease first solve time, provided types tend to match, etc. But it will increase load times even more if a call is precompiled multiple times. The solution, then, is to precompile “what we know is needed” in package X itself, and use the system of external precompilation as sparingly as possible. It’s required to make things work (for example, Base misses precompilation of `Vector(::UndefInitializer, ::Tuple)`, so oops, you might need that), but don’t over-rely on it.
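As a sketch of what “precompile it in package X itself” means, a package can emit explicit `precompile` directives for the signatures it knows downstream users will hit. The package and function here are hypothetical, purely for illustration:

```julia
module MyPackageX  # hypothetical stand-in for "package X"

using LinearAlgebra

# A method whose compiled code downstream packages (Y, Z) will want.
solve_kernel(A::Matrix{Float64}, b::Vector{Float64}) = A \ b

# Explicit precompile directive: compile this exact method signature at
# package-precompile time, so Y and Z don't each redo (and each own) the work.
precompile(solve_kernel, (Matrix{Float64}, Vector{Float64}))

end
```

Because the directive lives in the defining package, the compiled code is owned by package X and is shared by every downstream package, instead of being duplicated under Y's and Z's ownership.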

The next is how SnoopPrecompile changes the game:

The main fact is that uninferred calls can now do precompilation effectively. Go back and re-read this issue in full:

That was the old situation. The issue was that if we could get inference to happen higher in the call stack, then precompilation would happen in RecursiveFactorization.jl, and that would send the first solve time with implicit methods from 22 seconds to 3. Now with SnoopPrecompile, the pre-change version probably already hits 3 (on the current release it’s now 0.5 seconds, BTW).
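For reference, a minimal SnoopPrecompile workload in a package looks roughly like this (the package name and the calls being exercised are hypothetical; the point is that executed calls, including runtime-dispatched ones that inference alone would miss, get precompiled):

```julia
module MyPackageY  # hypothetical package using SnoopPrecompile

using SnoopPrecompile
using LinearAlgebra

@precompile_setup begin
    # Setup runs at precompile time but is NOT itself precompiled.
    A = rand(4, 4)
    b = rand(4)
    @precompile_all_calls begin
        # Everything executed here is compiled into the package image,
        # even uninferred calls that a plain `precompile(f, Ts)` would miss.
        A \ b
        lu(A)
    end
end

end
```

This is why uninferred calls can now be precompiled effectively: SnoopPrecompile records what actually runs rather than relying on inference to discover the call graph.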

Basically, this means a lot more gets precompiled. But because so much more is precompiled, duplicated precompilation hurts more, and invalidating functions hurts even more. If you do `@time using OrdinaryDiffEq`, the really important stat is that 75% of the time is recompilation. That is invalidations taking what was precompiled and throwing it away, because loading a different package (Static.jl, LoopVectorization.jl) invalidates the precompiled version.
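To see where those invalidations come from, the SnoopCompile tooling can record them during package load. A sketch (the package being loaded is just an example; any package with heavy recompilation in `@time using` works the same way):

```julia
# Diagnosing invalidations with SnoopCompile's tooling.
using SnoopCompileCore   # lightweight recorder; load BEFORE the target package

# Record which precompiled MethodInstances get invalidated while loading.
invalidations = @snoopr using OrdinaryDiffEq

using SnoopCompile       # heavier analysis tools, loaded afterwards

# Group the invalidations into trees rooted at the offending method
# definitions, worst offenders first.
trees = invalidation_trees(invalidations)
display(trees)
```

The roots of the largest trees are the method definitions whose fixing buys back the most thrown-away precompilation.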

So in the end, a lot more gets precompiled, which increases load time (because of the ownership rules and the help for non-inferred calls). This gives a major improvement in the first solve time, but it also increases `using` time, which then explodes because invalidations throw away more than half of that precompilation work.

Therefore, invalidations matter a whole lot more now. It’s time to fix as much as we can there.
