Is Claude right on this (a cache-size growth analysis)?

I have been asking Claude to investigate why cache sizes grow so much with not that much precompiled code. I’m showing here the results of a smaller example (but I have one where the “accusations” are heavier, when trying to explain a growth of 29 MB).

My question to compiler experts is basically, is this true?

tl;dr

Done. Everything is explained. Here's the summary:

  Root cause: "Stub" MethodInstances in Julia's sysimage

  Julia's sysimage (1.10) contains registered MethodInstances for eigen(Matrix{Float64}), cov(Matrix{Float64}), *(Matrix{Float64}, Matrix{Float64}), etc. — but
  they are empty stubs with no compiled code.

  This means that:
  1. Julia "knows" these specializations exist (type-dispatch resolves)
  2. But it has no native code nor serialized IR for them
  3. When any package calls eigen during precompilation, Julia must compile the entire chain (eigen → eigen! → LAPACK.syevr! → ccall wrappers → ...) from scratch
  4. The compiled code is then stored in the cache of the calling package (GMT), not in the sysimage

  This is why pca(rand(Float32, 24, 4)) adds ~3.8 MB — the entire cov + eigen + linear algebra chain is recompiled and serialized inside GMT's pkgimage.

  This is a Julia design problem, not GMT's. The stdlibs (LinearAlgebra, Statistics) should precompile these fundamental operations via PrecompileTools in their
  own cache/sysimage, but they don't. Every downstream package pays the full cost.

The longer analysis


## PCA Cache Bloat Investigation (2026-02-16)

### Problem

`pca(rand(Float32, 24, 4))` in the precompile block adds ~3.8 MB to the cache.

### Measurements

| Precompile config | Cache Size | Delta from baseline |
|---|---|---|
| No pca (baseline) | 94.99 MB | — |
| `pca(rand(Float64, 24, 4))` | 97.92 MB | +2.93 MB |
| `pca(rand(Float32, 24, 4))` | 98.78 MB | +3.79 MB |
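For reference, a sketch of how one could total a precompile cache on disk (an assumption — the thread doesn’t say how these sizes were measured; the GMT path below assumes a standard depot layout):

```julia
# Sketch: total the on-disk size of a precompile cache directory.
cachesize(dir) = sum(filesize(joinpath(root, f))
                     for (root, _, files) in walkdir(dir) for f in files;
                     init=0)

# Assumed location of GMT's pkgimage cache in the default depot.
gmt_cache = joinpath(DEPOT_PATH[1], "compiled",
                     "v$(VERSION.major).$(VERSION.minor)", "GMT")
isdir(gmt_cache) && println(round(cachesize(gmt_cache) / 1024^2; digits=2), " MB")
```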

### Root cause: "Stub" MethodInstances in Julia's sysimage

The Julia sysimage (1.10) contains **MethodInstances registered** for `eigen(Matrix{Float64})`,
`cov(Matrix{Float64})`, `*(Matrix{Float64}, Matrix{Float64})`, etc. — but they are **empty
stubs with NO compiled code**:

MethodInstance found: Tuple{typeof(eigen), Matrix{Float64}}
Has inferred code (cache): false

cov: Tuple{typeof(cov), Matrix{Float64}} → has code: false

*: Tuple{typeof(*), Matrix{Float64}, Matrix{Float64}} → has code: false

The sysimage "knows" these specializations exist (type-dispatch resolves), but has **no native
code nor serialized IR** for them. The LAPACK internal functions (`eigen!`, `LAPACK.geev!`,
`LAPACK.syevr!`) have **zero** precompiled specializations.
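A sketch of the kind of probe that could produce the output above (an assumption — the original script isn’t shown; `Base.specializations` needs Julia ≥ 1.10, and `specptr` is an internal field, not a stable API):

```julia
using LinearAlgebra

# Does `f(argtypes...)` have a registered MethodInstance, and does that
# instance carry native code? Diagnostic sketch only.
function has_native_code(f, argtypes::Type...)
    sig = Tuple{typeof(f), argtypes...}
    for m in methods(f, Tuple{argtypes...})
        for mi in Base.specializations(m)
            mi.specTypes == sig || continue
            isdefined(mi, :cache) || return false
            return mi.cache.specptr != C_NULL
        end
    end
    return false   # no MethodInstance registered at all
end

has_native_code(eigen, Matrix{Float64})   # reportedly false on a stock 1.10 sysimage
```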

When GMT calls `pca` → `princomp!` → `cov(X)` + `eigen(cov_X)` during precompilation, Julia
must compile the **entire call chain** from scratch:

pca → princomp! → cov(Matrix{Float32}) → eigen(Matrix{Float32}) → eigen!(Matrix{Float32})
→ LAPACK.syevr!(Float32, …) → ccall wrappers → …

plus `sortperm(Vector{Float32})`, `Matrix{Float32} * Matrix{Float32}`, and `sum`/`mean`/…

The compiled code is then stored in **GMT's pkgimage cache**, not in the sysimage.

Timing confirms this — 99.98% of execution time is compilation:

Float32 chain: 3.73s, 5.34M allocs, 357 MiB, 99.98% compilation
Float64 chain: 2.67s, 4.06M allocs, 271 MiB, 99.98% compilation
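A hedged reproduction of this kind of measurement (run in a fresh session; whether GMT actually wraps the covariance in `Symmetric` is an assumption on my part):

```julia
using Statistics, LinearAlgebra

# First call: @time's "% compilation" readout shows almost all of the
# elapsed time is compile cost.
X = rand(Float32, 24, 4)
@time eigen(Symmetric(cov(X)))

# Second call: compilation is done, runtime drops to microseconds.
@time eigen(Symmetric(cov(X)))
```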


### Why Float64 is slightly cheaper than Float32

Using Float64 saves ~0.86 MB vs Float32 because other GMT precompile paths (grdimage, plot,
etc.) already trigger some Float64 linear algebra specializations. Float32 creates a completely
separate specialization chain (`Eigen{Float32,...}`, `cov → Matrix{Float32}`, etc.).

### This is a Julia design limitation

The stdlib packages (LinearAlgebra, Statistics) should precompile these fundamental operations
via PrecompileTools in their own cache/sysimage, but they don't. Every downstream package that
calls `eigen`, `cov`, or matrix operations pays the full compilation cost and stores the
entire LAPACK chain in its own pkgimage.
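For what it’s worth, the mechanism for doing this upstream already exists — a minimal sketch using only `Base.precompile` (PrecompileTools wraps essentially this in its workload blocks; whether the stdlibs could actually ship such statements is the open question in this thread):

```julia
using Statistics, LinearAlgebra

# If statements like these ran during LinearAlgebra/Statistics
# precompilation, the compiled code would land in their caches
# instead of in every downstream caller's pkgimage.
precompile(cov,   (Matrix{Float32},))
precompile(eigen, (Matrix{Float32},))
```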

### Resolution

`pca` is commented out from the precompile block. It's a niche function — users who need PCA
pay ~4 seconds of compilation cost on first call, but GMT's cache shrinks by ~3.8 MB.

Could you share some sample code with me?
I think I have an easy way to validate this :slight_smile:

Hi, thanks for looking at this.

It’s no longer fresh in my mind how I was driving Claude at the time to investigate that, but a way to see the difference is to uncomment this line of the GMT.jl package. I just redid it and 3.7 MB were added to the precompiled cache, which is huge for the amount of code that it adds.

Had a quick look and Claude seems to be correct on that the increase in memory comes from additional precompiled functions. Uncommenting that line causes 144 additional functions to get precompiled.

You can validate it yourself if you want: Export of additional precompiled functions

I am using an instrumentation layer we are developing for Julia to get this data, called CodeGlass. Too much to go into details here (I will probably make a topic in Tooling about it soon), but it can collect all kinds of stuff, including which functions are precompiled in packages. So I made one for GMT with and without Float32 and made an export of the difference.

These should be all the functions, but keep in mind it is still in development :slight_smile:
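A rough way to approximate such a count without CodeGlass (this is my own sketch of a methodology, not how CodeGlass works; `Base.specializations` needs Julia ≥ 1.10):

```julia
# Count how many specializations exist for methods owned by `mod`.
function count_specializations(mod::Module)
    n = 0
    for name in names(mod; all=true)
        isdefined(mod, name) || continue
        f = getfield(mod, name)
        f isa Function || continue
        for m in methods(f)
            m.module === mod || continue   # only methods defined in `mod`
            n += count(_ -> true, Base.specializations(m))
        end
    end
    return n
end
```

Running this over GMT and its dependencies before and after the pca workload would give a delta comparable to the 144 figure above.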


So, to be clear: The alleged design fault in julia is the following:


Package A exports function foo, but does not precompile method instance foo(x::T), in this example cov(::Matrix{Float32}).

Package B depends on A, and causes precompilation of foo(x::T). Package C depends on A and not on B, and also causes precompilation of foo(x::T).

Then foo gets precompiled twice, and stored twice, in the precompile cache for B and C.

This is bad design, and can cause a (polynomial, not exponential, right?) overhead.

Good design would store the precompilation of foo in the highest (closest to Core) package possible, i.e. in the unique package where a method specialization foo(x::T) would not be type piracy (i.e. the package in the dependency graph where foo / T are defined), together with a reference when it is needed (i.e. to packages B/C, as a “foreign backedge”).

Then, we can reuse this precompilation (e.g. cov(::Matrix{Float32})) in every package that needs it; and not load it / garbage collect it, once it is not needed.

So, in the good design, when precompiling B we must modify the precompile cache for A.


Have I correctly read between the lines what Claude is alleging here? (Claude’s comment that “A should precompile f(::T)”, taken literally, is nonsense.)

Is this really the state of affairs? I have not looked into precompilation implementation details, but I would have assumed that we follow the “good design” outlined above, and not the “bad design” alleged by Claude.


Sorry, I don’t get this one. If the method is already compiled in A why must we modify it?

The method and its argument types are defined in A, but the specific methodinstance (method + argument-type combo) is not precompiled in A.

But B and C independently require that thing (eg cov(::Matrix{Float32})).

Hence, the precompilation of that methodinstance cannot be triggered by precompilation of A (because A does not foresee that this combination is needed, and cannot see into the future). It must be triggered by B / C.

But the precompilation should ideally happen only once, by whatever B or C happens to be included first; hence it needs to be stored in a place where both B and C can find it, which would be A. (packages B and C know nothing about each other!)

I’d like to note that this is not entirely trivial to get right. Especially because dispatch happens on the level of type equality ==, but validity of precompiled code is on the level of ===. So the data layout of Vector{NTuple{10, Union{Missing, Int}}} depends on compilation order. (is this a represented as a tuple of unions or as a union of tuples?)

One needs to be very careful that precompile caches cannot be polluted by packages that are not loaded (typical profile pollution issues). I am not sure how this is currently handled.

Otherwise you get into shitty to debug territory (…oh, when I precompile package A before package B, then the performance of package C deteriorates, even if package A is not loaded)

The idea is that A’s cache didn’t precompile a call that belongs to it, so when B or C independently do, they should store the call in A’s cache instead of storing their own copies. What I’ve seen in the PrecompileTools docs implies otherwise, at “whether they belong to your package or not”. I don’t really know what “belong” means there; a call signature often mixes types from any number of modules and dependencies, and the method and callee methods could originate in various dependencies. For example, what does Base.sum(::A.Range{B.Num}) belong to, if it dispatches to sum(r::AbstractRange{<:Real}) defined in Base and calls Base.first(r::Range) defined in A and then Base.:+(x::Num, y::Num) defined in B?
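One concrete handle on “belong”: `which` gives the method actually dispatched to, and `parentmodule` its defining module — though, as this shows, that attributes the call to where the method is written, not to where the argument types come from:

```julia
# For the Base.sum example: the dispatched method lives in Base even
# when the element/container types would come from other packages.
m = which(sum, (UnitRange{Int},))
println(parentmodule(m))   # Base
println(m.sig)             # the method's declared signature
```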

I’m not seeing anything here that corroborates that, even though I think it happens generally. We would need to see duplicated native code in multiple package images, but we’re only talking about GMT’s so far. The suggestion is to associate the native code with the empty MethodInstances in the system image so GMT doesn’t have to.

The native code and its MethodInstance aren't one-to-one, so it's not surprising a MethodInstance could be empty. I'm pretty sure a CodeInstance is one-to-one, but it doesn't necessarily reference native code either.
julia> foo(a, b) = a+b
foo (generic function with 1 method)

julia> methods(foo)[1].specializations # no MethodInstance yet
svec()

julia> Base.return_types(foo, (Int, Int)) # type inference
1-element Vector{Any}:
 Int64

julia> methods(foo)[1].specializations # has a MethodInstance now
MethodInstance for foo(::Int64, ::Int64)

julia> methods(foo)[1].specializations.cache # has a CodeInstance too
CodeInstance for MethodInstance for foo(::Int64, ::Int64)

julia> methods(foo)[1].specializations.cache.rettype # has the return type
Int64

julia> methods(foo)[1].specializations.cache.specptr # but no native code
Ptr{Nothing}(0x0000000000000000)

julia> precompile(foo, (Int, Int))
true

julia> methods(foo)[1].specializations.cache # same CodeInstance
CodeInstance for MethodInstance for foo(::Int64, ::Int64)

julia> methods(foo)[1].specializations.cache.specptr # now there's native code
Ptr{Nothing}(0x0000020c4e1eff70)

I don’t think affecting a dependency’s cache is actually good, whether precompilation automatically tweaks caches or we manually add to precompile workloads. It would make sense if precompilation was cached per active environment, but precompilation is cached per package, up to some number of distinct instances. The same instance could be loaded by any number of compatible environments, so if one environment contains A, B or A, C and adds to the A cache, then another environment that contains only A has to load more code. Considering that A, D or A, E or A, F could each add distinct calls to A’s cache, A’s cache scales with the number of active environments, and that could only be mitigated if deleting active environments triggered fresh precompilation of A, which doesn’t sound good. A would be the system image in this particular case, and increasing that would be infeasible. The expense of PackageCompiler suggests it’s not a quick tweak, and my hunch is it wouldn’t be a quick tweak for package images either.

My prior on gen ai getting this sort of subtle internal behavior right is abysmally low, but this seems right. A better source is in PkgCacheInspector.jl:

Finding duplicated specializations

Two “downstream” packages can force identical specializations of the same “upstream” method. In such cases, there may be opportunities to reduce loading time by moving some of the precompilation upstream.


Would it be possible to share the full list of GMT compiled functions? (guess it’s a big one)

Yea sure, I have it in the office so i will send you it tomorrow :slight_smile:


Yea it was 8213 functions big… haha

This CSV contains all precompiled functions that are added by “using GMT”, so it includes dependencies GMT uses.
A markdown table is also below it, but Gist does not even want to show it…

You should also just download the file (or press raw) as Gist only loads rows up to 4108

All Functions of GMT and dependencies.csv

Thanks. It had to be big :slight_smile: On Windows the DLL has ~90 MB, though a good part of it is only 0’s.

Let me know if you need any more info :slight_smile:

I can for example also give you the profiler output of all those functions (call durations, allocations, etc.) if you give me some sample code to execute.

Thanks for the offer, but I already have a lot here to keep me busy. My main curiosity is to find multiply-compiled methods and the why. But then I see that Julia itself compiles hundreds of sorts, iterators, Printfs…


Would knowing which function calls each function help in figuring this out?

Well, certainly yes, but I’m afraid this would be really explosive in terms of function numbers. When I see the reports of JET, that explosion is a horror, so with your tool that cascade should be present as well. Note, I don’t want to sound ungrateful, but I fear this is a very deep hole I would have to dive into.
