Is Claude right on this (an analysis of precompile cache-size growth)?

I have been asking Claude to investigate why precompile cache sizes grow so much for relatively little precompiled code. I'm showing here the results of a smaller example (but I have another where the "accusations" are heavier, when trying to explain a growth of 29 MB).

My question to compiler experts is basically, is this true?

tl;dr

Done. Everything is explained. Here's the summary:

  Root cause: "Stub" MethodInstances in Julia's sysimage

  Julia's sysimage (1.10) contains registered MethodInstances for eigen(Matrix{Float64}), cov(Matrix{Float64}), *(Matrix{Float64}, Matrix{Float64}), etc. — but
  they are empty stubs with no compiled code.

  This means that:
  1. Julia "knows" these specializations exist (type-dispatch resolves)
  2. But it has no native code nor serialized IR for them
  3. When any package calls eigen during precompilation, Julia must compile the entire chain (eigen → eigen! → LAPACK.syevr! → ccall wrappers → ...) from scratch
  4. The compiled code is then stored in the cache of the calling package (GMT), not in the sysimage

  This is why pca(rand(Float32, 24, 4)) adds ~3.8 MB — the entire cov + eigen + linear algebra chain is recompiled and serialized inside GMT's pkgimage.

  This is a Julia design problem, not GMT's. The stdlibs (LinearAlgebra, Statistics) should precompile these fundamental operations via PrecompileTools in their
  own cache/sysimage, but they don't. Every downstream package pays the full cost.

The longer analysis


## PCA Cache Bloat Investigation (2026-02-16)

### Problem

`pca(rand(Float32, 24, 4))` in the precompile block adds ~3.8 MB to the cache.

### Measurements

| Precompile config | Cache Size | Delta from baseline |
|---|---|---|
| No pca (baseline) | 94.99 MB | — |
| `pca(rand(Float64, 24, 4))` | 97.92 MB | +2.93 MB |
| `pca(rand(Float32, 24, 4))` | 98.78 MB | +3.79 MB |
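
The sizes in the table can be measured directly from the depot. A minimal sketch, assuming the default `~/.julia/compiled` layout and that GMT's pkgimage lives under the `v1.10` directory (adjust both for your setup):

```shell
# Sum the on-disk size of GMT's precompile cache files (*.ji plus the
# native pkgimage shared libraries) and report it in MB.
du -sk ~/.julia/compiled/v1.10/GMT* 2>/dev/null |
  awk '{s += $1} END {printf "%.2f MB\n", s / 1024}'
```

Running this once with the `pca` line commented out and once with it enabled gives the delta column above.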

### Root cause: "Stub" MethodInstances in Julia's sysimage

The Julia sysimage (1.10) contains **MethodInstances registered** for `eigen(Matrix{Float64})`,
`cov(Matrix{Float64})`, `*(Matrix{Float64}, Matrix{Float64})`, etc. — but they are **empty
stubs with NO compiled code**:

```
MethodInstance found: Tuple{typeof(eigen), Matrix{Float64}}
Has inferred code (cache): false

cov: Tuple{typeof(cov), Matrix{Float64}} → has code: false
*:   Tuple{typeof(*), Matrix{Float64}, Matrix{Float64}} → has code: false
```

The sysimage "knows" these specializations exist (type-dispatch resolves), but has **no native
code nor serialized IR** for them. The LAPACK internal functions (`eigen!`, `LAPACK.geev!`,
`LAPACK.syevr!`) have **zero** precompiled specializations.
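
This can be probed in a fresh Julia 1.10 session. A small sketch, assuming `Base.specializations` (available since 1.10) and treating a `MethodInstance` whose `cache` field is unset as a stub; the exact output depends on your Julia build:

```julia
using LinearAlgebra, Statistics

# Report whether the sysimage carries compiled code for a given
# specialization, or only a registered (stub) MethodInstance.
function probe(f, argtypes)
    m = which(f, argtypes)
    for mi in Base.specializations(m)
        mi.specTypes == Tuple{typeof(f), argtypes...} || continue
        # An attached `cache` (a CodeInstance) means inferred/native
        # code was stored; an undefined one means an empty stub.
        println(mi.specTypes, " → has code: ", isdefined(mi, :cache))
    end
end

probe(eigen, (Matrix{Float64},))
probe(cov,   (Matrix{Float64},))
probe(*,     (Matrix{Float64}, Matrix{Float64}))
```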

When GMT calls `pca` → `princomp!` → `cov(X)` + `eigen(cov_X)` during precompilation, Julia
must compile the **entire call chain** from scratch:

```
pca → princomp! → cov(Matrix{Float32}) → eigen(Matrix{Float32}) → eigen!(Matrix{Float32})
    → LAPACK.syevr!(Float32, …) → ccall wrappers → …
```

plus `sortperm(Vector{Float32})`, `Matrix{Float32} * Matrix{Float32}`, and `sum`/`mean`/….

The compiled code is then stored in **GMT's pkgimage cache**, not in the sysimage.

Timing confirms this — 99.98% of execution time is compilation:

```
Float32 chain: 3.73s, 5.34M allocs, 357 MiB, 99.98% compilation
Float64 chain: 2.67s, 4.06M allocs, 271 MiB, 99.98% compilation
```
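
The compilation-dominated timing is easy to reproduce in a fresh session; on Julia ≥ 1.8, `@time` reports the compilation percentage directly. A sketch (exact numbers will differ per machine):

```julia
using LinearAlgebra, Statistics

X = rand(Float32, 24, 4)
@time eigen(cov(X))   # first call: dominated by compiling the whole chain
@time eigen(cov(X))   # second call: the code is already compiled
```

The gap between the two calls is the cost that gets serialized into the calling package's pkgimage when the same chain runs inside a precompile block.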


### Why Float64 is slightly cheaper than Float32

Using Float64 saves ~0.86 MB vs Float32 because other GMT precompile paths (grdimage, plot,
etc.) already trigger some Float64 linear algebra specializations. Float32 creates a completely
separate specialization chain (`Eigen{Float32,...}`, `cov → Matrix{Float32}`, etc.).

### This is a Julia design limitation

The stdlib packages (LinearAlgebra, Statistics) should precompile these fundamental operations
via PrecompileTools in their own cache/sysimage, but they don't. Every downstream package that
calls `eigen`, `cov`, or matrix operations pays the full compilation cost and stores the
entire LAPACK chain in its own pkgimage.
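
A sketch of what such a workload could look like, using PrecompileTools' `@setup_workload`/`@compile_workload` macros. The module name and the chosen workload are illustrative, not anything a stdlib actually ships:

```julia
module LinAlgWarmup  # hypothetical package module

using PrecompileTools
using LinearAlgebra, Statistics

@setup_workload begin
    # Setup code runs at precompile time but is not itself cached.
    X32 = rand(Float32, 24, 4)
    X64 = rand(Float64, 24, 4)
    @compile_workload begin
        # Everything compiled here is serialized into this package's
        # own pkgimage, so downstream callers would not pay for it again.
        eigen(cov(X32))
        eigen(cov(X64))
        X32' * X32
    end
end

end # module
```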

### Resolution

`pca` is commented out from the precompile block. It's a niche function — users who need PCA
pay ~4 seconds of compilation cost on first call, but GMT's cache shrinks by ~3.8 MB.

Could you share some sample code?
I think I have an easy way to validate this :slight_smile:

Hi, thanks for looking at this.

I don't remember exactly how I was driving Claude at the time to investigate this, but a way to see the difference is to uncomment this line of the GMT.jl package. I just redid it and it added 3.7 MB to the precompiled cache, which is huge for the amount of code it adds.