Is Claude right on this (a cache-size growth analysis)?

I have been asking Claude to investigate why precompilation caches grow so much when not that much code is being precompiled. I'm showing here the results of a smaller example (but I have another one where the "accusations" are heavier, as it tries to explain a growth of 29 MB).

My question to the compiler experts is basically: is this true?

tl;dr

Done. Everything is explained. Here's the summary:

  Root cause: "Stub" MethodInstances in Julia's sysimage

  Julia's sysimage (1.10) contains registered MethodInstances for eigen(Matrix{Float64}), cov(Matrix{Float64}), *(Matrix{Float64}, Matrix{Float64}), etc. — but
  they are empty stubs with no compiled code.

  This means that:
  1. Julia "knows" these specializations exist (type-dispatch resolves)
  2. But it has neither native code nor serialized IR for them
  3. When any package calls eigen during precompilation, Julia must compile the entire chain (eigen → eigen! → LAPACK.syevr! → ccall wrappers → ...) from scratch
  4. The compiled code is then stored in the cache of the calling package (GMT), not in the sysimage

  This is why pca(rand(Float32, 24, 4)) adds ~3.8 MB — the entire cov + eigen + linear algebra chain is recompiled and serialized inside GMT's pkgimage.

  This is a Julia design problem, not GMT's. The stdlibs (LinearAlgebra, Statistics) should precompile these fundamental operations via PrecompileTools in their
  own cache/sysimage, but they don't. Every downstream package pays the full cost.

The longer analysis


## PCA Cache Bloat Investigation (2026-02-16)

### Problem

`pca(rand(Float32, 24, 4))` in the precompile block adds ~3.8 MB to the cache.
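For context, the precompile block in question has this general shape (a hedged sketch using PrecompileTools, as it would sit inside the GMT module; the real workload contains many more calls):

```julia
# Sketch only: the actual GMT workload is much larger; `pca` is GMT's own function.
using PrecompileTools

@setup_workload begin
    @compile_workload begin
        pca(rand(Float32, 24, 4))   # the call that adds ~3.8 MB to the cache
    end
end
```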

### Measurements

| Precompile config | Cache Size | Delta from baseline |
|---|---|---|
| No pca (baseline) | 94.99 MB | — |
| `pca(rand(Float64, 24, 4))` | 97.92 MB | +2.93 MB |
| `pca(rand(Float32, 24, 4))` | 98.78 MB | +3.79 MB |
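For reference, sizes like the ones above can be measured by summing the files in the package's compiled-cache directory (a minimal sketch assuming the default depot layout on Julia 1.10; the exact file names contain version slugs):

```julia
# Hedged sketch: sums every file (.ji plus the pkgimage .so/.dylib/.dll)
# in GMT's compiled-cache directory under the first depot.
cachedir = joinpath(DEPOT_PATH[1], "compiled", "v1.10", "GMT")
total_mb = sum(filesize, readdir(cachedir; join=true)) / 1024^2
println("GMT cache: $(round(total_mb; digits=2)) MB")
```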

### Root cause: "Stub" MethodInstances in Julia's sysimage

The Julia sysimage (1.10) contains **MethodInstances registered** for `eigen(Matrix{Float64})`,
`cov(Matrix{Float64})`, `*(Matrix{Float64}, Matrix{Float64})`, etc. — but they are **empty
stubs with NO compiled code**:

```
MethodInstance found: Tuple{typeof(eigen), Matrix{Float64}}
Has inferred code (cache): false

cov: Tuple{typeof(cov), Matrix{Float64}} → has code: false
*:   Tuple{typeof(*), Matrix{Float64}, Matrix{Float64}} → has code: false
```
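Output like the above can be produced with a small check over the method tables (a sketch that pokes at compiler internals, `Base.specializations` and the MethodInstance `cache` field, which are not a stable API):

```julia
# Hedged sketch: relies on Julia internals; written against Julia 1.10.
using LinearAlgebra, Statistics

function has_compiled_code(f, argtypes)
    m   = which(f, argtypes)                  # method chosen by dispatch
    sig = Tuple{typeof(f), argtypes...}
    for mi in Base.specializations(m)         # registered MethodInstances
        mi.specTypes == sig || continue
        return isdefined(mi, :cache)          # is any CodeInstance attached?
    end
    return false                              # no specialization registered
end

has_compiled_code(eigen, (Matrix{Float64},))                  # expected: false
has_compiled_code(cov,   (Matrix{Float64},))                  # expected: false
has_compiled_code(*,     (Matrix{Float64}, Matrix{Float64}))  # expected: false
```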

The sysimage "knows" these specializations exist (type-dispatch resolves), but has **no native
code nor serialized IR** for them. The LAPACK internal functions (`eigen!`, `LAPACK.geev!`,
`LAPACK.syevr!`) have **zero** precompiled specializations.
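The zero-specializations claim for the LAPACK wrappers can be spot-checked the same way, by counting registered MethodInstances (again internals, sketch only):

```julia
using LinearAlgebra

# Count MethodInstances registered for each `syevr!` method (internal API):
for m in methods(LinearAlgebra.LAPACK.syevr!)
    n = count(Returns(true), Base.specializations(m))
    println(m.sig, " → ", n, " specializations")
end
```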

When GMT calls `pca` → `princomp!` → `cov(X)` + `eigen(cov_X)` during precompilation, Julia
must compile the **entire call chain** from scratch:

```
pca → princomp! → cov(Matrix{Float32}) → eigen(Matrix{Float32}) → eigen!(Matrix{Float32})
    → LAPACK.syevr!(Float32, …) → ccall wrappers → …
plus sortperm(Vector{Float32}), Matrix{Float32} * Matrix{Float32}, sum/mean/…
```

The compiled code is then stored in **GMT's pkgimage cache**, not in the sysimage.

Timing confirms this, with 99.98% of the execution time spent in compilation:

```
Float32 chain: 3.73s, 5.34M allocs, 357 MiB, 99.98% compilation
Float64 chain: 2.67s, 4.06M allocs, 271 MiB, 99.98% compilation
```
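Timings of this kind can be reproduced in a fresh session with `@time`, which has reported the compilation share since Julia 1.8 (sketch; exact numbers depend on the machine and on what is already compiled):

```julia
# Run in a fresh `julia` session so nothing along the chain is compiled yet.
using LinearAlgebra, Statistics

X = rand(Float32, 24, 4)
@time eigen(cov(X));   # first call: expect nearly 100% compilation time
@time eigen(cov(X));   # second call: microseconds, no compilation
```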


### Why Float64 is slightly cheaper than Float32

Using Float64 saves ~0.86 MB vs Float32 because other GMT precompile paths (grdimage, plot,
etc.) already trigger some Float64 linear algebra specializations. Float32 creates a completely
separate specialization chain (`Eigen{Float32,...}`, `cov → Matrix{Float32}`, etc.).

### This is a Julia design limitation

The stdlib packages (LinearAlgebra, Statistics) should precompile these fundamental operations
via PrecompileTools in their own cache/sysimage, but they don't. Every downstream package that
calls `eigen`, `cov`, or matrix operations pays the full compilation cost and stores the
entire LAPACK chain in its own pkgimage.
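As an illustration only, a workload of the following shape is what the stdlibs could ship; this is hypothetical code, not something LinearAlgebra or Statistics actually contains (and stdlib precompilation has its own constraints):

```julia
# Hypothetical sketch of a stdlib-side PrecompileTools workload.
using PrecompileTools, LinearAlgebra, Statistics

@compile_workload begin
    for T in (Float32, Float64)
        X = rand(T, 4, 4)
        cov(X)                  # Statistics
        eigen(X' * X)           # hermitian input, LAPACK.syevr! path
        eigen(X)                # general input, LAPACK.geev! path
        X * X                   # BLAS gemm wrapper
        sortperm(rand(T, 4))
    end
end
```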

### Resolution

`pca` has been commented out of the precompile block. It's a niche function: users who need PCA
pay ~4 seconds of compilation cost on the first call, and GMT's cache shrinks by ~3.8 MB.