Mystery of invalidations in V>1.10. Even in precompiled code

I have reported before the very large degradation in TTFP in GMT for version 1.11 and later. One example as reminder

  | | |_| | | | (_| |  |  Version 1.10.10 (2025-06-27)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/ 

julia> @time @eval plot(rand(5,2))
  2.415636 seconds (8.13 M allocations: 550.066 MiB, 5.43% gc time, 99.86% compilation time: 24% of which was recompilation)


  | | |_| | | | (_| |  |  Version 1.13.0-DEV.1065 (2025-08-29)
 _/ |\__'_|_|_|\__'_|  |  Commit 3396562934 (4 days old master)
|__/                   |

julia> @time @eval plot(rand(5,2))
  6.430101 seconds (24.71 M allocations: 1.307 GiB, 3.36% gc time, 99.95% compilation time: 99% of which was recompilation)

I also want to emphasize that this example is part (in fact the last one) of the workflow of PrecompileTools. So that example should be precompiled and execute ~immediately

I have spent countless hours trying to figure out how, or at least minimize, these regressions but essentially the recoveries were minuscule. But now the @trace_compile macro is reporting more convincing numbers. And what do we see?

How come that so many Julia functions are brought to compilation when running this example?
Note that the first one is:

#=    6.6 ms =# precompile(Tuple{typeof(Base.rand), Int64, Int64})

So the rand function with two integers in not compiled in the Julia image?

And near the end

#= 6127.9 ms =# precompile(Tuple{typeof(GMT.plot), Array{Float64, 2}}) # recompile

The plot is (re)compiled again? But why? Again, PrecompileTools compiled that exact same function call so it should live in precompiled cache.

I also looked with SnoopCompile and found only 6 invalidations in the entire GMT code, and of those 5 happen in SparseArrays.

Don’t know what to do to solve these regressions and time goes by and situation does not improve (recover).

julia> @trace_compile @eval plot(rand(5,2))
#=    6.6 ms =# precompile(Tuple{typeof(Base.rand), Int64, Int64})
#=    2.9 ms =# precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:limit, :keepempty), Tuple{Int64, Bool}}, typeof(Base.eachsplit), String, Char})
#=    2.5 ms =# precompile(Tuple{Base.var"##eachsplit#425", Int64, Bool, typeof(Base.eachsplit), String, Char})
#=    4.8 ms =# precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:limit, :keepempty), Tuple{Int64, Bool}}, typeof(Base.eachsplit), String, Function})
#=    4.1 ms =# precompile(Tuple{Base.var"##eachsplit#423", Int64, Bool, typeof(Base.eachsplit), String, Function})
#=    2.5 ms =# precompile(Tuple{Type{Base.SplitIterator{S, F} where F where S<:AbstractString}, String, Base.Fix{2, typeof(Base.isequal), Char}, Int64, Bool})
#=    4.6 ms =# precompile(Tuple{typeof(Base.getproperty), Base.SplitIterator{String, Base.Fix{2, typeof(Base.isequal), Char}}, Symbol})
#=    2.9 ms =# precompile(Tuple{typeof(Base.Iterators.product), Base.UnitRange{Int64}, Vararg{Any}})
#=    2.7 ms =# precompile(Tuple{typeof(Base.to_shape), Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}})
#=    4.8 ms =# precompile(Tuple{typeof(Base._array_for), Type{Symbol}, Base.HasShape{1}, Tuple{Base.OneTo{Int64}}})
#=   15.1 ms =# precompile(Tuple{typeof(Base.collect_to_with_first!), Array{Symbol, 1}, Symbol, Base.Generator{Base.UnitRange{Int64}, Base.var"#_ntuple##0#_ntuple##1"{Base.Compiler.var"#tuple_tfunc##0#tuple_tfunc##1"{Array{Any, 1}}}}, Int64})
#=   73.7 ms =# precompile(Tuple{Base.Compiler.var"#inferiterate_2arg#abstract_iteration##1"{Base.Compiler.InferenceLattice{Base.Compiler.ConditionalsLattice{Base.Compiler.PartialsLattice{Base.Compiler.ConstsLattice}}}, Array{Any, 1}, Array{Base.Compiler.CallMeta, 1}, Type{Base.Pairs{Symbol, Any, Nothing, NamedTuple{(:F, :D, :par), Tuple{String, String, Tuple{Symbol, String}}}}}, Base.Compiler.Future{Base.Compiler.AbstractIterationResult}}, Base.Compiler.NativeInterpreter, Base.Compiler.InferenceState})
#=    2.9 ms =# precompile(Tuple{typeof(Base.Broadcast._getindex), Tuple{Int64}, Int64})
#=   77.8 ms =# precompile(Tuple{Base.Compiler.var"#inferiterate_2arg#abstract_iteration##1"{Base.Compiler.InferenceLattice{Base.Compiler.ConditionalsLattice{Base.Compiler.PartialsLattice{Base.Compiler.ConstsLattice}}}, Array{Any, 1}, Array{Base.Compiler.CallMeta, 1}, Type{String}, Base.Compiler.Future{Base.Compiler.AbstractIterationResult}}, Base.Compiler.NativeInterpreter, Base.Compiler.InferenceState})
#=   11.9 ms =# precompile(Tuple{typeof(Base.Broadcast.broadcastable), Int64})
#=    3.0 ms =# precompile(Tuple{typeof(Base.convert), Type{UInt32}, Int32})
#=    6.2 ms =# precompile(Tuple{Type{Int64}, Char})
#=    2.8 ms =# precompile(Tuple{typeof(Base.split_sign), Int64})
#=    3.0 ms =# precompile(Tuple{typeof(Base.indexed_iterate), Tuple{UInt64, Bool}, Int64})
#=    3.1 ms =# precompile(Tuple{typeof(Base.indexed_iterate), Tuple{UInt64, Bool}, Int64, Int64})
#=    1.9 ms =# precompile(Tuple{typeof(Base.abs), UInt64})
#=    2.2 ms =# precompile(Tuple{typeof(Base.unsigned), UInt64})
#=    2.2 ms =# precompile(Tuple{typeof(Base.top_set_bit), UInt64})
#=   74.4 ms =# precompile(Tuple{Base.Compiler.var"#inferiterate_2arg#abstract_iteration##1"{Base.Compiler.InferenceLattice{Base.Compiler.ConditionalsLattice{Base.Compiler.PartialsLattice{Base.Compiler.ConstsLattice}}}, Array{Any, 1}, Array{Base.Compiler.CallMeta, 1}, Type{Base.Pairs{Symbol, V, Nothing, NT} where NT<:(NamedTuple{names, T} where T<:Tuple where names) where V}, Base.Compiler.Future{Base.Compiler.AbstractIterationResult}}, Base.Compiler.NativeInterpreter, Base.Compiler.InferenceState})
#=    3.0 ms =# precompile(Tuple{Type{UInt32}, Int32})
#=    3.0 ms =# precompile(Tuple{typeof(Base.to_indices), Array{String, 1}, Tuple{Base.UnitRange{Int64}}})
#=    2.5 ms =# precompile(Tuple{typeof(Base.map), Base.var"#view##0#view##1"{Array{String, 1}}, Tuple{Base.UnitRange{Int64}}})
#=    2.1 ms =# precompile(Tuple{typeof(Base._maybe_reshape_parent), Array{String, 1}, Tuple{Bool}})
#=    2.0 ms =# precompile(Tuple{typeof(Base.reshape), Array{String, 1}, Base.Val{1}})
#=    3.2 ms =# precompile(Tuple{typeof(Base.unsafe_view), Array{String, 1}, Base.UnitRange{Int64}})
#=    3.3 ms =# precompile(Tuple{Type{Base.SubArray{T, N, P, I, L} where L where I where P where N where T}, Array{String, 1}, Tuple{Base.UnitRange{Int64}}})
#=    2.4 ms =# precompile(Tuple{typeof(Base.ensure_indexable), Tuple{Base.UnitRange{Int64}}})
#=    3.1 ms =# precompile(Tuple{Type{Base.SubArray{T, N, P, I, L} where L where I where P where N where T}, Base.IndexLinear, Array{String, 1}, Tuple{Base.UnitRange{Int64}}, Tuple{Bool}})
#=    2.0 ms =# precompile(Tuple{typeof(Base.compute_offset1), Array{String, 1}, Int64, Tuple{Base.UnitRange{Int64}}})
#=    2.8 ms =# precompile(Tuple{Type{Base.SubArray{String, 1, Array{String, 1}, Tuple{Base.UnitRange{Int64}}, true}}, Array{String, 1}, Tuple{Base.UnitRange{Int64}}, Int64, Int64})
#=    2.9 ms =# precompile(Tuple{typeof(Base.axes), Base.SubArray{String, 1, Array{String, 1}, Tuple{Base.UnitRange{Int64}}, true}})
#=    3.2 ms =# precompile(Tuple{typeof(Base.checkbounds), Base.SubArray{String, 1, Array{String, 1}, Tuple{Base.UnitRange{Int64}}, true}, Int64})
#=    4.4 ms =# precompile(Tuple{typeof(Base.getproperty), Base.SubArray{String, 1, Array{String, 1}, Tuple{Base.UnitRange{Int64}}, true}, Symbol})
#=    2.8 ms =# precompile(Tuple{typeof(Base._reindexlinear), Base.SubArray{String, 1, Array{String, 1}, Tuple{Base.UnitRange{Int64}}, true}, Int64})
#=   75.3 ms =# precompile(Tuple{Base.Compiler.var"#inferiterate_2arg#abstract_iteration##1"{Base.Compiler.InferenceLattice{Base.Compiler.ConditionalsLattice{Base.Compiler.PartialsLattice{Base.Compiler.ConstsLattice}}}, Array{Any, 1}, Array{Base.Compiler.CallMeta, 1}, Type{Base.Pairs{Symbol, String, Nothing, NamedTuple{(:T, :f), Tuple{String, String}}}}, Base.Compiler.Future{Base.Compiler.AbstractIterationResult}}, Base.Compiler.NativeInterpreter, Base.Compiler.InferenceState})
#=    9.6 ms =# precompile(Tuple{typeof(Base.Compiler.EscapeAnalysis.escape_builtin!), typeof(Core.finalizer), Base.Compiler.EscapeAnalysis.AnalysisState{Base.Compiler.GetNativeEscapeCache{Base.Compiler.WorldView{Base.Compiler.InternalCodeCache}}, Base.Compiler.PartialsLattice{Base.Compiler.ConstsLattice}}, Int64, Array{Any, 1}})
#=    2.7 ms =# precompile(Tuple{typeof(Base.:(>)), Int64})
#=    4.7 ms =# precompile(Tuple{typeof(Base.all), Function, Tuple{Int64, Int64}})
#=    3.5 ms =# precompile(Tuple{typeof(Base._all), Base.Fix{2, typeof(Base.:(>)), Int64}, Tuple{Int64, Int64}, Base.Colon})
#=    4.9 ms =# precompile(Tuple{typeof(Base.findfirst), Function, Tuple{Int64, Int64}})
#=    2.2 ms =# precompile(Tuple{Base.Fix{2, typeof(Base.:(>)), Int64}, Int64})
#=    3.2 ms =# precompile(Tuple{typeof(Base.Broadcast.broadcasted), typeof(Base.:(+)), Int64, Base.UnitRange{Int64}})
#=    1.9 ms =# precompile(Tuple{typeof(Base.Broadcast.materialize), Base.UnitRange{Int64}})
#=    3.2 ms =# precompile(Tuple{typeof(Base.:(var"==")), Function, Function})
#=   68.3 ms =# precompile(Tuple{Base.Compiler.var"#inferiterate_2arg#abstract_iteration##1"{Base.Compiler.InferenceLattice{Base.Compiler.ConditionalsLattice{Base.Compiler.PartialsLattice{Base.Compiler.ConstsLattice}}}, Array{Any, 1}, Array{Base.Compiler.CallMeta, 1}, Type{Base.Pairs{Symbol, String, Nothing, NamedTuple{(:J,), Tuple{String}}}}, Base.Compiler.Future{Base.Compiler.AbstractIterationResult}}, Base.Compiler.NativeInterpreter, Base.Compiler.InferenceState})
#=   78.4 ms =# precompile(Tuple{Base.Compiler.var"#inferiterate_2arg#abstract_iteration##1"{Base.Compiler.InferenceLattice{Base.Compiler.ConditionalsLattice{Base.Compiler.PartialsLattice{Base.Compiler.ConstsLattice}}}, Array{Any, 1}, Array{Base.Compiler.CallMeta, 1}, Type{Base.Pairs{Symbol, T, Nothing, A} where A<:(NamedTuple{(:R, :J), var"#s185"} where var"#s185"<:Tuple{Union{Array{Any, 1}, Array{Float64, 1}}, String}) where T}, Base.Compiler.Future{Base.Compiler.AbstractIterationResult}}, Base.Compiler.NativeInterpreter, Base.Compiler.InferenceState})
#=    5.0 ms =# precompile(Tuple{typeof(Base._array_for), Type{Tuple{String, typeof(GMT.arg2str), Int64}}, Base.HasShape{1}, Tuple{Base.OneTo{Int64}}}) # recompile
#=   16.9 ms =# precompile(Tuple{typeof(Base.collect_to_with_first!), Array{Tuple{String, typeof(GMT.arg2str), Int64}, 1}, Tuple{String, typeof(GMT.arg2str), Int64}, Base.Generator{Base.UnitRange{Int64}, Base.var"#_ntuple##0#_ntuple##1"{Base.Compiler.var"#tuple_tfunc##0#tuple_tfunc##1"{Array{Any, 1}}}}, Int64})
#=   36.2 ms =# precompile(Tuple{typeof(Base.setindex_widen_up_to), Array{Tuple{String, typeof(GMT.arg2str), Int64}, 1}, Tuple{String, typeof(GMT.arg2str)}, Int64})
#=   10.6 ms =# precompile(Tuple{typeof(Base.collect_to!), Array{Tuple{String, typeof(GMT.arg2str), Vararg{Int64}}, 1}, Base.Generator{Base.UnitRange{Int64}, Base.var"#_ntuple##0#_ntuple##1"{Base.Compiler.var"#tuple_tfunc##0#tuple_tfunc##1"{Array{Any, 1}}}}, Int64, Int64})
#=   21.9 ms =# precompile(Tuple{typeof(Base.setindex_widen_up_to), Array{Tuple{String, typeof(GMT.arg2str), Vararg{Int64}}, 1}, String, Int64})
#=    3.1 ms =# precompile(Tuple{typeof(Base.structdiff), NamedTuple{(:W, :A, :Vd), Tuple{Float64, String, Int64}}, Type{NamedTuple{(:clip,), T} where T<:Tuple}})
#=    2.8 ms =# precompile(Tuple{typeof(Base._new_NamedTuple), Type{NamedTuple{(:W, :A, :Vd), Tuple{Float64, String, Int64}}}, Tuple{Float64, String, Int64}})
#=    2.6 ms =# precompile(Tuple{typeof(Base.pairs), NamedTuple{(:W, :A, :Vd), Tuple{Float64, String, Int64}}})
#=    2.8 ms =# precompile(Tuple{Type{Base.Pairs{Symbol, V, I, A} where A where I where V}, NamedTuple{(:W, :A, :Vd), Tuple{Float64, String, Int64}}, Nothing})
#=   74.8 ms =# precompile(Tuple{Base.Compiler.var"#inferiterate_2arg#abstract_iteration##1"{Base.Compiler.InferenceLattice{Base.Compiler.ConditionalsLattice{Base.Compiler.PartialsLattice{Base.Compiler.ConstsLattice}}}, Array{Any, 1}, Array{Base.Compiler.CallMeta, 1}, Type{Base.Pairs{Symbol, Any, Nothing, NamedTuple{(:W, :A, :Vd), Tuple{Float64, String, Int64}}}}, Base.Compiler.Future{Base.Compiler.AbstractIterationResult}}, Base.Compiler.NativeInterpreter, Base.Compiler.InferenceState})
#=   74.7 ms =# precompile(Tuple{Base.Compiler.var"#inferiterate_2arg#abstract_iteration##1"{Base.Compiler.InferenceLattice{Base.Compiler.ConditionalsLattice{Base.Compiler.PartialsLattice{Base.Compiler.ConstsLattice}}}, Array{Any, 1}, Array{Base.Compiler.CallMeta, 1}, Type{Base.Pairs{Symbol, String, Nothing, NamedTuple{(:f, :s), Tuple{String, String}}}}, Base.Compiler.Future{Base.Compiler.AbstractIterationResult}}, Base.Compiler.NativeInterpreter, Base.Compiler.InferenceState})
#=   73.1 ms =# precompile(Tuple{Base.Compiler.var"#inferiterate_2arg#abstract_iteration##1"{Base.Compiler.InferenceLattice{Base.Compiler.ConditionalsLattice{Base.Compiler.PartialsLattice{Base.Compiler.ConstsLattice}}}, Array{Any, 1}, Array{Base.Compiler.CallMeta, 1}, Type{Base.Pairs{Symbol, Any, Nothing, NamedTuple{(:pos, :B, :Vd), Tuple{NamedTuple{(:anchor,), Tuple{String}}, String, Int64}}}}, Base.Compiler.Future{Base.Compiler.AbstractIterationResult}}, Base.Compiler.NativeInterpreter, Base.Compiler.InferenceState})
#=   74.4 ms =# precompile(Tuple{Base.Compiler.var"#inferiterate_2arg#abstract_iteration##1"{Base.Compiler.InferenceLattice{Base.Compiler.ConditionalsLattice{Base.Compiler.PartialsLattice{Base.Compiler.ConstsLattice}}}, Array{Any, 1}, Array{Base.Compiler.CallMeta, 1}, Type{Base.Pairs{Symbol, Int64, Nothing, NamedTuple{(:Vd,), Tuple{Int64}}}}, Base.Compiler.Future{Base.Compiler.AbstractIterationResult}}, Base.Compiler.NativeInterpreter, Base.Compiler.InferenceState})
#=   71.6 ms =# precompile(Tuple{Base.Compiler.var"#inferiterate_2arg#abstract_iteration##1"{Base.Compiler.InferenceLattice{Base.Compiler.ConditionalsLattice{Base.Compiler.PartialsLattice{Base.Compiler.ConstsLattice}}}, Array{Any, 1}, Array{Base.Compiler.CallMeta, 1}, Type{Base.Pairs{Symbol, String, Nothing, NamedTuple{(:R,), Tuple{String}}}}, Base.Compiler.Future{Base.Compiler.AbstractIterationResult}}, Base.Compiler.NativeInterpreter, Base.Compiler.InferenceState})
#=    2.7 ms =# precompile(Tuple{typeof(Base.:(+)), Vararg{Int64, 4}})
#=   75.1 ms =# precompile(Tuple{Base.Compiler.var"#inferiterate_2arg#abstract_iteration##1"{Base.Compiler.InferenceLattice{Base.Compiler.ConditionalsLattice{Base.Compiler.PartialsLattice{Base.Compiler.ConstsLattice}}}, Array{Any, 1}, Array{Base.Compiler.CallMeta, 1}, Type{Base.Pairs{Symbol, Bool, Nothing, NamedTuple{(:gdataset,), Tuple{Bool}}}}, Base.Compiler.Future{Base.Compiler.AbstractIterationResult}}, Base.Compiler.NativeInterpreter, Base.Compiler.InferenceState})
#=   74.4 ms =# precompile(Tuple{Base.Compiler.var"#inferiterate_2arg#abstract_iteration##1"{Base.Compiler.InferenceLattice{Base.Compiler.ConditionalsLattice{Base.Compiler.PartialsLattice{Base.Compiler.ConstsLattice}}}, Array{Any, 1}, Array{Base.Compiler.CallMeta, 1}, Type{Base.Pairs{Symbol, NTuple{4, Float64}, Nothing, NamedTuple{(:R,), Tuple{NTuple{4, Float64}}}}}, Base.Compiler.Future{Base.Compiler.AbstractIterationResult}}, Base.Compiler.NativeInterpreter, Base.Compiler.InferenceState})
#=   75.9 ms =# precompile(Tuple{Base.Compiler.var"#inferiterate_2arg#abstract_iteration##1"{Base.Compiler.InferenceLattice{Base.Compiler.ConditionalsLattice{Base.Compiler.PartialsLattice{Base.Compiler.ConstsLattice}}}, Array{Any, 1}, Array{Base.Compiler.CallMeta, 1}, Type{Base.Pairs{Symbol, Any, Nothing, NamedTuple{(:J, :C, :F), Tuple{String, Bool, Bool}}}}, Base.Compiler.Future{Base.Compiler.AbstractIterationResult}}, Base.Compiler.NativeInterpreter, Base.Compiler.InferenceState})
#=   86.4 ms =# precompile(Tuple{Base.Compiler.var"#inferiterate_2arg#abstract_iteration##1"{Base.Compiler.InferenceLattice{Base.Compiler.ConditionalsLattice{Base.Compiler.PartialsLattice{Base.Compiler.ConstsLattice}}}, Array{Any, 1}, Array{Base.Compiler.CallMeta, 1}, Type{Base.Pairs{Symbol, Any, Nothing, NamedTuple{(:Q, :o), Tuple{Bool, String}}}}, Base.Compiler.Future{Base.Compiler.AbstractIterationResult}}, Base.Compiler.NativeInterpreter, Base.Compiler.InferenceState})
#= 6127.9 ms =# precompile(Tuple{typeof(GMT.plot), Array{Float64, 2}}) # recompile
#=    8.9 ms =# precompile(Tuple{typeof(Base.mapreducedim!), Function, Function, Array{Tuple{Float64, Float64}, 2}, Array{Float64, 2}})
#=   24.6 ms =# precompile(Tuple{typeof(Base.convert), Type{Array{Union{Int64, String}, 1}}, Array{Int64, 1}})
#=   51.1 ms =# precompile(Tuple{typeof(GMT.add_opt_cpt), Base.Dict{Symbol, Any}, String, Array{Symbol, 2}, Char, Int64, GMT.GMTdataset{Float64, 2}, Nothing, Bool, Bool, String, Bool}) # recompile
#=  522.6 ms =# precompile(Tuple{typeof(GMT.prep_and_call_finish_PS_module), Base.Dict{Symbol, Any}, Array{String, 1}, String, Bool, Bool, Bool, GMT.GMTdataset{Float64, 2}, Nothing, Nothing, Nothing}) # recompile
#=    8.2 ms =# precompile(Tuple{typeof(Base.map), Type{UInt8}, NTuple{32, Char}})
2 Likes

Doesn‘t PrecompileTools just generate these precompile statements and execute them? Do you see the same time if you manually put the precompile statements?

As of recently I’m again working on improving the resistance of the sysimage to invalidation. Which amounts to fixing guilty type instabilities in Base, JuliaSyntax.jl, or in the stdlibs that are included in the sysimage, such as Random.jl.

However I really hope we can get some testing against sysimage invalidation into Julia’s test suite, otherwise my efforts may be futile.

Issue:

NB: StyledStrings.jl still does some type piracy which is not guarded against with inference barriers, and both REPL.jl and Pkg.jl load StyledStrings.jl. So that’s another source of invalidation, in interactive use or when Pkg.jl is loaded.

Issue comment:

3 Likes

Yes, I’ve tried all of that. Manual precomp instructions, overlapping or nor the PrecompileToolls version. Partial precompile calls like GMT.jl/src/GMT.jl at master · GenericMappingTools/GMT.jl · GitHub, etc. Nill

1 Like

Is this a wrong link, I don’t see a @compile_workload block anywhere.

Yes.
It has a @setup_workload, which I think does the same thing. At least the result is the same.

The PrecompileTools.jl documentation disagrees.

1 Like

To save a few clicks, @setup_workload specifies the preparatory calls that aren’t necessarily put into the precompile cache, and @compile_workload specifies all its calls go into the precompile cache. If you intend calls to go into the precompile cache, you should make that change. That said, precompile calls or direct calls at top-level should also contribute to the precompile cache, albeit more limited by inferrability and module ownership than @compiled_workload. Manual precompiles failing is a problem that could affect @compile_workload.

1 Like

Some of it may be a straight up limitation (bug?) in Julia: `pkgimages` cannot include / pre-compile certain code (☔ umbrella issue) · Issue #58759 · JuliaLang/julia · GitHub

I was not clear in my answer but I also tried with @compile_workload and got the same results. And not forget also that we are talking about a striking difference between version 1.10 and later.