Invalidation findings (from a GMT case)

That’s not unlikely.

The numbers for GMT are good … as long as I don’t load another package.
I found this very weird behavior: if I load another package, the TTSP (Time To Second Plot) rises again, and the increase depends on which package is loaded.

Version 1.9.0-beta2 (2022-12-29)
Official https://julialang.org/ release

julia> @time using GMT
  2.328202 seconds (4.07 M allocations: 244.371 MiB, 6.25% gc time)

julia> @time @eval GMT.plot(rand(5,2))
  0.687905 seconds (1.07 M allocations: 70.316 MiB, 2.26% gc time, 146.25% compilation time: 35% of which was recompilation)

julia> using FFTW

julia> @time @eval GMT.plot(rand(5,2))
  5.806493 seconds (15.11 M allocations: 976.311 MiB, 6.97% gc time, 165.59% compilation time: 60% of which was recompilation)

But the increase can be larger:

Version 1.9.0-beta2 (2022-12-29)
Official https://julialang.org/ release

julia> @time using GMT
  2.327048 seconds (4.07 M allocations: 244.368 MiB, 6.37% gc time)

julia> @time @eval GMT.plot(rand(5,2))
  0.694118 seconds (1.07 M allocations: 70.315 MiB, 2.23% gc time, 145.97% compilation time: 34% of which was recompilation)

julia> using RDatasets

julia> @time @eval plot(rand(5,2))
  8.269462 seconds (21.68 M allocations: 1.372 GiB, 7.55% gc time, 167.85% compilation time: 60% of which was recompilation)

Regarding the TTFP on Windows, I opened an issue: Possible failure of restoring compiled code from pkg image on Windows. · Issue #48096 · JuliaLang/julia · GitHub.


Well, I would not say that the behaviour is weird; I would say it is to be expected… Loading any package can invalidate methods. To some degree this cannot be avoided, but often it can be; that, however, needs a special investigation for each package or each combination of packages.

@tim.holy What is the suggested approach to mitigate this issue?

I checked the invalidations with SnoopCompile and the report still showed zero invalidations.

In the case where you’re investigating package interactions, just put both packages inside the @snoopr block. The @snoopi_deep (the workload you want to be fast) does not have to change at all.

Example: since CSV and DataFrames get used together a lot, you can do

using SnoopCompileCore
invs = @snoopr using CSV, DataFrames;
tinf = @snoopi_deep CSV.File("somefile.csv");
using SnoopCompile
trees = invalidation_trees(invs);
staletrees = precompile_blockers(trees, tinf)

There’s nothing about this workload that uses DataFrames, but merely loading it can invalidate some compiled code in CSV (Invalidations when loading DataFrames · Issue #1061 · JuliaData/CSV.jl · GitHub).

Explanation:

  • invalidation happens when you define methods. That includes loading packages. So put everything that might define methods inside the @snoopr block
  • inference and compilation happen when you run code that isn’t fully compiled. That includes cases where it was compiled but has since been invalidated. @snoopi_deep spies on inference, which is the first step of compilation, so you can record what got compiled.
  • precompile_blockers combines these two streams of information: it looks for things that were invalidated (they are in trees) but then later compiled (in tinf). This is what lets you focus on just the invalidations you need to fix: because the inference occurred inside the @snoopi_deep block, you know you needed that compiled code in order to run the workload.

Hopefully with that conceptual understanding, it will be clearer how to adjust your workflow to whatever situation arises. But I’m happy to continue answering questions, keep 'em coming!


Could someone on Windows try enable more precompilation on Windows by KristofferC · Pull Request #4617 · JuliaPlots/Plots.jl · GitHub for TTFP with Plots? Just so we can confirm that it is nipped in the bud.


The strategies for fixing cross-package invalidations are similar to any other situation. Your options are:

  • improve the inferrability of your code. If inference succeeds, then there isn’t uncertainty (barring type piracy) about which method should be called. You’re generally vulnerable to invalidation only if your code doesn’t fully infer.
  • take on the package that triggers the invalidations as a dependency. If your package “knows about” that package when it’s precompiling, it can’t later invalidate.
  • protect certain calls with a Base.invokelatest. This forces runtime dispatch, and it can break long chains of calls to limit the amount of code that gets invalidated. For example, if MyPkg.f calls MyPkg.g, and g is poorly-inferred (and kinda hard to fix), then instead of MyPkg.f(args) = MyPkg.g(args) you could do MyPkg.f(args) = Base.invokelatest(MyPkg.g, args). That would at least prevent any invalidations that happen in g from cascading up to invalidate f.
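To make the third option concrete, here is a minimal sketch of the invokelatest "firewall" pattern. All names here (SETTINGS, g_helper, f_protected) are invented for illustration; this is not code from GMT or Plots.

```julia
# Hypothetical sketch of an invokelatest "firewall".
const SETTINGS = Dict{Symbol,Any}(:width => 3)   # Any-typed values defeat inference

g_helper(d::Dict{Symbol,Any}) = d[:width] + 1    # result inferred as Any

# Without a barrier, f's compiled code carries inference edges into g_helper's
# poorly-inferred internals, so a new `+` method defined elsewhere can
# invalidate f as well.
f_unprotected(d) = g_helper(d) * 2

# With the barrier, the call to g_helper is resolved at runtime, so
# invalidations inside g_helper no longer cascade up into f_protected.
f_protected(d) = Base.invokelatest(g_helper, d) * 2

f_unprotected(SETTINGS)   # 8
f_protected(SETTINGS)     # 8
```

The trade-off is a runtime dispatch on every call to the protected function, so this is best reserved for calls that are hard to make inferrable.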

That gives me:

julia> @time @eval using Plots; x = 0:0.1:10; y = sin.(x); @time @eval display(Plots.plot(x, y));
  3.174683 seconds (6.24 M allocations: 388.300 MiB, 5.23% gc time, 5.03% compilation time)
  0.327393 seconds (240.27 k allocations: 15.547 MiB, 103.61% compilation time: 4% of which was recompilation)

On the same machine as above, so this seems to solve the problem. I’m surprised that this ended up being a package-level (rather than language-level) fix. Is this why in the PR you say “workaround”, i.e. do you think there is a wider issue in how the new caching mechanism works on Windows compared to other platforms?


It’s kind of a workaround because the issue `julia 1.8.2` breaks `Plots` precompilation (vs `1.8.1`) on `windows` · Issue #46989 · JuliaLang/julia · GitHub is still valid and this doesn’t fix whatever caused the issue there. It just makes it not error anymore so that the precompilation can continue.

So I don’t think there is any issue in how caching works on Windows; it’s just that for that particular case there is something strange in how Windows, Julia, and GR interact.


Thanks for one more clarification. The problem is that I think I understand the concepts of what is happening, but then the practice keeps pulling my hair out. Take for example the interference with FFTW. I did

invs = @snoopr using GMT, FFTW
tinf = @snoopi_deep plot(rand(5,2));

and got a report with several invalidations. The first few:

julia> include("c:/v/test_gmt1.jl")
3-element Vector{SnoopCompile.StaleTree}:
 inserting eltype(::Type{ChainRulesCore.ZeroTangent}) @ ChainRulesCore C:\Users\joaqu\.julia\packages\ChainRulesCore\C73ay\src\tangent_types\abstract_zero.jl:55 invalidated:
   backedges: 1: MethodInstance for eltype(::Type) at depth 0 with 20 children blocked 0.2886276 inclusive time for 2 nodes

 inserting *(::Any, ::ChainRulesCore.ZeroTangent) @ ChainRulesCore C:\Users\joaqu\.julia\packages\ChainRulesCore\C73ay\src\tangent_arithmetic.jl:105 invalidated:
   mt_backedges: 1: MethodInstance for Base.afoldl(::typeof(*), ::String, ::Any) at depth 0 with 3 children blocked InferenceTimingNode: 0.006344/0.012946 on GMT.add_opt_cpt(::Dict{Symbol, Any}, ::String, ::Matrix{Symbol}, ::Char, ::Int64, ::Matrix{Float64}, nothing::Nothing, ::Bool, ::Bool, ::String, ::Bool) with 14 direct children
                 2: MethodInstance for GMT.finish_PS_module(::Dict{Symbol, Any}, ::Vector{String}, ::String, ::Bool, ::Bool, ::Bool, ::Matrix{Float64}, ::Vararg{Any}) at depth 0 with 0 children blocked InferenceTimingNode: 0.008153/0.015754 on GMT.finish_PS_module(::Dict{Symbol, Any}, ::Vector{String}, ::String, ::Bool, ::Bool, ::Bool, ::Matrix{Float64}, ::Vararg{Any}) with 18 direct children
                 3: MethodInstance for GMT.fname_out(::Dict{Symbol, Any}, ::Bool) at depth 0 with 1 children blocked InferenceTimingNode: 0.008153/0.015754 on GMT.finish_PS_module(::Dict{Symbol, Any}, ::Vector{String}, ::String, ::Bool, ::Bool, ::Bool, ::Matrix{Float64}, ::Vararg{Any}) with 18 direct children
                 4: MethodInstance for GMT.add_opt(::NamedTuple, ::NamedTuple{(:symb, :size, :unit), Tuple{String, String, String}}, ::Nothing) at depth 0 with 3 children blocked InferenceTimingNode: 0.023198/0.068377 on GMT.parse_opt_S(::Dict{Symbol, Any}, ::Matrix{Float64}, ::Bool) with 30 direct children
                 5: MethodInstance for GMT.add_opt(::NamedTuple, ::NamedTuple{(:name, :size, :unit), Tuple{String, String, String}}, ::Nothing) at depth 0 with 3 children blocked InferenceTimingNode: 0.023198/0.068377 on GMT.parse_opt_S(::Dict{Symbol, Any}, ::Matrix{Float64}, ::Bool) with 30 direct children
                 6: MethodInstance for GMT.helper_arrows(::Dict, ::Bool) at depth 0 with 6 children blocked InferenceTimingNode: 0.006474/0.205018 on GMT.helper_multi_cols(::Dict{Symbol, Any}, ::Matrix{Float64}, ::Bool, ::String, ::String, ::String, ::String, ::Bool, ::Vector{Bool}, ::Vector{String}, ::String, ::Vector{String}, ::Bool, ::Bool) with 24 direct children

where all but the first are internal functions of the GMT package, called only by it. How come they can be invalidated by another package that knows nothing about them? Take for example the last one above, helper_arrows. It’s a simple internal function. Why is it invalidated?

Take for example the last one above, the helper_arrows. It’s a simple internal function. Why is it invalidated?

Great question! If you look at the source of helper_arrows, you can see it calls *. For one or more of the calls x * y (the same as *(x, y)), the type of x and/or y must not be inferrable. Thus when you precompile GMT, helper_arrows gets compiled with a list of the * methods that might potentially apply, but loading ChainRulesCore added a new * method that also might apply. So Julia threw away all the old compiled code so that it could be recompiled to take this new possibility into account.
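The mechanism can be reproduced in a few lines at the REPL. This is a toy sketch, not GMT’s actual code; `box` and `combine` are invented names, and `MyTangent` stands in for ChainRulesCore.ZeroTangent:

```julia
# Toy reproduction of the invalidation mechanism described above.
box(x) = Any[x][1]                    # defeats inference: result is Any

combine(a) = "prefix-" * box(a)       # *(::String, ::Any): not fully inferred

combine("v")                          # compiles `combine` against the current * methods

# Defining a new * method that could match (String, Any), just as loading
# ChainRulesCore adds *(::Any, ::ZeroTangent), invalidates `combine`'s
# compiled code; it is transparently recompiled on the next call.
struct MyTangent end
Base.:*(s::AbstractString, ::MyTangent) = s

combine("v")                          # still "prefix-v", but recompiled
```

Running this under `@snoopr` would show `combine` in the invalidation trees, exactly like the GMT entries above.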

You can learn more like this:

julia> using GMT     # thanks for making GMT easy to build!!!!

julia> code_warntype(GMT.helper_arrows, (Dict, Bool))

and look for red. Unfortunately it looks like the file & line numbers don’t work for code_warntype, but you can use code_typed(GMT.helper_arrows, (Dict, Bool); debuginfo=:source, optimize=false) and track down those statements that correspond to the lines with red.

In your case, it looks like your call to string(::Real) is inferred as ::Any; if you put ::String after that call, you might fix it. I.e., "$val"::String for your specific code.
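A minimal illustration of that fix, with hypothetical values rather than GMT’s code: when a lookup result is inferred as Any, a type assertion restores a concrete type for everything downstream.

```julia
# Hypothetical sketch of the ::String assertion fix.
d = Dict{Symbol,Any}(:size => 2.5)

val = d[:size]              # inferred as Any (Dict{Symbol,Any} lookup)

s_loose = "$val"            # may itself infer as Any if string(::Any) doesn't infer well
s_fixed = "$(val)"::String  # runtime check, plus a concrete String for downstream code

s_fixed                     # "2.5"
```

The assertion costs one cheap runtime check and stops the Any from contaminating every later call.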

FWIW, this type of analysis is better done with ascend, and highly recommended if you have a lot of invalidations to fix. But there’s a bit of a learning curve, so using code_typed is a good alternative.


This is good, as I had thought of everything you mentioned except one thing. What code_warntype shows I knew already, because that’s something I cannot avoid: those are parsing functions that in the end produce a string with commands to pass to the GMT C lib. But to take advantage of Julia’s versatility (read: multiple dispatch), the solution I found was to pass the kwargs in a Dict{Symbol, Any}, and every time I fish in that Dict I get an Any, so my next care is to avoid as much as I can propagating those Anys (which are diabolically contaminating beasts). And the helper_arrows function actually does a good job at that, except in this line (the functions called in it are declared to return Strings):

cmd = code * "$val"

that I thought was type stable too (damn it, I’m telling val it’s a String), but it wasn’t (as you pointed out). Adding the ::String annotation solved this case, but it looks like I have lots of others like this to chase.

Thanks again and hope this is helpful for others in this type of hunt.


I’m afraid I’ll have to accept the offer. As a result of the step trees = invalidation_trees(invs), I’m left with a bunch of these:

 inserting *(::Any, ::ChainRulesCore.ZeroTangent) @ ChainRulesCore C:\Users\joaqu\.julia\packages\ChainRulesCore\C73ay\src\tangent_arithmetic.jl:105 invalidated:
   mt_backedges:  1: signature Tuple{typeof(*), String, Any} triggered MethodInstance for *(::String, ::Any, ::String, ::Any) (0 children)
                  2: signature Tuple{typeof(*), String, Any} triggered MethodInstance for *(::String, ::String, ::Any, ::String, ::Any) (0 children)
                  3: signature Tuple{typeof(*), String, Any} triggered MethodInstance for *(::String, ::Any, ::String, ::Any, ::String) (1 children)
                  4: signature Tuple{typeof(*), Regex, Any} triggered MethodInstance for Base.afoldl(::typeof(*), ::Regex, ::Any, ::String) (1 children)
                  5: signature Tuple{typeof(*), Missing, Any} triggered MethodInstance for Base.afoldl(::typeof(*), ::Missing, ::String, ::Any) (1 children)
                  6: signature Tuple{typeof(*), Missing, Any} triggered MethodInstance for Base.afoldl(::typeof(*), ::Missing, ::Any, ::String) (1 children)
                  7: signature Tuple{typeof(*), Regex, Any} triggered MethodInstance for Base.afoldl(::typeof(*), ::Regex, ::String, ::Any) (1 children)
                  8: signature Tuple{typeof(*), String, Any} triggered MethodInstance for Base.afoldl(::typeof(*), ::String, ::Any, ::String) (1 children)
                  9: signature Tuple{typeof(*), Missing, Any} triggered MethodInstance for Base.afoldl(::typeof(*), ::Missing, ::Any) (3 children)
                 10: signature Tuple{typeof(*), String, Any} triggered MethodInstance for Base.afoldl(::typeof(*), ::String, ::String, ::Char, ::String, ::String, ::Any) (4 children)

How do I track down those anonymous entries, like

signature Tuple{typeof(*), String, Any} triggered MethodInstance for *(::String, ::Any, ::String, ::Any)

and what is Base.afoldl(...)?

I don’t have an exact reproducer of what you’re doing, but in this case you want ascend. See Snooping on and fixing invalidations: @snoopr · SnoopCompile, which includes a link to a YouTube video you may find helpful.

This is an example:

julia> tree = staletrees[1]
inserting *(::Any, ::ChainRulesCore.ZeroTangent) @ ChainRulesCore ~/.julia/packages/ChainRulesCore/C73ay/src/tangent_arithmetic.jl:105 invalidated:
   mt_backedges: 1: MethodInstance for Base.afoldl(::typeof(*), ::String, ::Any) at depth 0 with 3 children blocked InferenceTimingNode: 0.003445/0.006942 on GMT.add_opt_cpt(::Dict{Symbol, Any}, ::String, ::Matrix{Symbol}, ::Char, ::Int64, ::Matrix{Float64}, nothing::Nothing, ::Bool, ::Bool, ::String, ::Bool) with 14 direct children
                 2: MethodInstance for GMT.finish_PS_module(::Dict{Symbol, Any}, ::Vector{String}, ::String, ::Bool, ::Bool, ::Bool, ::Matrix{Float64}, ::Vararg{Any}) at depth 0 with 0 children blocked InferenceTimingNode: 0.005608/0.010562 on GMT.finish_PS_module(::Dict{Symbol, Any}, ::Vector{String}, ::String, ::Bool, ::Bool, ::Bool, ::Matrix{Float64}, ::Vararg{Any}) with 18 direct children
                 3: MethodInstance for GMT.fname_out(::Dict{Symbol, Any}, ::Bool) at depth 0 with 1 children blocked InferenceTimingNode: 0.005608/0.010562 on GMT.finish_PS_module(::Dict{Symbol, Any}, ::Vector{String}, ::String, ::Bool, ::Bool, ::Bool, ::Matrix{Float64}, ::Vararg{Any}) with 18 direct children


julia> sig, roots = tree.mt_backedges[1];

julia> ascend(roots)
Choose a call for analysis (q to quit):
 >   afoldl(::typeof(*), ::String, ::Any)
       *(::String, ::String, ::String, ::Any)
         (::GMT.var"#equalize#115")(::Dict{Symbol, Any}, ::Matrix{Float64}, ::Any, ::String)
           add_opt_cpt(::Dict{Symbol, Any}, ::String, ::Matrix{Symbol}, ::Char, ::Int64, ::Matrix{Float64}, ::Nothing, :

The problem is that one of your arguments to equalize is inferred as ::Any.

That is interesting (and troubling). So the inference is that the 3rd argument of the equalize (nested) function is a ::Any

(::GMT.var"#equalize#115")(::Dict{Symbol, Any}, ::Matrix{Float64}, ::Any, ::String)

but the function’s type signature explicitly sets that it should be a ::String:

function equalize(d, arg1, cptname::String, opt_T::String)::GMTcpt

In one of the calls to this function the third arg is indeed an Any, but I was convinced that the type annotations in the function signature would always take care of the conversion (or error). This example shows that this is not true. A natural follow-up question is: what are type annotations in function arguments good for? Just to raise errors if another concrete type is passed in?
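For readers wondering the same thing, a quick sketch (hypothetical function names, not GMT code) of what the different annotations actually do: argument annotations select methods at dispatch time and raise a MethodError on mismatch; unlike typed locals or return-type annotations, they never call convert.

```julia
# Argument annotations restrict dispatch; they do not convert.
take_string(s::String) = length(s)

take_string("abc")        # 3: the method matches
# take_string(0x61)       # MethodError: no conversion is attempted

# By contrast, a typed local and a return-type annotation *do* convert:
function as_int(x)::Int
    y::Int = x            # convert(Int, x), or throw InexactError
    return y              # the ::Int on the signature also converts the result
end

as_int(3.0)               # 3
```

So a caller that passes a value inferred as ::Any still reaches the method through runtime dispatch; the annotation guards correctness but does not repair inference at the call site.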

It’s not annotated that way for me:

        function equalize(d, arg1, cptname, opt_T)::GMTcpt
                if ((isa(arg1, GMTgrid) || isa(arg1, String)) && (val_eq = find_in_dict(d, [:equalize])[1]) !== nothing)
                        n::Int = convert(Int, val_eq)           # If val is other than Bool or number
                        if (isa(arg1, String))
                                (n > 1) ? gmt("grd2cpt -E$n+c -C" * cptname * " " * arg1) : gmt("grd2cpt -C" * cptname * " " * arg1)
                        else
                                (n > 1) ? gmt("grd2cpt -E$n+c -C" * cptname, arg1) : gmt("grd2cpt -C" * cptname, arg1)
                        end
                else
                        gmt("makecpt " * opt_T * " -C" * cptname)
                end
        end

I’m looking at GMT v0.44.2. You’ll want to do all this analysis yourself on your development branch. Once you get the hang of it, this level is pretty easy; it’s only when you need to look at the type-inferred code that it gets complicated.

Ah, I’m working on #master, where I fixed more than a hundred invalidations, but I tried a patch to ensure the 3rd arg is always a ::String at the time of the function call, and the result was the same with or without that patch.
Will dive into Cthulhu, but that takes a lot more time than I have right now.

You may not need to use Cthulhu, just call ascend as I showed above. You’ll get a lot of insight just that way.


Cool, easy to use. I thought ascend was Cthulhu stuff, and I had tried it once but it didn’t want to work (pressing Enter did not enter into anything).

Note to other hunters: this is the easy part, up until the prey is sighted. Fixing is another matter.
