Package load time regressions in v1.8-beta3

I’ve also experienced quite a bit of compilation slowdown with v1.8. A package I was working on improving compile times for went from 10 minutes to run the complete test suite to over 12 minutes, in both cases precompilation was done before timing. It’s possible that more can be done to cache precompiled code in julia v1.8 though, remains to be explored :slight_smile:

1 Like

It would help to distinguish a few timing benchmark types:

  • pre-compilation time (what happens usually only when you run an update of a package)
  • loading time, e.g. @time @eval using MyLibrary
  • time to first X, a.k.a. TTFX, e.g. @time @eval (using MyLibrary; MyLibrary.typical_task())

Loading time might indeed be a bit higher in 1.8, because more is cached and loaded. But TTFX, which includes loading time, is frequently lower. Here is what I see:

1.7.1
julia> @time @eval using QuantumClifford
  3.330884 seconds (5.18 M allocations: 348.800 MiB, 1.93% gc time, 16.20% compilation time)

julia> @time @eval QuantumClifford._my_precompile_()
 13.032064 seconds (29.81 M allocations: 1.427 GiB, 6.91% gc time, 99.90% compilation time)


1.8.0-beta3
julia> @time @eval using QuantumClifford
  3.704282 seconds (5.52 M allocations: 385.180 MiB, 1.96% gc time, 13.82% compilation time)

julia> @time @eval QuantumClifford._my_precompile_()
 11.074718 seconds (22.36 M allocations: 1.027 GiB, 6.75% gc time, 99.86% compilation time)


1.9.0-DEV.245 (2022-03-24)
julia> @time @eval using QuantumClifford
  3.897533 seconds (5.16 M allocations: 376.538 MiB, 15.43% compilation time)

julia> @time @eval QuantumClifford._my_precompile_()
  9.360486 seconds (5.99 M allocations: 369.788 MiB, 3.05% gc time, 99.83% compilation time)

This is using a non-public version of QuantumClifford. For GMT I do see a slowdown:

1.7.1
julia> @time @eval (using GMT)
  1.285542 seconds (1.27 M allocations: 87.708 MiB, 3.00% gc time, 31.84% compilation time)

julia> @time @eval plot(rand(2,2))
  9.245465 seconds (6.37 M allocations: 330.418 MiB, 0.96% gc time, 99.96% compilation time)


1.8.0-beta3
julia> @time @eval (using GMT)
  1.253315 seconds (911.61 k allocations: 74.206 MiB, 1.50% gc time, 31.57% compilation time)

julia> @time @eval plot(rand(2,2))
 12.065121 seconds (13.40 M allocations: 676.069 MiB, 3.04% gc time, 99.97% compilation time)


1.9.0-DEV.245 (2022-03-24)
julia> @time @eval (using GMT)
  1.252750 seconds (911.31 k allocations: 74.429 MiB, 1.51% gc time, 32.20% compilation time)

julia> @time @eval plot(rand(2,2))
 13.164307 seconds (9.40 M allocations: 605.126 MiB, 1.87% gc time, 99.97% compilation time)

_my_precompile_() simulates typical simple tasks in the library (all of them are very fast, if they are already loaded/compiled). It is a convenient way to test TTFX and if you run it during precompilation it provides for trivial way to cache important compiled methods (hence the name I have given to it).

4 Likes

I would actually expect that PR to make package loading slower, since it saves more things in precompile files and therefore has to load more. But it should cut down compilation time, giving a net improvement. However in this case there could be further compilation slowdowns preventing us from seeing a net benefit, which we should look into.

4 Likes

This one seems to be simply due to Plots.jl not having precompiles for 1.8: https://github.com/JuliaPlots/Plots.jl/pull/4179

1 Like

Nope, the plot from my post is the GMT.jl plot, not the one from Plots.jl

Oops, my mistake, I should have noticed that. Time to get profiling…

2 Likes

I would actually expect that PR to make package loading slower, since it saves more things in precompile files and therefore has to load more.

That is true, but profiling shows that the slowest part of package loading is not the I/O but method and backedge insertion. Backedges can be pre-calculated for internal callgraph edges but have to be computed de novo when linking to external MethodInstances. So in fact there are cases where saving more code actually results in faster loading because more things end up being “internal.” But this absolutely should be a case of YMMV across packages etc.

2 Likes

KiteModels is using InlineStrings which is invalidating Base.require. Which means that loading InlineStrings makes it necessary to recompile the package-loading machinery before the next dependency of KiteModels can be loaded. Ugh.

This has been fixed, presumably, on nightly (https://github.com/JuliaLang/julia/pull/44500).

Update: I can confirm that a 3wk-old master branch restores the load time performance to 1.7 levels. I can’t test loop because I don’t know how you execute it.

4 Likes

Likewise, GMT.plot is getting invalidated (so the precompilation isn’t working):

julia> print_tree(root, 20)
MethodInstance for SnoopCompile.dummy() at depth 0 with 17 children
└─ MethodInstance for copyto!(::BitMatrix, ::Broadcasted{Nothing, Tuple{OneTo{Int64}, OneTo{Int64}}, typeof(==), Tuple{RefValue{Symbol}, Matrix{Symbol}}}) at depth 1 with 16 children
   └─ MethodInstance for copyto!(::BitMatrix, ::Broadcasted{DefaultArrayStyle{2}, Tuple{OneTo{Int64}, OneTo{Int64}}, typeof(==), Tuple{RefValue{Symbol}, Matrix{Symbol}}}) at depth 2 with 15 children
      └─ MethodInstance for copy(::Broadcasted{DefaultArrayStyle{2}, Tuple{OneTo{Int64}, OneTo{Int64}}, typeof(==), Tuple{RefValue{Symbol}, Matrix{Symbol}}}) at depth 3 with 14 children
         └─ MethodInstance for Base.Broadcast.materialize(::Broadcasted{DefaultArrayStyle{2}, Nothing, typeof(==), Tuple{RefValue{Symbol}, Matrix{Symbol}}}) at depth 4 with 13 children
            └─ MethodInstance for GMT.build_opt_R(::Any, ::Symbol) at depth 5 with 12 children
               └─ MethodInstance for GMT.parse_R(::Dict{Symbol, Any}, ::String, ::Bool, ::Bool, ::Bool) at depth 6 with 11 children
                  └─ MethodInstance for GMT.parse_R(::Dict{Symbol, Any}, ::String, ::Bool) at depth 7 with 10 children
                     └─ MethodInstance for GMT.var"#logo#150"(::Bool, ::Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:Vd,), Tuple{Int64}}}, ::typeof(logo), ::String) at depth 8 with 9 children
                        └─ MethodInstance for (::var"#logo##kw")(::NamedTuple{(:first, :Vd), Tuple{Bool, Int64}}, ::typeof(logo), ::String) at depth 9 with 8 children
                           └─ MethodInstance for GMT.var"#logo!#151"(::Bool, ::Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:Vd,), Tuple{Int64}}}, ::typeof(logo!), ::String) at depth 10 with 7 children
                              └─ MethodInstance for (::var"#logo!##kw")(::NamedTuple{(:Vd,), Tuple{Int64}}, ::typeof(logo!), ::String) at depth 11 with 6 children
                                 └─ MethodInstance for (::var"#logo!##kw")(::NamedTuple{(:Vd,), Tuple{Int64}}, ::typeof(logo!)) at depth 12 with 5 children
                                    └─ MethodInstance for GMT.add_opt_module(::Dict{Symbol, Any}) at depth 13 with 4 children
                                       └─ MethodInstance for GMT.finish_PS_nested(::Dict{Symbol, Any}, ::Vector{String}) at depth 14 with 3 children
                                          └─ MethodInstance for GMT.common_plot_xyz(::String, ::Matrix{Float64}, ::String, ::Bool, ::Bool) at depth 15 with 2 children
                                             └─ MethodInstance for GMT.var"#plot#257"(::Bool, ::Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, ::typeof(plot), ::Matrix{Float64}) at depth 16 with 1 children
                                                └─ MethodInstance for GMT.plot(::Matrix{Float64}) at depth 17 with 0 children

This is a weird one because it looks like it’s a consequence of Julia’s own bootstrap (or maybe a bug in SnoopCompile?):

julia> m = methods(GMT.plot).ms[9]
plot(arg1; first, kw...) in GMT at /home/tim/.julia/dev/GMT/src/plot.jl:112

julia> tree = findcaller(m, trees)
inserting copyto!(dest::BitArray, bc::Base.Broadcast.Broadcasted{Nothing}) in Base.Broadcast at broadcast.jl:968 invalidated:
   mt_backedges: 1: signature copyto!(dest::BitArray, bc::Base.Broadcast.Broadcasted{Nothing}) in Base.Broadcast at broadcast.jl:968 (formerly copyto!(dest::AbstractArray, bc::Base.Broadcast.Broadcasted{Nothing}) in Base.Broadcast at broadcast.jl:948) triggered MethodInstance for SnoopCompile.dummy() (17 children)

I.e., Julia is invalidating one of its own methods, due to order of usage and definition. That’s not GMT’s “fault,” of course (invalidations are only rarely the fault of the package affected by them). You can work around it, though, just get rid of broadcasting in build_opt_R(Val, symb::Symbol=Symbol())::String, for example by changing

any(symb .== [:region_llur :limits_llur :limits_diag :region_diag])

to something like

symb ∈ (:region_llur, :limits_llur, :limits_diag, :region_diag)

which is better code anyway.

2 Likes

Thanks, I made the change (actually two of them) you suggested and it cut down ~1 sec but it’s still ~2 sec slower than 1.7

julia> tic(); plot(rand(5,2)); toc()
elapsed time: 8.6452718 seconds
8.6452718

Maybe there are more of them.

There are, but instead of encouraging you to stop using broadcasting I dug into the problem more deeply. There’s a fix up at https://github.com/JuliaLang/julia/pull/45001.

But there is still something you might consider fixing: https://github.com/GenericMappingTools/GMT.jl/blob/df9c6f872ead5655165ca3dadd80ae1116177a02/src/gmt_main.jl#L1503 is piracy (it’s not very nice to change Julia’s printing for all users who might want to combine GMT and something else where they expect printing of Unsigned to remain unchanged). It invalidates ~120 compiled MethodInstances. It does not affect plot, however.

You also have a few more invalidations which appear to be internal (you’re invalidating your own code) but I’m not making sense of it right now. In case it makes sense to you, here’s the output:

julia> trees = invalidation_trees(invs)
┌ Warning: Could not attribute the following delayed invalidations:
└ @ SnoopCompile ~/.julia/dev/SnoopCompile/src/invalidations.jl:463
Any[Tuple{Union{typeof(GMT.add_opt_pen), Char}, Dict{Symbol, Any}, Vector{Symbol}}, Tuple{Union{typeof(GMT.font), Char}, Dict{Symbol, Any}, Vector{Symbol}}] invalidated Core.MethodInstance[MethodInstance for GMT.add_opt(::Dict{Symbol, Any}, ::String, ::String, ::Vector{Symbol}, ::NamedTuple{(:x, :y, :xy, :X, :Y, :asym, :colored, :cline, :csymbol, :wiskers, :cap, :pen), Tuple{String, String, String, String, String, String, String, String, String, String, String, Tuple{String, typeof(GMT.add_opt_pen)}}}, ::Bool, ::Any), MethodInstance for GMT.add_opt(::Dict{Symbol, Any}, ::String, ::String, ::Matrix{Symbol}, ::NamedTuple{(:angle, :Angle, :font, :justify, :region_justify, :header, :label, :rec_number, :text, :zvalues), Tuple{String, String, Tuple{String, typeof(GMT.font)}, Vararg{String, 7}}}, ::Bool, ::Bool)]
SnoopCompile.MethodInvalidations[]

(It’s empty because I also commented-out that pirating method.)

7 Likes

A post was merged into an existing topic: Printing of unsigned numbers

6 posts were split to a new topic: Printing of unsigned numbers

One option in general for this kind of thing is to put the piracy in your startup.jl. I don’t think it’s ever really the best solution but it’s always an option and is always better than putting it in an unrelated package.

5 Likes

Sorry to get back to this but I saw a recent reference that a 1.8RC was on the works but this PR is not merged yet.
Tried with 1.8DEV but the situation is the same (a substantial time regression).

With 1.7 I get

trees = invalidation_trees(invalidations)
insert_backedges for MethodInstance for GMT.read_data(::Dict{Symbol, Any}, ::String, ::String, ::GMTdataset{Float64, 2}, ::String, ::Bool, ::Bool)
insert_backedges for MethodInstance for GMT.get_GMTver()
SnoopCompile.MethodInvalidations[]

and with 1.9dev

trees = invalidation_trees(invalidations)
┌ Warning: Could not attribute the following delayed invalidations:
└ @ SnoopCompile C:\Users\joaqu\.julia\packages\SnoopCompile\KH4S4\src\invalidations.jl:450
Any[Tuple{Union{typeof(GMT.add_opt_pen), Char}, Dict{Symbol, Any}, Vector{Symbol}}, Tuple{Union{typeof(GMT.font), Char}, Dict{Symbol, Any}, Vector{Symbol}}, MethodInstance for show(::IOBuffer, ::UInt32), MethodInstance for show(::IOContext{IOBuffer}, ::UInt16)] invalidated Core.MethodInstance[MethodInstance for GMT.add_opt(::Dict{Symbol, Any}, ::String, ::String, ::Vector{Symbol}, ::NamedTuple{(:x, :y, :xy, :X, :Y, :asym, :colored, :cline, :csymbol, :wiskers, :cap, :pen), Tuple{String, String, String, String, String, String, String, String, String, String, String, Tuple{String, typeof(GMT.add_opt_pen)}}}, ::Bool, ::Any), MethodInstance for GMT.add_opt(::Dict{Symbol, Any}, ::String, ::String, ::Matrix{Symbol}, ::NamedTuple{(:angle, :Angle, :font, :justify, :region_justify, :header, :label, :rec_number, :text, :zvalues), Tuple{String, String, Tuple{String, typeof(GMT.font)}, Vararg{String, 7}}}, ::Bool, ::Bool), MethodInstance for Base._replace(::IOBuffer, ::Regex, ::String, ::UnitRange{Int64}, ::String), MethodInstance for Pkg.API.build(::Vector{Pkg.Types.PackageSpec})]
3-element Vector{SnoopCompile.MethodInvalidations}:
 inserting copyto!(dest::BitArray, bc::Base.Broadcast.Broadcasted{Nothing}) in Base.Broadcast at broadcast.jl:968 invalidated:
   mt_backedges: 1: signature copyto!(dest::BitArray, bc::Base.Broadcast.Broadcasted{Nothing}) in Base.Broadcast at broadcast.jl:968 (formerly copyto!(dest::AbstractArray, bc::Base.Broadcast.Broadcasted{Nothing}) in Base.Broadcast at broadcast.jl:948) triggered MethodInstance for copyto!(::BitVector, ::Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, typeof(==), Tuple{Vector{Float64}, Int64}}) (28 children)

 inserting copyto!(dest::BitArray, bc::Base.Broadcast.Broadcasted{Nothing}) in Base.Broadcast at broadcast.jl:968 invalidated:
   mt_backedges: 1: signature copyto!(dest::BitArray, bc::Base.Broadcast.Broadcasted{Nothing}) in Base.Broadcast at broadcast.jl:968 (formerly copyto!(dest::AbstractArray, bc::Base.Broadcast.Broadcasted{Nothing}) in Base.Broadcast at broadcast.jl:948) triggered MethodInstance for copyto!(::BitVector, ::Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, typeof(startswith), Tuple{Vector{String}, Base.RefValue{String}}}) (50 children)

 inserting copyto!(dest::BitArray, bc::Base.Broadcast.Broadcasted{Nothing}) in Base.Broadcast at broadcast.jl:968 invalidated:
   mt_backedges: 1: signature copyto!(dest::BitArray, bc::Base.Broadcast.Broadcasted{Nothing}) in Base.Broadcast at broadcast.jl:968 (formerly copyto!(dest::AbstractArray, bc::Base.Broadcast.Broadcasted{Nothing}) in Base.Broadcast at broadcast.jl:948) triggered MethodInstance for copyto!(::BitVector, ::Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, typeof(!), Tuple{Vector{Bool}}}) (275 children)

The GMT specific invalidations are mysterious to me. I traced them to the use of certain members of a NamedTuple that will invoke a function call but they are used in many occasions but only accused in two situations. But, even more, why are those invalidations only present in Julia >= 1.8?

I don’t see what I can do for remedy this.

Unfortunately the time regressions reported in April, pinged again about a month ago ago, were not fixed.

TTFP in 1.7

julia> tic(); plot(rand(5,2)); toc()
elapsed time: 7.5960746 seconds
7.5960746

TTFP in 1.8.0

julia> tic(); plot(rand(5,2)); toc()
elapsed time: 10.172608 seconds
10.172608

I think precompilation is broken on Plots for 1.8 (not from a julia issue). This would lead to slower TTFP. See TTFP regression since 1.7 · Issue #44293 · JuliaLang/julia · GitHub, [FR] Enable precompilation for nightly · Issue #4079 · JuliaPlots/Plots.jl · GitHub, add version check for 1.8 precompiles by JeffBezanson · Pull Request #4179 · JuliaPlots/Plots.jl · GitHub

Edit: Actually the above is wrong. Precompilation is active for Julia 1.8 for Plots.jl and TTFP (for Plots.jl) is improved with about 15%.

It’s not Plots, it’s GMT.jl as reported in the original threads.

Ah, ok.

Relevant PR that has even a backport 1.8 label.