Making sense of the output of SnoopCompile

Dear All,

we have created a micro-service explaining our Mill.jl models, but unfortunately the service runs very slowly. Surprisingly slowly. One culprit I had in mind is compilation time, since the data samples accepted by a single model can be of different types: different parts of a sample can be missing, which makes each such sample a different type.

I have therefore run SnoopCompile as

tinf = @snoopi_deep simulate_server(...)

and then visualized the results with

accumulate_by_source(SnoopCompile.flatten(tinf))
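
(As a side note, I know the same data can also be rendered as a flame graph, which makes the inclusive versus exclusive time of each frame visible; a minimal sketch, assuming ProfileView is installed:)

```julia
using SnoopCompile, ProfileView

# Convert the inference-timing tree into a flame graph and display it;
# the width of each bar is the inclusive inference time of that frame.
fg = flamegraph(tinf)
ProfileView.view(fg)
```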

But I am puzzled by the results, since the most time-consuming entry is

 (1661.059199338, ROOT() in Core.Compiler.Timings at compiler/typeinfer.jl:73)

After that, most of the entries are relatively fast.

More of the output:
 (1.4325971610000003, merge(a::NamedTuple{an, T} where T<:Tuple, b::NamedTuple{bn, T} where T<:Tuple) where {an, bn} in Base at namedtuple.jl:242)
 (1.4500268580000004, removeone!(f, flatmask, ii) in ExplainMill at /home/tomas.pevny/.julia/packages/ExplainMill/OI20s/src/pruning/utils.jl:113)
 (1.4561869859999998, Base.Generator(f::F, iter::I) where {I, F} in Base at generator.jl:32)
 (1.503079966000002, Base.Generator{I, F}(f, iter) where {I, F} in Base at generator.jl:32)
 (1.5185066880000004, select_rule(s::ServerCommons.AbstractSource, fs::ClusteringServer.RuleGen.FatSample, fc::ClusteringServer.RuleGen.FatClass, rules::Vector{T} where T) in ClusteringServer.RuleGen at /home/tomas.pevny/julia/Avast/hmil-server/ClusteringServer/src/rulegen/rulegen.jl:216)
 (1.559974971, getindex(ds::typename(ProductNode){…}, mk::ExplainMill.ProductMask, presentobs) where {T<:NamedTuple, M} in ExplainMill at /home/tomas.pevny/.julia/packages/ExplainMill/OI20s/src/masks/nodemasks/product.jl:44)
 (1.6183364590000007, indexed_iterate(t::Tuple, i::Int64, state) in Base at tuple.jl:86)
 (1.6234585009999987, set(obj, l::Setfield.PropertyLens{field}, val) where field in Setfield at /home/tomas.pevny/.julia/packages/Setfield/NshXm/src/lens.jl:108)
 (1.6596654549999998, addone!(f, flatmask) in ExplainMill at /home/tomas.pevny/.julia/packages/ExplainMill/OI20s/src/pruning/utils.jl:87)
 (1.7985840000000002, var"#baseline_generate#11"(max_rules, max_time, kwargs, ::typeof(ClusteringServer.RuleGen.baseline_generate), s::ServerCommons.AbstractSource, fm::ClusteringServer.RuleGen.FatModel, fs::ClusteringServer.RuleGen.FatSample, fc::ClusteringServer.RuleGen.FatClass) in ClusteringServer.RuleGen at /home/tomas.pevny/julia/Avast/hmil-server/ClusteringServer/src/rulegen/rulegen.jl:113)
 (1.863511771, var"#mapreducedim!#319"(init, ::typeof(GPUArrays.mapreducedim!), f::F, op::OP, R::CUDA.AnyCuArray{T, N} where N, A::Union{Base.Broadcast.Broadcasted, AbstractArray}) where {F, OP, T} in CUDA at /home/tomas.pevny/.julia/packages/CUDA/kKJoe/src/mapreduce.jl:140)
 (1.9225557329999985, (::Mill.var"#95#96")(x) in Mill at /home/tomas.pevny/.julia/packages/Mill/GN4LN/src/datanodes/productnode.jl:6)
 (2.460045623999997, in(x, itr) in Base at operators.jl:1129)
 (2.5342526809999995, map(f, A::AbstractArray) in Base at abstractarray.jl:2294)
 (2.5873207800000007, _collect(c, itr, ::Base.EltypeUnknown, isz::Union{Base.HasLength, Base.HasShape}) in Base at array.jl:690)
 (2.740906448000001, iterate(g::Base.Generator, s...) in Base at generator.jl:42)
 (3.1267156070000013, var"#levelbylevelsearch!#293"(levelsearch!, fine_tuning::Bool, random_removal::Bool, ::typeof(ExplainMill.levelbylevelsearch!), f, mk::ExplainMill.AbstractStructureMask) in ExplainMill at /home/tomas.pevny/.julia/packages/ExplainMill/OI20s/src/pruning/levelbylevel.jl:65)
 (3.234713302000001, collect(itr::Base.Generator) in Base at array.jl:672)
 (3.289312171999996, getproperty(x, f::Symbol) in Base at Base.jl:33)
 (4.185053459, show_delim_array(io::IO, itr::Union{Core.SimpleVector, AbstractArray}, op, delim, cl, delim_one, i1, l) in Base at show.jl:1079)
 (4.503720626, _adjustmask(mk, source::Symbol) in SampleTweakers at /home/tomas.pevny/.julia/packages/SampleTweakers/tfy3D/src/explainmill.jl:103)
 (4.817237094999997, env_override_minlevel(group, _module) in Base.CoreLogging at logging.jl:519)
 (5.06188595, eltype(x) in Base at abstractarray.jl:187)
 (5.173634224000002, setproperty!(x, f::Symbol, v) in Base at Base.jl:34)
 (5.555816859999996, collect_to_with_first!(dest::AbstractArray, v1, itr, st) in Base at array.jl:699)
 (6.172039743, run_inference(model::ClusteringServer.RuleGen.FatModel, sample::ClusteringServer.RuleGen.FatSample) in ClusteringServer.RuleGen at /home/tomas.pevny/julia/Avast/hmil-server/ClusteringServer/src/rulegen/fat_model.jl:25)
 (8.950478434000004, _adjustmask(mk, ituples, lead_follow) in SampleTweakers at /home/tomas.pevny/.julia/packages/SampleTweakers/tfy3D/src/explainmill.jl:74)
 (13.634319394000002, collect_to!(dest::AbstractArray{T, N} where N, itr, offs, st) where T in Base at array.jl:719)
 (14.870875715, (m::typename(ProductModel){…})(x::typename(ProductNode){…}) where {P<:NamedTuple, T, MS<:NamedTuple, M} in Mill at /home/tomas.pevny/.julia/packages/Mill/GN4LN/src/modelnodes/productmodel.jl:30)
 (21.661553732, setindex_widen_up_to(dest::AbstractArray{T, N} where N, el, i) where T in Base at array.jl:710)
 (1661.059199338, ROOT() in Core.Compiler.Timings at compiler/typeinfer.jl:73)

It is true that the list of inferred methods is long

julia> length(SnoopCompile.flatten(tinf))
292724

so it might well be that all my compilation adds up to 1661 s, which is the top-level ROOT, and the rest should sum to it. But it does not add up:

julia> finf = SnoopCompile.flatten(tinf);
julia> sum(r.exclusive_time for r in finf[1:end-1])
231.98690968600127

julia> finf[end].exclusive_time
1661.059199338
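
To narrow down where the missing time goes, I suppose one could also compare inclusive and exclusive timings directly on the tree; a minimal sketch using SnoopCompile's `inclusive`/`exclusive` accessors (assuming a reasonably recent SnoopCompile version):

```julia
using SnoopCompile

# Total wall time covered by the snoop: ROOT's inclusive time.
total = inclusive(tinf)

# ROOT's exclusive time: whatever happened while inference was *not*
# running (LLVM codegen, actual execution of the server, GC, ...).
outside_inference = exclusive(tinf)

# Time attributable to inference itself: the inclusive times of
# ROOT's direct children should sum to total - outside_inference.
in_inference = sum(inclusive, tinf.children)
```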

which means I am losing about 1430 s somewhere, which is quite a bit for my taste. In the server we use the @elapsed macro a lot to measure time. I wonder whether this could somehow interfere with type inference, but as I understand it, a macro should not have such an effect.

I am very confused by this; any hint is very much appreciated.