Dear All,
we have created a micro-service that explains our Mill.jl models, but unfortunately the service runs very slowly. Surprisingly slowly. One culprit I had in mind is compilation time, since the data samples accepted by a single model can be of different types: different parts of a sample can be missing, which makes each such sample a different type.
I have therefore profiled inference with SnoopCompile as

using SnoopCompile
tinf = @snoopi_deep simulate_server(...)

and then visualized the results with

accumulate_by_source(SnoopCompile.flatten(tinf))
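For context, the whole measurement is roughly the following sketch (here `workload()` is a made-up stand-in for our real entry point `simulate_server(...)`, whose arguments I have left out):

```julia
using SnoopCompile

# Stand-in workload; in reality this is `simulate_server(...)`.
workload() = sum(abs2, rand(Float32, 100))

tinf = @snoopi_deep workload()         # inference-timing tree
finf = SnoopCompile.flatten(tinf)      # one entry per inferred frame
accumulate_by_source(finf)             # exclusive time accumulated per method
```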
But I am puzzled by the results, since the most time-consuming entry is

(1661.059199338, ROOT() in Core.Compiler.Timings at compiler/typeinfer.jl:73)

and after that, most of the entries are relatively fast. Here is more of the output:
(1.4325971610000003, merge(a::NamedTuple{an, T} where T<:Tuple, b::NamedTuple{bn, T} where T<:Tuple) where {an, bn} in Base at namedtuple.jl:242)
(1.4500268580000004, removeone!(f, flatmask, ii) in ExplainMill at /home/tomas.pevny/.julia/packages/ExplainMill/OI20s/src/pruning/utils.jl:113)
(1.4561869859999998, Base.Generator(f::F, iter::I) where {I, F} in Base at generator.jl:32)
(1.503079966000002, Base.Generator{I, F}(f, iter) where {I, F} in Base at generator.jl:32)
(1.5185066880000004, select_rule(s::ServerCommons.AbstractSource, fs::ClusteringServer.RuleGen.FatSample, fc::ClusteringServer.RuleGen.FatClass, rules::Vector{T} where T) in ClusteringServer.RuleGen at /home/tomas.pevny/julia/Avast/hmil-server/ClusteringServer/src/rulegen/rulegen.jl:216)
(1.559974971, getindex(ds::typename(ProductNode){…}, mk::ExplainMill.ProductMask, presentobs) where {T<:NamedTuple, M} in ExplainMill at /home/tomas.pevny/.julia/packages/ExplainMill/OI20s/src/masks/nodemasks/product.jl:44)
(1.6183364590000007, indexed_iterate(t::Tuple, i::Int64, state) in Base at tuple.jl:86)
(1.6234585009999987, set(obj, l::Setfield.PropertyLens{field}, val) where field in Setfield at /home/tomas.pevny/.julia/packages/Setfield/NshXm/src/lens.jl:108)
(1.6596654549999998, addone!(f, flatmask) in ExplainMill at /home/tomas.pevny/.julia/packages/ExplainMill/OI20s/src/pruning/utils.jl:87)
(1.7985840000000002, var"#baseline_generate#11"(max_rules, max_time, kwargs, ::typeof(ClusteringServer.RuleGen.baseline_generate), s::ServerCommons.AbstractSource, fm::ClusteringServer.RuleGen.FatModel, fs::ClusteringServer.RuleGen.FatSample, fc::ClusteringServer.RuleGen.FatClass) in ClusteringServer.RuleGen at /home/tomas.pevny/julia/Avast/hmil-server/ClusteringServer/src/rulegen/rulegen.jl:113)
(1.863511771, var"#mapreducedim!#319"(init, ::typeof(GPUArrays.mapreducedim!), f::F, op::OP, R::CUDA.AnyCuArray{T, N} where N, A::Union{Base.Broadcast.Broadcasted, AbstractArray}) where {F, OP, T} in CUDA at /home/tomas.pevny/.julia/packages/CUDA/kKJoe/src/mapreduce.jl:140)
(1.9225557329999985, (::Mill.var"#95#96")(x) in Mill at /home/tomas.pevny/.julia/packages/Mill/GN4LN/src/datanodes/productnode.jl:6)
(1.9225557329999985, (::Mill.var"#95#96")(x) in Mill at /home/tomas.pevny/.julia/packages/Mill/GN4LN/src/datanodes/productnode.jl:6)
(2.460045623999997, in(x, itr) in Base at operators.jl:1129)
(2.5342526809999995, map(f, A::AbstractArray) in Base at abstractarray.jl:2294)
(2.5873207800000007, _collect(c, itr, ::Base.EltypeUnknown, isz::Union{Base.HasLength, Base.HasShape}) in Base at array.jl:690)
(2.740906448000001, iterate(g::Base.Generator, s...) in Base at generator.jl:42)
(3.1267156070000013, var"#levelbylevelsearch!#293"(levelsearch!, fine_tuning::Bool, random_removal::Bool, ::typeof(ExplainMill.levelbylevelsearch!), f, mk::ExplainMill.AbstractStructureMask) in ExplainMill at /home/tomas.pevny/.julia/packages/ExplainMill/OI20s/src/pruning/levelbylevel.jl:65)
(3.234713302000001, collect(itr::Base.Generator) in Base at array.jl:672)
(3.289312171999996, getproperty(x, f::Symbol) in Base at Base.jl:33)
(4.185053459, show_delim_array(io::IO, itr::Union{Core.SimpleVector, AbstractArray}, op, delim, cl, delim_one, i1, l) in Base at show.jl:1079)
(4.503720626, _adjustmask(mk, source::Symbol) in SampleTweakers at /home/tomas.pevny/.julia/packages/SampleTweakers/tfy3D/src/explainmill.jl:103)
(4.817237094999997, env_override_minlevel(group, _module) in Base.CoreLogging at logging.jl:519)
(5.06188595, eltype(x) in Base at abstractarray.jl:187)
(5.173634224000002, setproperty!(x, f::Symbol, v) in Base at Base.jl:34)
(5.555816859999996, collect_to_with_first!(dest::AbstractArray, v1, itr, st) in Base at array.jl:699)
(6.172039743, run_inference(model::ClusteringServer.RuleGen.FatModel, sample::ClusteringServer.RuleGen.FatSample) in ClusteringServer.RuleGen at /home/tomas.pevny/julia/Avast/hmil-server/ClusteringServer/src/rulegen/fat_model.jl:25)
(8.950478434000004, _adjustmask(mk, ituples, lead_follow) in SampleTweakers at /home/tomas.pevny/.julia/packages/SampleTweakers/tfy3D/src/explainmill.jl:74)
(13.634319394000002, collect_to!(dest::AbstractArray{T, N} where N, itr, offs, st) where T in Base at array.jl:719)
(14.870875715, (m::typename(ProductModel){…})(x::typename(ProductNode){…}) where {P<:NamedTuple, T, MS<:NamedTuple, M} in Mill at /home/tomas.pevny/.julia/packages/Mill/GN4LN/src/modelnodes/productmodel.jl:30)
(21.661553732, setindex_widen_up_to(dest::AbstractArray{T, N} where N, el, i) where T in Base at array.jl:710)
(1661.059199338, ROOT() in Core.Compiler.Timings at compiler/typeinfer.jl:73)
It is true that the list of inferred methods is long:
julia> length(SnoopCompile.flatten(tinf))
292724
so it might well be that all my compilation time sums to 1661 s, i.e. the ROOT entry at the top is the total and the rest should add up to it. But it does not add up:
julia> finf = SnoopCompile.flatten(tinf);
julia> sum(r.exclusive_time for r in finf[1:end-1])
231.98690968600127
julia> finf[end].exclusive_time
1661.059199338
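The arithmetic of the gap, spelled out with the numbers copied from above:

```julia
# Numbers taken from the REPL session above.
root_exclusive = 1661.059199338      # exclusive_time of the ROOT entry
children_sum   = 231.98690968600127  # sum over finf[1:end-1]
gap = root_exclusive - children_sum  # ≈ 1429 s unaccounted for
```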
which means I am losing roughly 1430 s somewhere, which is quite a bit for my taste. In the server we use the @elapsed macro a lot to measure time. I wonder whether it could somehow interfere with type inference, but as far as I understand, a macro should not have such an effect.
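To illustrate the pattern (with hypothetical names; `timed_call` and the handler are made-up stand-ins, the real server code differs), our use of @elapsed is essentially:

```julia
# Hypothetical sketch of how we time request handlers in the server.
function timed_call(handler, request)
    local result
    t = @elapsed result = handler(request)
    @info "request served" seconds = t
    return result
end

timed_call(r -> r * 2, 21)   # logs the elapsed time, returns 42
```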
I am very confused by this; any hint is very much appreciated.