Measuring time of type inference

My package Strided.jl seems to give type inference a very hard time, and the problem becomes even worse in packages that depend on it (like TensorOperations.jl).

I know about packages like SnoopCompile.jl and PackageCompiler.jl, but rather than just trying to precompile every possible combination of arguments (which is essentially hopeless, and also does not solve the full problem), I would like to redesign certain parts such that they are more friendly to type inference. Hence, I would like to explore where the problems lie and perform some time measurements of the type inference process.

Is there a recommended workflow for this? I can find little documentation, in particular about what the individual methods (typeinf_ext, typeinf_code, typeinf_type, typeinf_edge, typeinf) do. There also seems to be some utility in Core.Compiler, namely a macro @timeit, which does not seem to do anything? How is this supposed to be used?

I’ve read the Inference page in the dev docs and the two blog posts linked from it, but they do not help me with these practical questions.

Tim Holy just created an issue where he got crashes while trying to time type inference; it may give some ideas.

Are you sure type inference is the problem, and not some other stage in the compilation pipeline?
Are all your functions type stable? What’s the code doing?
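
(For a specific call, with f and x standing in for one of your functions and representative arguments, @code_warntype gives a quick check:)

# Non-concrete entries in the output (e.g. ::Any, shown in red in the
# REPL) indicate that this call is not type stable.
@code_warntype f(x)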

I have a package that took a minute to compile. I’m honestly not quite sure why (so I’m not an expert, nor someone actually able to help), and I haven’t had the time to go back and investigate*. I’ll probably refactor everything instead, which I expect to fix the problems.

*My suspicion is that it was because I was lazy and had a lot of structs like

struct SomeStruct{A,B,C,D,E,F,G}
    a::A
    b::B
    c::C
    d::D
    e::E
    f::F
    g::G
end

where those type parameters may themselves be indecently nested, e.g. Something{Something{ForwardDiff.Dual{Tag{somefunction}...}}}.
So maybe it did spend most of the time on inference.

Sorry for rambling. You didn’t provide much info to go off of, so I thought I’d jump in with my own experience with slow compilation.


Thanks, this is certainly helpful. I am quite certain that it is type inference though.

It could be a combination of type inference and type stability. If you have parametric types, you can improve the performance of everything by using bits types and @pure functions for type parameters, where appropriate. I did this in Grassmann.jl; the technique gives me a significant TensorAlgebra performance boost.
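
To make that concrete, here is a minimal sketch of the idea (hypothetical names, not actual code from Grassmann.jl): a type parameter is computed by a small @pure function of bits-type inputs, so the compiler can constant-fold it.

import Base: @pure

# A pure function of bits-type inputs; @pure lets inference
# constant-fold the result when N and K are known.
@pure ncoeffs(N::Int, K::Int) = binomial(N, K)

struct Blade{N,K,L}
    coeffs::NTuple{L,Float64}   # length L is derived from N and K
end

Blade{N,K}(coeffs::Float64...) where {N,K} =
    Blade{N,K,ncoeffs(N, K)}(coeffs)

With this, Blade{3,2}(1.0, 2.0, 3.0) infers to the concrete type Blade{3,2,3}.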

Frivolous use of @pure is not recommended, since incorrect use will give your program undefined behaviour. It should also not impact type inference time, which is the question here.


Thanks for all the responses. I do indeed have parametric types, yet they cannot be bits types, as they wrap Arrays. I modified typeinf_ext to print out some timing results (not in the clever way of Tim Holy), but I would like more detailed statistics of the type inference process, e.g. timings for all the functions that are inferred down the chain and for how long the different parts take, to really see what specifically is causing the issue. I am not sure which of the functions I would need to put timers in for that.
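
For what it’s worth, a crude per-signature number can also be obtained without touching the compiler, by timing a reflection call that triggers inference. A sketch (infer_time is a hypothetical helper; the measurement includes the optimizer as well, and since results are cached, only the first call per signature is meaningful):

# code_typed runs type inference (plus optimization) for the given
# call signature; results are cached, so time the first call only.
infer_time(f, argtypes::Tuple) = @elapsed code_typed(f, argtypes)

infer_time(sum, (Vector{Float64},))   # e.g. inference time of sum(::Vector{Float64})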

Only use it if you are able to make sense of it, so I should perhaps not recommend it. All I’m saying is that it is a technical performance option that is available.

It does actually impact type inference, although I am not sure if it affects the timings.

You can still use bits types anyway. I did this in DirectSum.jl, where I needed Array parameters: I was able to store the array in a cache and then use a bits integer type as the type parameter.
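
For illustration, a minimal sketch of that cache trick (hypothetical names, not actual code from DirectSum.jl):

# Store the non-bits data in a global cache and parametrize the type
# by the (bits) integer index into that cache.
const CACHE = Any[]

function cached!(x)
    push!(CACHE, x)
    return length(CACHE)
end

struct Wrapped{I} end          # I is a plain Int, hence a bits parameter
Wrapped(x) = Wrapped{cached!(x)}()

payload(::Wrapped{I}) where {I} = CACHE[I]

In real code one would also deduplicate, so that equal values map to the same index and hence to the same type.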

Which is why I said “timings”.

Another way to time inference might be the @timeit macro in base/compiler. It doesn’t do anything by default:

if !isdefined(@__MODULE__, Symbol("@timeit"))
    # This is designed to allow inserting timers when loading a second copy
    # of inference for performing performance experiments.
    macro timeit(args...)
        esc(args[end])
    end
end

NotInferenceDontLookHere.jl uses the mechanism to create a second copy of the compiler with timings enabled, but it seems to have bitrotten (or at least, it’s incompatible with latest TimerOutputs).

There’s also the ENABLE_TIMINGS compile-time flag, but that isn’t very useful for timing inference.
