Measuring time of type inference


My package Strided.jl seems to give type inference a very hard time, which becomes even worse in packages that depend on it (like TensorOperations.jl).

I know about packages like SnoopCompile.jl and PackageCompiler.jl, but rather than just trying to precompile every possible combination of arguments (which is essentially hopeless, and also does not solve the full problem), I would like to redesign certain parts such that they are more friendly to type inference. Hence, I would like to explore where the problems lie and perform some time measurements of the type inference process.

Is there a recommended workflow for this? I seem to find little documentation, also about what the individual methods (typeinf_ext, typeinf_code, typeinf_type, typeinf_edge, typeinf) do. There seems to be some utility in Core.Compiler, a macro @timeit which does not seem to do anything? How is this supposed to be used?

I’ve read the Inference page in the dev docs and the two blog posts linked to, but this does not help me with such practical questions.



Tim Holy just created an issue where he got crashes while trying to time type inference; it may give some ideas.

Are you sure type inference is the problem, and not some other stage in the compilation pipeline?
Are all your functions type stable? What’s the code doing?
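One quick way to check the type-stability question is to ask inference what return type it derives for a given signature. This is only a minimal sketch: `unstable`, `stable`, and `is_inferred_concrete` are hypothetical names made up for illustration, and `Base.return_types` is a reflection helper, not a guarantee about runtime behaviour.

```julia
# Hypothetical example functions, not taken from any package in this thread.
unstable(x) = x > 0 ? x : 0.0        # returns Int or Float64 for an Int argument
stable(x)   = x > 0 ? float(x) : 0.0 # always returns Float64 for an Int argument

# True when inference derives a concrete return type for every matching method.
is_inferred_concrete(f, argtypes) =
    all(isconcretetype, Base.return_types(f, argtypes))

println(is_inferred_concrete(unstable, (Int,)))
println(is_inferred_concrete(stable, (Int,)))
```

A non-concrete result (a `Union` or `Any`) is a hint that callers further down the chain will also be harder to infer.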

I have a package that took a minute to compile. I’m honestly not quite sure why (so I’m not an expert / someone actually able to help). I haven’t had the time to go back and take a look at why*. I’ll probably refactor everything instead, which I expect to fix the problems.

*My suspicion is that it is because I was lazy and had a lot of structs like

struct SomeStruct{A,B,C,D,E,F,G}

where those type parameters themselves may be indecently nested, e.g. Something{Something{ForwardDiff.Dual{Tag{somefunction}...}}}.
So maybe it did spend most of the time on inference.

Sorry for rambling. You didn’t provide much info to go off of, so I thought I’d jump in with my own experience with slow compilation.



Thanks, this is certainly helpful. I am quite certain that it is type inference though.



It could be a combination of type inference and type stability. If you have parametric types, you can improve the performance of everything by using bits types and @pure for type parameters, where appropriate. I did this in Grassmann.jl; this technique gives a significant TensorAlgebra performance boost for me.



Frivolous use of @pure is not recommended, since incorrect use will give your program undefined behaviour. It should also not impact type inference time, which is the question here.



Thanks for all the responses. I do indeed have parametric types, but they cannot be bits types as they wrap Arrays. I modified typeinf_ext to print out some timing results (not in the clever way of Tim Holy), but I would like more detailed statistics of the type inference process: for example, timings for all the functions that are inferred down the chain and how long the different parts take, to really see what specifically is causing the issue. I am not sure which of the functions I need to put timers in for that.
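Without patching the compiler, a coarse per-signature measurement can be had by timing `Base.code_typed`, which runs inference (and, by default, the optimizer) but no LLVM code generation. This is a rough proxy only, under the assumption that the first call in a fresh session is the informative one, since results for callees are cached afterwards. `slow_to_infer` and `time_inference` are hypothetical names for illustration.

```julia
# Hypothetical stand-in; substitute the method and argument types under investigation.
slow_to_infer(x) = map(y -> y + one(y), (x, x, x))

# Wall time of running inference plus optimization for one signature.
# This does not break the time down by callee; it only gives a total.
function time_inference(f, argtypes)
    return @elapsed Base.code_typed(f, argtypes)
end

t_first  = time_inference(slow_to_infer, (Int,))  # includes inferring the callees
t_second = time_inference(slow_to_infer, (Int,))  # callees now come from the cache
println("first: ", t_first, " s, second: ", t_second, " s")
```

Comparing the timings across different signatures of the same function can at least show which combinations of type parameters are the expensive ones.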



Only use it if you are able to make sense of it, so I should perhaps not recommend it. All I’m saying is that it is a technical performance option that is available.

It does actually impact type inference, although I am not sure if it affects the timings.

You can still use bits types anyway. I did this in DirectSum.jl, where I needed Array parameters: I was able to store the arrays in a cache and then use a bits integer type as the parameter.
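The cache idea can be sketched as follows. This is not DirectSum.jl's actual implementation, just a minimal illustration of the technique; `CACHE`, `intern!`, `Tagged`, `tagged`, and `payload` are all hypothetical names.

```julia
# Global cache holding the array-valued data that cannot be a type parameter.
const CACHE = Vector{Vector{Int}}()

# Return the index of `v` in the cache, adding a copy if it is not there yet.
function intern!(v::Vector{Int})
    i = findfirst(==(v), CACHE)
    i === nothing || return i
    push!(CACHE, copy(v))
    return length(CACHE)
end

# The type parameter N is a plain Int (a bits value) indexing into the cache.
struct Tagged{N} end

# Construction is a dynamic dispatch, since N is only known at run time.
tagged(v::Vector{Int}) = Tagged{intern!(v)}()

# Recover the array from the type parameter.
payload(::Tagged{N}) where {N} = CACHE[N]

t = tagged([1, 2, 3])
println(typeof(t), " carries ", payload(t))
```

The trade-off is that the cache is global mutable state, so whether this helps depends on how the types are used.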



Which is why I said “timings”.



Another way to time inference might be the @timeit macros in base/compiler. They don’t do anything by default:

if !isdefined(@__MODULE__, Symbol("@timeit"))
    # This is designed to allow inserting timers when loading a second copy
    # of inference for performing performance experiments.
    macro timeit(args...)
        # No-op: evaluate the wrapped expression and ignore the timer label.
        esc(args[end])
    end
end

NotInferenceDontLookHere.jl uses this mechanism to create a second copy of the compiler with timings enabled, but it seems to have bitrotted (or at least, it’s incompatible with the latest TimerOutputs).

There’s also the ENABLE_TIMINGS compile-time flag, but that isn’t very useful for timing inference.
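To illustrate how the guard quoted above is meant to be used, here is a toy sketch of the same pattern outside the compiler: if a timing `@timeit` is already defined when the guarded block is evaluated, the no-op fallback is skipped and every annotated call gets timed. The module `WithTimers`, the `TIMES` dictionary, and `work` are hypothetical, and this simple macro is an assumption for illustration, not the TimerOutputs implementation.

```julia
module WithTimers
    # Accumulated wall time in seconds per label.
    const TIMES = Dict{String,Float64}()

    # A timing version of @timeit, defined *before* the guarded fallback.
    macro timeit(label, ex)
        quote
            local key = $(esc(label))
            local t0 = time_ns()
            local val = $(esc(ex))
            TIMES[key] = get(TIMES, key, 0.0) + (time_ns() - t0) / 1e9
            val
        end
    end

    # Same guard as in base/compiler: only define the no-op if nothing exists yet.
    if !isdefined(@__MODULE__, Symbol("@timeit"))
        macro timeit(args...)
            esc(args[end])
        end
    end

    # Annotated code stays identical whether or not timers are enabled.
    work(n) = @timeit "sum" sum(1:n)
end

println(WithTimers.work(10))
println(WithTimers.TIMES)
```

Removing the first macro definition leaves `work` unchanged but turns the annotation into a no-op, which is the default behaviour in Core.Compiler.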