Significant time spent in type inference in profiler flame graphs

MilesCranmer · October 22, 2022, 1:11am

I am currently profiling the search speed of SymbolicRegression.jl. In particular I am attempting to speed up this PR, which integrates the package with DynamicExpressions.jl.

I am using the following code for profiling:

using SymbolicRegression

X = randn(Float32, 5, 100)
y = 2 * cos.(X[4, :]) + X[1, :] .^ 2 .- 2

options = SymbolicRegression.Options(;
    binary_operators=(+, *, /, -),
    unary_operators=(cos, exp),
    npopulations=20,
    max_evals=30000,
)

hall_of_fame = EquationSearch(X, y; options=options, multithreading=false, numprocs=0);

@profview begin
    for i=1:10
        EquationSearch(X, y; niterations=40, options=options, multithreading=false, numprocs=0);
    end
end

which should act as a basic measure of the search speed over 10 random initializations.

Now, when I look at the flamegraph and zoom in, I see this:

Many of these calls makes sense - I expect most of the time to be spend doing the actual evaluation of expressions. However, a significant percentage of the time (~30%) is spend on type inference functions typeinf_ext_toplevel and their children.

When I click on one of these and zoom in, I am not given any answers, as none of these functions mention where in my library the type inference is being done:

So, my question is: how should I figure out where this type inference is actually occuring, so I am able to patch it, and get the (hopefully) 30% speedup?

Thanks!
Miles

jling · October 22, 2022, 1:36am

given what you’re doing (generating new functions composed of different functions), I’m not surprised, and I think this is irreducible in some sense.

I also think it’s not clear what “where” means in this case, inference is trigger by function call but may not be the immediate caller, so I’m not sure it’s well defined to ask “which line in my source code is causing inference time” especially given how currently profiler is setup.

we ran into similar problem but for GC, where the dynamic dispatch is the bottleneck and the manifestation is GC time (dynamic dispatch allocates). BUT, even after fixing that (by manually doing single dispatch), we see the GC allocation goes to 0, but barely any speed up.

MilesCranmer · October 22, 2022, 1:59am

Thanks for the reply!

Does it help if I mention that the expressions should be entirely type stable? I don’t think there should be any type inference going on in the construction or evaluation of expressions; everything should be known at compile time:

mutable struct Node{T}
    degree::Int  # 0 for constant/variable, 1 for cos/sin, 2 for +/* etc.
    constant::Bool  # false if variable
    val::Union{T,Nothing}  # If is a constant, this stores the actual value
    # ------------------- (possibly undefined below)
    feature::Int  # If is a variable (e.g., x in cos(x)), this stores the feature index.
    op::Int  # If operator, this is the index of the operator in operators.binary_operators, or operators.unary_operators
    l::Node{T}  # Left child node. Only defined for degree=1 or degree=2.
    r::Node{T}  # Right child node. Only defined for degree=2. 
end

Walking through an expression simply tells you which branch of the if statement to take, but the types are all known. e.g., you can see that the return type is explicit in this main function: https://github.com/SymbolicML/DynamicExpressions.jl/blob/c00186bcf9013d64e701d56302567407b5ed5ae8/src/EvaluateEquation.jl#L62-L64. (The type inference could also be a different part of SymbolicRegression.jl, rather than DynamicExpressions.jl, though.)

It seems strange that there should still be that much type inference here.

I’m not sure it’s well defined to ask “which line in my source code is causing inference time” especially given how currently profiler is setup.

I see. Is there any diagnostic tool I can use to figure out what part of my code has type instability, and might be causing this?

jling · October 22, 2022, 2:25am

yes! try JET.jl and Cthulhu.jl

MilesCranmer · October 22, 2022, 3:58am

Woohoo!! Type stability again! Thanks @jling

With JET.jl’s diagnostics, here’s the commit which got me 30% improvement in SymbolicRegression.jl:

In particular, I think it is the edit to get_constants - since this is used in the constant optimization loop!

Essentially, for an expression Node{T}, a constant node’s value field must be of type T. However, that field takes on a value of nothing when the node is an operator or a data column. Thus, many functions throughout the library were confused - simply telling them val::T whenever a constant node is used is enough to help them.

Topic		Replies	Views
Profiling compilation/inference Performance inference	10	1407	August 24, 2020
Measuring time of type inference Internals & Design	8	893	March 25, 2019
New tools for reducing compiler latency Package Announcements	7	4006	January 28, 2021
Look for help for weird type inference Performance question	2	478	May 14, 2022
Feature request: type assertion for compiler-known type General Usage	21	1107	April 13, 2018

Significant time spent in type inference in profiler flame graphs

Related topics