Significant time spent in type inference in profiler flame graphs

I am currently profiling the search speed of SymbolicRegression.jl. In particular I am attempting to speed up this PR, which integrates the package with DynamicExpressions.jl.

I am using the following code for profiling:

using SymbolicRegression

X = randn(Float32, 5, 100)
y = 2 * cos.(X[4, :]) + X[1, :] .^ 2 .- 2

options = SymbolicRegression.Options(;
    binary_operators=(+, *, /, -),
    unary_operators=(cos, exp),

hall_of_fame = EquationSearch(X, y; options=options, multithreading=false, numprocs=0);

@profview begin
    for i=1:10
        EquationSearch(X, y; niterations=40, options=options, multithreading=false, numprocs=0);

which should act as a basic measure of the search speed over 10 random initializations.

Now, when I look at the flamegraph and zoom in, I see this:

Many of these calls makes sense - I expect most of the time to be spend doing the actual evaluation of expressions. However, a significant percentage of the time (~30%) is spend on type inference functions typeinf_ext_toplevel and their children.

When I click on one of these and zoom in, I am not given any answers, as none of these functions mention where in my library the type inference is being done:

So, my question is: how should I figure out where this type inference is actually occuring, so I am able to patch it, and get the (hopefully) 30% speedup?


given what you’re doing (generating new functions composed of different functions), I’m not surprised, and I think this is irreducible in some sense.

I also think it’s not clear what “where” means in this case, inference is trigger by function call but may not be the immediate caller, so I’m not sure it’s well defined to ask “which line in my source code is causing inference time” especially given how currently profiler is setup.

we ran into similar problem but for GC, where the dynamic dispatch is the bottleneck and the manifestation is GC time (dynamic dispatch allocates). BUT, even after fixing that (by manually doing single dispatch), we see the GC allocation goes to 0, but barely any speed up.

Thanks for the reply!

Does it help if I mention that the expressions should be entirely type stable? I don’t think there should be any type inference going on in the construction or evaluation of expressions; everything should be known at compile time:

mutable struct Node{T}
    degree::Int  # 0 for constant/variable, 1 for cos/sin, 2 for +/* etc.
    constant::Bool  # false if variable
    val::Union{T,Nothing}  # If is a constant, this stores the actual value
    # ------------------- (possibly undefined below)
    feature::Int  # If is a variable (e.g., x in cos(x)), this stores the feature index.
    op::Int  # If operator, this is the index of the operator in operators.binary_operators, or operators.unary_operators
    l::Node{T}  # Left child node. Only defined for degree=1 or degree=2.
    r::Node{T}  # Right child node. Only defined for degree=2. 

Walking through an expression simply tells you which branch of the if statement to take, but the types are all known. e.g., you can see that the return type is explicit in this main function: DynamicExpressions.jl/EvaluateEquation.jl at c00186bcf9013d64e701d56302567407b5ed5ae8 · SymbolicML/DynamicExpressions.jl · GitHub. (The type inference could also be a different part of SymbolicRegression.jl, rather than DynamicExpressions.jl, though.)

It seems strange that there should still be that much type inference here.

I’m not sure it’s well defined to ask “which line in my source code is causing inference time” especially given how currently profiler is setup.

I see. Is there any diagnostic tool I can use to figure out what part of my code has type instability, and might be causing this?

1 Like

yes! try JET.jl and Cthulhu.jl

1 Like

Woohoo!! Type stability again! Thanks @jling :100:

With JET.jl’s diagnostics, here’s the commit which got me 30% improvement in SymbolicRegression.jl:

In particular, I think it is the edit to get_constants - since this is used in the constant optimization loop!

Essentially, for an expression Node{T}, a constant node’s value field must be of type T. However, that field takes on a value of nothing when the node is an operator or a data column. Thus, many functions throughout the library were confused - simply telling them val::T whenever a constant node is used is enough to help them.