ModelingToolkit performance issue with repeated evaluations of variables in system of equations

Solving an ODEProblem created from an ODESystem with ModelingToolkit is slow when a simple time dependent variable (e.g. sin(t)) is added to the system of equations.

For example:

using ModelingToolkit
using DifferentialEquations
@variables t ρ[1:3,1:3](t) Pl(t) Pμ(t)
@parameters Ωl Ωμ δl δμ Γ32 icomplex ωlp ωμp
D = Differential(t)

H = [-δl-δμ -Pμ*Ωμ/2 0
     -Pμ*Ωμ/2 -δl -Pl*Ωl/2
     0  -Pl*Ωl/2 0]

L = [0 0 -Γ32*ρ[1,3]/2
     0 Γ32*ρ[3,3] -Γ32*ρ[2,3]/2
     -Γ32*ρ[3,1]/2 -Γ32*ρ[3,2]/2 -Γ32*ρ[3,3]]

eq = -icomplex*Symbolics.scalarize(H*ρ-ρ*H) + L
eqns = [D(ρ[idx,idy]) ~ eq[idx,idy] for idx in 1:3 for idy in 1:3]
append!(eqns, [Pl ~ (1 + sin(ωlp*t))/2])
append!(eqns, [Pμ ~ (1 + sin(ωμp*t))/2])
bloch = ODESystem(eqns)
bloch = structural_simplify(bloch)

p = [Ωl => 1, Ωμ=> 1, δl => 0., δμ => 0., Γ32 => 1., icomplex => 1im, ωlp => 1, ωμp => 1]

u0 = [ρ[idx,idy] => 0*im for idx in 1:3 for idy in 1:3]
u0[5] = ρ[2,2] => 1
tspan = (0., 300.)

prob = ODEProblem(bloch, u0, tspan, p, jac = true)
sol = solve(prob, Tsit5(), abstol = 1e-9, reltol = 1e-6, progress = false)

Manually writing out the RHS function gives:

function Lindblad_rhs!(du, ρ, p, t)
    @inbounds begin
        Ωl = p[1]
        ωpl = p[2]
        δl = p[3]
        ϕl = p[4]
        Ωμ = p[5]
        ωpμ = p[6]
        δμ = p[7]
        ϕμ = p[8]
        Γ32 = p[9]

        Pl = (1+sin(ωpl*t+ϕl))/2
        Pμ = (1+sin(ωpμ*t+ϕμ))/2

        du[1,1] = -1im*(0.5Ωμ*ρ[1, 2]*Pμ - (0.5Ωμ*ρ[2, 1]*Pμ))
        du[1,2] = -1im*(δl*ρ[1, 2] + ρ[1, 2]*(-δl - δμ) + 0.5Ωl*ρ[1, 3]*Pl + 0.5Ωμ*ρ[1, 1]*Pμ - (0.5Ωμ*ρ[2, 2]*Pμ))
        du[1,3] = -1im*(ρ[1, 3]*(-δl - δμ) + 0.5Ωl*ρ[1, 2]*Pl - (0.5Ωμ*ρ[2, 3]*Pμ)) - (0.5Γ32*ρ[1, 3])
        du[2,1] = -1im*(0.5Ωμ*ρ[2, 2]*Pμ - (δl*ρ[2, 1]) - (ρ[2, 1]*(-δl - δμ)) - (0.5Ωl*ρ[3, 1]*Pl) - (0.5Ωμ*ρ[1, 1]*Pμ))
        du[2,2] = Γ32*ρ[3, 3] - (im*(0.5Ωl*ρ[2, 3]*Pl + 0.5Ωμ*ρ[2, 1]*Pμ - (0.5Ωl*ρ[3, 2]*Pl) - (0.5Ωμ*ρ[1, 2]*Pμ)))
        du[2,3] = -1im*(0.5Ωl*ρ[2, 2]*Pl - (δl*ρ[2, 3]) - (0.5Ωl*ρ[3, 3]*Pl) - (0.5Ωμ*ρ[1, 3]*Pμ)) - (0.5Γ32*ρ[2, 3])
        du[3,1] = -1im*(0.5Ωμ*ρ[3, 2]*Pμ - (ρ[3, 1]*(-δl - δμ)) - (0.5Ωl*ρ[2, 1]*Pl)) - (0.5Γ32*ρ[3, 1])
        du[3,2] = -1im*(δl*ρ[3, 2] + 0.5Ωl*ρ[3, 3]*Pl + 0.5Ωμ*ρ[3, 1]*Pμ - (0.5Ωl*ρ[2, 2]*Pl)) - (0.5Γ32*ρ[3, 2])
        du[3,3] = -1im*(0.5Ωl*ρ[3, 2]*Pl - (0.5Ωl*ρ[2, 3]*Pl)) - (Γ32*ρ[3, 3])
    end
    nothing
end

p1 = [1.,1,0,0,1,1,0,0,1]

ρ_ini = [0im 0im 0im
         0im 1. 0im
         0im 0im 0im]
prob1 = ODEProblem(Lindblad_rhs!, ρ_ini, tspan, p1)
sol1 = solve(prob1, Tsit5(), abstol = 1e-9, reltol = 1e-6, progress = false)

@btime for sol gives 8.2 ms for my system, and for sol1 it takes only 1.7 ms, with a similar number of allocations.

What is happening (not really noticeable for this small system, but very noticeable for larger systems) is that there is some performance degradation associated with adding Pl(t) and Pμ(t) time dependence. If I set Pl ~ 1 and Pμ ~ 1 there is no performance issue.
There seems to be almost a factor 10 slowdown compared to manually writing out the RHS function while only evaluating Pl(t) and Pμ(t) once per timestep by adding Pl = Pl(t) to the RHS function and substituting Pl for each evaluation of Pl(t) in the equations.

However, the performance overhead of calling sin or cos is only a few ns, so even calling this thousands of times for a single timestep shouldn’t cause that much of a slowdown.

What are you comparing against? The comparison isn’t in the code.

Sorry about that, forgot to add the comparison code, I’ve added it to the original post.

This will be solved by a CSE pass. WIP is here: WIP: CSE by shashi · Pull Request #200 · JuliaSymbolics/SymbolicUtils.jl · GitHub