Understanding the compile times of DifferentialEquations.jl and Attempting to Help

Hey, I am trying to find out if I can help with compile time issues by doing a bunch of profiling on DiffEq. This has been coming up quite a bit and @Datseris keeps bugging me about it, so I am trying to find out what I can do. It was mentioned to me in the Slack that I can profile the timings for compiling specific function signatures using SnoopCompile.jl, since its snoop data returns a column of times. Using that, I started doing some profiling on a simple ODE call:

using SnoopCompile

SnoopCompile.@snoop "compiles.csv" begin
  using OrdinaryDiffEq
  function f(du,u,p,t)
    du[1] = p[1]*(u[2]-u[1])
    du[2] = u[1]*(p[2]-u[3]) - u[2]
    du[3] = u[1]*u[2] - p[3]*u[3]
  end
  u0 = [1.0,0.0,0.0]
  tspan = (0.0,1.0)
  p = (10.0,28.0,8/3)
  prob = ODEProblem(f,u0,tspan,p)
  Base.GC.gc()
  sol = solve(prob,Tsit5())
end

At first I did timing on commit 0867ad8 (decrease standard allocs · SciML/OrdinaryDiffEq.jl@0867ad8 · GitHub). This produced compiles.csv, which can be found here: compiles.csv · GitHub (maybe there’s a way to make Gists handle CSVs better?). Then I started knocking out the top compilation hits inside of OrdinaryDiffEq one at a time, leading to the compilesX.csv files in the Gist. This was mostly to start highlighting the contributions of the different parts of OrdinaryDiffEq.jl (particularly the perform_step! function, which is the core), and how much of the time was due to things in Base.
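For reference, the sorting I’m describing is just something like this — a rough sketch assuming CSV.jl and DataFrames.jl are installed, and that the snoop file parses into a time column and a signature column (the actual column names in SnoopCompile’s output may differ):

```julia
# Sketch: load the snoop data and rank compilation hits by time.
# Column names here are assumptions, not SnoopCompile's exact schema.
using CSV, DataFrames

df = CSV.read("compiles.csv", DataFrame)  # e.g. columns :time and :signature
sort!(df, :time, rev = true)              # most expensive signatures first
first(df, 20)                             # the "top compilation hits" I knocked out
```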

Actionable Results

Some things that did pop up as very high on the timings were

Tuple{typeof(Base.unsafe_copyto!), Array{Float64, 1}, Int64, Array{Float64, 1}, Int64, Int64}

and

Tuple{typeof(Base.throw_boundserror), Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, typeof(Base.muladd), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(DiffEqBase.ODE_DEFAULT_NORM)

For the former, it only showed up near the top of one or two of the CSVs (when you sort by the time column), so I’m not sure if it’s noise, but it is a potential call that could be added to the Base system image. The second one is the bounds-error part of the broadcast machinery. @mbauman mentioned that these could be greatly reduced by applying @nospecialize to them.
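For context, @nospecialize tells the compiler not to specialize a method on a given argument’s concrete type, so one compiled instance is shared instead of compiling fresh code for every Broadcasted signature. A minimal toy illustration (my own example, not the actual Base code):

```julia
# Without @nospecialize, calling this with different argument types would
# compile a new specialization each time. With it, the argument is treated
# abstractly and a single compiled method body is reused.
function report_type(@nospecialize(x))
    # x is effectively ::Any inside this method body
    println("got a ", typeof(x))
end

report_type([1.0, 2.0])   # same compiled method serves both calls
report_type((1, 2, 3))
```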

Now, this is my first foray into this, so what I’ve got is noisy and I still need some help making it more refined. Please guide me on how I can be a helpful source of data here.


It’s been a while, but if memory serves, SnoopCompile’s timings only measure the “core compiler” (IR generation and LLVM’s native code generation) and omit the time needed for type inference. I think to really start helping on compile time we’re going to have to develop infrastructure to measure both. I’m not sure if there are good existing tools, but we might want to consider adding a bit of instrumentation to key entry points of inference (e.g., https://github.com/JuliaLang/julia/blob/90e3155fc41ee9cf6ebefe1aeb2cb77d7c37bcfa/base/compiler/typeinfer.jl#L462).
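The kind of instrumentation meant here could look roughly like the sketch below. This is purely illustrative: typeinf_ext is an internal Core.Compiler function whose signature changes between Julia versions, and real instrumentation would live inside Base, not as a user-level wrapper.

```julia
# Hypothetical sketch: accumulate wall time spent in an inference entry
# point. In practice this would be a timing hook added inside
# base/compiler/typeinfer.jl, not a wrapper defined like this.
const INFERENCE_TIME = Ref(0.0)

function timed_typeinf(args...)
    t0 = time_ns()
    result = Core.Compiler.typeinf_ext(args...)  # internal entry point (version-dependent)
    INFERENCE_TIME[] += (time_ns() - t0) / 1e9   # seconds spent inferring
    return result
end
```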