Very Slow Compilation for Sensitivity Analysis

I’m working on an application that optimizes a complex ODE by iteratively optimizing over the linearized system. The optimization works fine, but sensitivity analysis of the ODE shows very strange performance behaviour. Currently, I’m doing this by applying ForwardDiff to a DifferentialEquations ODEProblem. Once compilation has happened, Julia is very fast, but issues arise on the first run. The program can be found here.
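
To make the pattern concrete, here’s a minimal sketch of what I’m doing, with a hypothetical two-state system standing in for my actual 14-state dynamics (the names `dyn` and `final_state` are just for illustration):

```julia
using OrdinaryDiffEq, ForwardDiff, StaticArrays

# Hypothetical two-state dynamics standing in for the real 14-state system
dyn(u, p, t) = SVector(p[1] * u[1] - u[1] * u[2], -p[2] * u[2] + u[1] * u[2])

# Map parameters to the final state; u0 is built from eltype(p) so the
# solver state carries the ForwardDiff partials
function final_state(p)
    u0 = SVector(one(eltype(p)), one(eltype(p)))
    prob = ODEProblem(dyn, u0, (0.0, 1.0), p)
    solve(prob, Tsit5(), save_everystep = false)[end]
end

# The first call triggers the very long compile; subsequent calls are fast
J = ForwardDiff.jacobian(final_state, [1.5, 3.0])
```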

On the first run, the program can take more than 90 seconds to compile on fast hardware (a 3 GHz i7). This only happens when trying to differentiate the integrator - the ODE itself is fast to compile - and once compiled, performance is very good. Analysis of compilation time with SnoopCompile indicates some abysmally long compile times for specific functions, with the following taking more than 7 seconds:

Base.Math.muladd(typeof(Base.muladd), StaticArrays.SArray{Tuple{14}, ForwardDiff.Dual{ForwardDiff.Tag{getfield(Dynamics, Symbol("#f#7")){Main.ProbInfo, Float64}, Float64}, Float64, 21}, 1, 14}, ForwardDiff.Dual{ForwardDiff.Tag{getfield(Dynamics, Symbol("#f#7")){Main.ProbInfo, Float64}, Float64}, Float64, 21}, StaticArrays.SArray{Tuple{14}, ForwardDiff.Dual{ForwardDiff.Tag{getfield(Dynamics, Symbol("#f#7")){Main.ProbInfo, Float64}, Float64}, Float64, 21}, 1, 14})

I’ve also tried straight profiling, but the program fails to terminate when running under the profiler.

The long compile time makes testing the program unpleasant, and it would be nice if there were a way to redesign it for a faster compile without sacrificing compiled performance. I’ve had a few ideas:

  • Stop using SArray, which should lighten the load on the inliner and constant propagator at the expense of runtime speed. Some of that performance could be recovered through very careful use of in-place modification (see the sketch after this list).
  • Use DifferentialEquations.jl’s sensitivity analysis instead of applying ForwardDiff to a normal ODE. However, the function being integrated is not amenable to symbolic differentiation in the general case, and the documentation isn’t entirely clear on how to define the problem completely numerically.
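
For the first idea, the change I have in mind looks roughly like this (hypothetical two-state system again):

```julia
using StaticArrays

# Current style: out-of-place with SVector, which makes the compiler
# specialize on large static types full of Duals
f(u, p, t) = SVector(p[1] * u[1], -p[2] * u[2])

# Candidate replacement: in-place on a plain Vector; the runtime cost of
# losing SArray is recovered by mutating du instead of allocating
function f!(du, u, p, t)
    du[1] = p[1] * u[1]
    du[2] = -p[2] * u[2]
    return nothing
end
```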

I feel like I’m running into a particularly degenerate case with this program, and I suspect there’s something minor I could change to fix it. Is there any easy way to improve the compile performance? Precompilation would work for the deployed version of the program, but it doesn’t solve the development-time compile performance problem.

Yeah, this is known: Hessian of 4x4 matrix multiplication using StaticArrays brings Julia to its knees · Issue #266 · JuliaDiff/ForwardDiff.jl · GitHub. If you’re doing a long optimization (global parameter estimation), then this is of course a fine way to do it since compilation is only paid once, but it’s not optimal. As JIT compilers improve, this will be a use case to watch, though.

If you can’t do symbolic (we will have better tooling for this in ModelingToolkit.jl by the end of summer, cc @chakravala), then the numerical tools should still work. The example is right here:

http://docs.juliadiffeq.org/latest/analysis/sensitivity.html#Example-solving-an-ODELocalSensitivityProblem-1

and you can replace that @ode_def use with just a normal Julia function and it’ll do fine. Internally it’ll compute the necessary Jacobians via autodiff, so if your derivative function is autodiff-compatible (it must be already) then this’ll just work (and you can speed it up by defining Jacobians and the like).
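
To sketch what I mean, here’s roughly that docs example with a plain in-place Julia function instead of @ode_def (same Lotka-Volterra system and parameter values as the linked page):

```julia
using DifferentialEquations

# Plain Julia function in place of the @ode_def version from the docs
function lotka!(du, u, p, t)
    du[1] = p[1] * u[1] - p[2] * u[1] * u[2]
    du[2] = -p[3] * u[2] + u[1] * u[2]
end

p = [1.5, 1.0, 3.0]
prob = ODELocalSensitivityProblem(lotka!, [1.0, 1.0], (0.0, 10.0), p)
sol = solve(prob, DP8())

# x is the state trajectory, dp[i] the sensitivities w.r.t. parameter i
x, dp = extract_local_sensitivities(sol)
```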

If you’re having trouble interpreting the documentation then just let me know.


On one problem I’m working on that uses a lot of ForwardDiff Hessians (no StaticArrays, though), compile time on Julia 0.6 is about 150 seconds. On 0.7 it’s 12 seconds.
(Runtime is also around 20% faster on 0.7.)
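
For scale, the kind of call being timed is nothing exotic - a hypothetical stand-in objective, not my actual model:

```julia
using ForwardDiff

# Hypothetical objective standing in for the real one; the actual problem
# makes many such Hessian calls
rosenbrock(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

H = ForwardDiff.hessian(rosenbrock, [0.5, 0.5])  # first call pays the compile cost
```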

So, things are definitely improving for ForwardDiff.


That’s good to hear. For this case the user probably shouldn’t be doing ForwardDiff through the ODE solver anyway, but if it turns out not to be bad in a future version of Julia, that’s always a plus :smile:.