Discrepancy of ODE Sensitivity Analysis paper results with benchmarks

Can you make a PR? That’s the easiest way to explain the change

Yes, I can. Here I also have a comparison for N = 2:8.

Unchanged code:

Fixed code:

Code fix:

csa = map(csan) do n
  bfun, b_u0, b_p, brusselator_jac, brusselator_comp = makebrusselator!(PROBS, n)
  # Loop over the first two-thirds of the adjoint method list
  @time ts = map(ADJOINT_METHODS[1:2end÷3]) do alg
    @info "Running $alg ($(SciMLSensitivity.alg_autodiff(alg) ? "AD" : "user")-jac) for N = $n"
    # Attach the hand-written Jacobian only for the non-autodiff (user-jac) variants
    f = SciMLSensitivity.alg_autodiff(alg) ? bfun : ODEFunction(bfun, jac=brusselator_jac)
    solver = Rodas5(autodiff=false)
    @time diffeq_sen_l2(f, b_u0, tspan, b_p, bt, solver; sensalg=alg, tols...) # warm-up/compile run
    t = @elapsed diffeq_sen_l2(f, b_u0, tspan, b_p, bt, solver; sensalg=alg, tols...) # timed run
    return t
  end
  @show n,ts
  ts
end

The difference is huge: now Quadrature CASA user-Jacobian is the fastest method, whereas previously it was the 3rd slowest. And it makes sense that it should be the fastest, since evaluating a user-provided Jacobian is much cheaper than running autodiff to get the Jacobian.

No, it shouldn't be in general, because the user-Jacobian path requires constructing the whole Jacobian, whereas the AD path only needs a vjp. But the key here is that previous versions would not have specialized on bandedness and sparsity; that specialization was not done before the DI change.
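To make the full-Jacobian-vs-vjp distinction concrete, here is a minimal sketch on a toy two-state function (not the Brusselator, and using ForwardDiff and Zygote purely for illustration): the Jacobian path materializes the whole n-by-n matrix and then multiplies by the adjoint vector, while the vjp path gets λᵀJ from a single reverse-mode pullback without ever forming J.

using ForwardDiff, Zygote

f(u) = [u[1]^2 + u[2], u[2]^3 - u[1]]  # toy RHS standing in for the ODE right-hand side
u = [1.0, 2.0]                          # state
λ = [0.5, -1.0]                         # adjoint/seed vector

# Jacobian path: build the full Jacobian, then multiply by the adjoint
J = ForwardDiff.jacobian(f, u)
vjp_from_jac = J' * λ

# vjp path: a single reverse-mode pullback, J is never materialized
_, back = Zygote.pullback(f, u)
vjp_direct = back(λ)[1]                 # equals J' * λ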

I have created the PR

Could you please expand on that? I am not sure I understand it correctly, especially what you mean by "previous versions"?

To my understanding, there is currently no specialization on sparsity in SciMLBenchmarks, since jac_prototype is not provided, i.e. everything is done on dense matrices. Furthermore, adding sparsity via jac_prototype yields better results, which confirms that the code without jac_prototype did not specialize.

Here is my implementation of adding sparsity to the SciMLBenchmark:

# Detect the Jacobian sparsity pattern of the in-place RHS
# (jacobian_sparsity comes from ADTypes, TracerSparsityDetector from SparseConnectivityTracer)
function jac_sparsity_adtypes(u0, func, p)
    du0 = similar(u0)
    return ADTypes.jacobian_sparsity((du, u) -> func(du, u, p, 0.0), du0, u0, TracerSparsityDetector())
end

function auto_sen_l2(f, u0, tspan, p, t, alg=Tsit5(); diffalg=ReverseDiff.gradient, kwargs...)
    # Detect the sparsity pattern once and hand it to the solver via jac_prototype
    jac_sparsity = jac_sparsity_adtypes(u0, f, p)
    f_ode = ODEFunction(f, jac_prototype=jac_sparsity)
    test_f(p) = begin
        prob = ODEProblem{true, SciMLBase.FullSpecialize}(f_ode, convert.(eltype(p), u0), tspan, p)
        sol = solve(prob, alg, saveat=t; kwargs...)
        sum(sol.u) do x
            sum(z -> (1 - z)^2/2, x)
        end
    end
    diffalg(test_f, p)
end

csa = map(csan) do n
  bfun, b_u0, b_p, brusselator_jac, brusselator_comp = makebrusselator!(PROBS, n)
  @time ts = map(ADJOINT_METHODS[1:2end÷3]) do alg
    @info "Running $alg ($(SciMLSensitivity.alg_autodiff(alg) ? "AD" : "user")-jac) for N = $n"
    # Detect the sparsity pattern once per method and pass it as jac_prototype in both branches
    jac_sparsity = jac_sparsity_adtypes(b_u0, bfun, b_p)
    f = SciMLSensitivity.alg_autodiff(alg) ?
        ODEFunction(bfun, jac_prototype=jac_sparsity) :
        ODEFunction(bfun, jac=brusselator_jac, jac_prototype=jac_sparsity)
    solver = Rodas5(autodiff=false)
    @time diffeq_sen_l2(f, b_u0, tspan, b_p, bt, solver; sensalg=alg, tols...) # warm-up/compile run
    t = @elapsed diffeq_sen_l2(f, b_u0, tspan, b_p, bt, solver; sensalg=alg, tols...) # timed run
    return t
  end
  @show n,ts
  ts
end
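For completeness, a hypothetical call of the sparse auto_sen_l2 (the Brusselator objects bfun, b_u0, b_p, tspan, bt are assumed to come from the existing benchmark setup, and the solver choice, diffalg, and tolerance values below are placeholders rather than the benchmark's exact settings):

grad = auto_sen_l2(bfun, b_u0, tspan, b_p, bt, Rodas5(autodiff=false);
                   diffalg=ForwardDiff.gradient, abstol=1e-5, reltol=1e-7)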

And here are the results with the sparsity and bugfix implemented:

It seems that ForwardDiff gained the most from the change, but ForwardDiff is still not even close to being faster than the adjoint methods (for N<=9), as is the case in my benchmark.

It's on the output repo, not the generator file (SciMLBenchmarks.jl/benchmarks/AutomaticDifferentiation/BrussScaling.jmd). I'll see if I can move it over this next weekend.

I have created a second pull request (now on the correct repo), if that helps.

That helps, thanks. Though I still need to do one last thing to the remake_zero! PR.

I have produced two additional plots that seem to explain the origin of the rest of the differences between my benchmark and SciMLBenchmark.


Plot 1: SciMLBenchmark code with added sparsity and with the brusselator_2d_loop function substituted with my implementation of the Brusselator.

Plot 2: My benchmark with the parameters (timespan, tolerances, saveat, etc.) set to the same values as in SciMLBenchmark. Entries with ADJ use SciMLSensitivity.adjoint_sensitivities instead of DifferentiationInterface.gradient to obtain the sensitivities.

To summarize:

  1. ForwardDiff in my benchmark scaled better because of the provided sparse jac_prototype, and ForwardDiff was the algorithm that gained the most performance from this change.
  2. In my benchmark ForwardDiff was faster for N<10 because:
    2.1 My implementation of the Brusselator was slower for all algorithms except ForwardDiff (whose speed stayed the same) and Interpolating CASA AD-v^T seeding (which was slightly faster than with the SciMLBenchmark implementation).
    2.2 My choice of parameters (timespan, tolerances, etc.) favored ForwardDiff.

We should add a ForwardDiff (Sparse) entry to the SciMLBenchmark version. Definitely an interesting observation of something that has changed in the ecosystem since 2019.

Yeah, this can be hard to account for in a simple plot. That's why I generally say the cutoff is somewhere between 10 and 100, because it's problem-dependent. The scaling difference is clear; it's just unclear where you should switch, and this is really just one of many examples to try to justify the switch point.

I don't understand how providing the Jacobian prototype can change the benchmark results, given that the sparsity detection step is not measured anyway?

Below is the link to ODEFunction, which takes jac_prototype as a parameter:

To my understanding, providing the sparsity pattern via jac_prototype accelerates computation because not all entries of the Jacobian need to be computed, and all operations on the Jacobian are carried out on a sparse matrix, shaving off computation.

The sparsity detection step just automatically determines the sparsity pattern of the Jacobian, and it could be done ahead of time since it is only run once (in particular, when the pattern is known it can be provided manually).

Therefore the pattern detection itself is not part of the benchmark, but all the operations done by the sensitivity analysis and the ODE solver are. If jac_prototype is provided, some of those operations are done on sparse matrices instead of dense ones, thereby speeding up the sensitivity analysis.
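A minimal sketch of that point, using a toy tridiagonal-Jacobian RHS instead of the Brusselator (the RHS, problem size, and time span here are made up for illustration): the pattern detection runs once outside the timed calls, and the timed solves then operate on a sparse Jacobian prototype instead of a dense one.

using OrdinaryDiffEq, ADTypes, SparseConnectivityTracer

# Toy RHS with a tridiagonal Jacobian (stand-in for the Brusselator)
function rhs!(du, u, p, t)
    n = length(u)
    for i in 1:n
        du[i] = -2u[i] + (i > 1 ? u[i-1] : 0.0) + (i < n ? u[i+1] : 0.0)
    end
    return nothing
end

u0 = rand(256)
tspan = (0.0, 1.0)

# Pattern detection happens once, outside anything that would be timed
pattern = ADTypes.jacobian_sparsity((du, u) -> rhs!(du, u, nothing, 0.0),
                                    similar(u0), u0, TracerSparsityDetector())

f_dense  = ODEFunction(rhs!)                                   # dense Jacobian path
f_sparse = ODEFunction(rhs!, jac_prototype = float.(pattern))  # sparse Jacobian path

# Only the solves are timed; the sparse prototype changes what they operate on internally
@time solve(ODEProblem(f_dense,  u0, tspan), Rodas5(autodiff = false))
@time solve(ODEProblem(f_sparse, u0, tspan), Rodas5(autodiff = false))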