Reports of Sundials' Death are Greatly Exaggerated?

Apologies for the tongue-in-cheek title.

According to this discussion,

native QNDF julia solvers now consistently outperform CVODE_BDF on the test suite. However, my experience solving reasonably stiff sets of ~100’s of ODEs that arise from method of lines problems is that CVODE_BDF is typically faster than native Julia solvers, and often more stable. To give some examples, here’s a simple method of lines solution of the single diffusion-reaction equation:

\frac{\partial c}{\partial t} = D \frac{\partial^2 c}{\partial x^2} - kc

with fixed Dirichlet boundary conditions at both ends.

using DifferentialEquations
using LinearAlgebra
using Sundials
using BenchmarkTools

# Code Method of Lines Solver for Film Theory Model
# This code is modified from code published by Moore et al. in the SI of 10.1039/D2EE03237F.

#Inputs:
# - n, number of species, integer
# - R, function for reaction rate. Function input is local concentrations (mol/m3). Function output is rate of change of local concentrations (mol/m3.s). Input and ouput are both vectors of length n.
# - D, vector of diffusivities (m2/s). Vector of length n
# - c0, Initial concentrations of all species, mol/m3.
# - csurf, surface concentration of species 1 (dissolving gas, most often CO2), mol/m3
# - cbulk, concentrations of all species in bulk. Must satisfy R(cbulk) = 0, otherwise an error will be returned.
# - N, number of cells to divide domain into (non-uniform mesh is used)
# - L, length of the domain. Typically about 100 μm.
# - tfinal, the endpoint of the time integration. This should be infinite, but in practice a very long time (e.g. 1e5 seconds) is sufficient.
# - OM, number of orders of magnitude of a over which to grow cells

#Outputs:
# - Solution object to diffusion-reaction problem at all times.

function AbsorptionFilmTheorySolver(n, R, D::Array{Float64}, c0, csurf, cbulk; N = 50, L = 300e-6, 
        tfinal = 100.0, OM = 2.0, solver = CVODE_BDF())
    
    #Check that the bulk conditions are at equilibrium
    if norm(R(cbulk)) > 1e-6
        error("Bulk conditions are not at equilibrium.")
    end
    
    #Preliminary Calculations
    #Boundary Points of Cells
    b = 10 .^(LinRange(0.0, OM, N+1))
    b = b .- b[1]
    b = b .* (L/b[end])

    #Cell Center Points & Cell Widths
    x = (b[2:N+1] .+ b[1:N]) ./ 2
    Δb = b[2:N+1] .- b[1:N]
    Δx = x[2:N] .- x[1:N-1]

    #Set Initial Conditions (average values within cells)
    c0_array = zeros(N,n)
    for i = 1:N
        for j = 1:n
            c0_array[i, j] = c0[j]
        end
    end

    #ODE Function
    function odefun!(dcdt, c, (JLeft, JRight, n, N, csurf, cbulk, D, R, Δb, Δx), t)

        #Calculate dcdt for cell adjacent to gas-liquid interface
        #Start with CO2 at gas-liquid interface
        JLeft = -D[1]*(c[1,1] - csurf)/(Δb[1]/2)
        JRight = -D[1]*(c[2, 1] - c[1, 1])/Δx[1]
        dcdt[1, 1] = ((JLeft) - (JRight))/Δb[1] + R(@view c[1, :])[1]

        #Then do the same for all other species at the gas-liquid interface
        if n > 1
            for j = 2:n
                JLeft = 0.0
                JRight = -D[j]*(c[2, j] - c[1, j])/Δx[1]
                dcdt[1, j] = ((JLeft) - (JRight))/Δb[1] + R(@view c[1, :])[j]
            end
        end

        #Calculate dcdt in cell adjacent to end of domain
        for j = 1:n
            JLeft = -D[j]*(c[N, j] - c[N-1, j])/Δx[end]
            JRight = -D[1]*(cbulk[j] - c[N, j])/(Δb[N]/2)
            dcdt[N, j] = ((JLeft) - (JRight))/Δb[end] + R(@view c[N, :])[j]
        end

        #Calculate rate of change of concentration in interior cells
        for i = 2:N-1
            for j = 1:n
                JLeft = -D[j]*(c[i, j] - c[i-1, j])/Δx[i-1]
                JRight = -D[j]*(c[i+1, j] - c[i, j])/Δx[i]
                dcdt[i, j] = ((JLeft) - (JRight))/Δb[i] + R(@view c[i, :])[j]
            end
        end

    end

    #Solve ODE
    tspan = (0.0, tfinal)
    prob = ODEProblem(odefun!, c0_array, tspan, (0.0, 0.0, n, N, csurf, cbulk, D, R, Δb, Δx))
    sol = solve(prob, solver)

    return (x, sol)
end

n, D, c0, csurf, cbulk, L, N, tfinal = (1, [1e-9], [0.0], 10.0, [0.0], 100e-6, 50, 3.0)
R(c) = [-c[1]]

@btime x, sol = AbsorptionFilmTheorySolver(n, R, D, c0, csurf, cbulk; N = N, L = L, 
        tfinal = 100.0, OM = 2.0, solver = CVODE_BDF());
@btime x, sol = AbsorptionFilmTheorySolver(n, R, D, c0, csurf, cbulk; N = N, L = L, 
        tfinal = 100.0, OM = 2.0, solver = Rosenbrock23());
@btime x, sol = AbsorptionFilmTheorySolver(n, R, D, c0, csurf, cbulk; N = N, L = L, 
        tfinal = 100.0, OM = 2.0, solver = QNDF());
@btime x, sol = AbsorptionFilmTheorySolver(n, R, D, c0, csurf, cbulk; N = N, L = L, 
        tfinal = 100.0, OM = 2.0, solver = QBDF());

Here, CVODE_BDF outperfoms all alternatives, though QNDF and QBDF are reasonably close,

1.744 ms (25720 allocations: 1.74 MiB)
21.744 ms (443587 allocations: 33.78 MiB)
2.655 ms (35104 allocations: 2.47 MiB)
2.903 ms (38038 allocations: 2.66 MiB)

If we consider a more complex system (5 simultaneous coupled PDEs with much greater disparity in reaction time-scales, presumably resulting in substantially more stiffness), we run into a different problem: CVODE_BDF is fast and accurate, Rosenbrock23() is slow and accurate, but QNDF and QBDF fail to converge.

#Function to solve complete chemistry. 
#Physical data references given in SI of manuscript.
#Species 1 - 5 are CO2, OH-, HCO3-, CO32- and H+, respectively.

function FilmTheoryFullChemistry(; cCO2surf = 34, pHbulk = 10, cHCO30 = 1000, 
        N = 50, OM = 2, L = 100e-6, tfinal = 1e5, solver = CVODE_BDF())
    
    #Physical Parameters
    n = 5
    D = [1.91e-9, 5.29e-9, 1.185e-9, 0.92e-9, 9.311e-9]                       #m2/s    
    k1f = 8.42                                                                #m3/mol.s
    k1r = 1.97e-4                                                             #1/s
    k3f = 1.254e6                                                             #1/s
    k3r = 6e6                                                                 #m3/mol.s
    k5f = 2.3e7                                                               #L/mol.s
    k5r = 2.3e-1                                                              #mol/m3.s
    k2f = 3.71e-2                                                             #1/s
    k2r = k2f*(k1r/k1f)*(k5f/k5r)                                             #m3/mol.s - Calculated Exactly to Maintain Consistency
    k4f = 1.242e9                                                             #m3/mol.s
    k4r = k4f*(k3r/k3f)*(k5r/k5f)                                             #1/s - Calculated Exactly to Maintain Consistency

    #Reaction Function
    function R(c)
        return [-k1f*c[1]*c[2] + k1r*c[3] - k2f*c[1] + k2r*c[3]*c[5],
                -k1f*c[1]*c[2] + k1r*c[3]                            + k3f*c[4] - k3r*c[2]*c[3]                            - k5f*c[2]*c[5] + k5r,
                 k1f*c[1]*c[2] - k1r*c[3] + k2f*c[1] - k2r*c[3]*c[5] + k3f*c[4] - k3r*c[2]*c[3] + k4f*c[4]*c[5] - k4r*c[3],
                                                                     - k3f*c[4] + k3r*c[2]*c[3] - k4f*c[4]*c[5] + k4r*c[3],
                                            k2f*c[1] - k2r*c[3]*c[5]                            - k4f*c[4]*c[5] + k4r*c[3] - k5f*c[2]*c[5] + k5r]
    end
    
    #Calculate Bulk Concentrations
    cOHbulk = 10.0^(pHbulk - 14)*1000
    cHbulk = k5r/(k5f*cOHbulk)
    cHCO3bulk = 2*cHCO30*k3f/k3r/(cOHbulk + 2k3f/k3r)
    cCO32bulk = 2*cHCO30*cOHbulk/(cOHbulk + 2k3f/k3r)
    cCO2bulk = k1r*cHCO3bulk/(k1f*cOHbulk)
    cbulk = [cCO2bulk, cOHbulk, cHCO3bulk, cCO32bulk, cHbulk]
    c0 = cbulk
    
    #Solve System of Equations
    x, sol =  AbsorptionFilmTheorySolver(n, R, D, c0, cCO2surf, cbulk; 
        N = N, L = L, tfinal = tfinal, OM = OM, solver = solver)
    ceq = sol(tfinal)
    
    return (x, ceq, sol)
end

@btime x, ceq, sol = FilmTheoryFullChemistry(;solver = CVODE_BDF());
@btime x, ceq, sol = FilmTheoryFullChemistry(;solver = Rosenbrock23());
x, ceq, sol = FilmTheoryFullChemistry(;solver = QNDF());
x, ceq, sol = FilmTheoryFullChemistry(;solver = QBDF());

With output:

 86.481 ms (316204 allocations: 29.61 MiB)
 639.581 ms (6328167 allocations: 867.86 MiB)
┌ Warning: dt(1.4551915228366852e-11) <= dtmin(1.4551915228366852e-11) at t=5.708870444591281e-7. Aborting. There is either an error in your model specification or the true solution is unstable.
┌ Warning: dt(1.4551915228366852e-11) <= dtmin(1.4551915228366852e-11) at t=5.226666511312554e-7. Aborting. There is either an error in your model specification or the true solution is unstable.

I’m curious if there are any optimizations or tweaks to the Julia solvers I’m missing that could result in improved performance/stability vs CVODE_BDF? Or perhaps other Julia solvers are better suited to these systems (I have played around with ~20 alternatives)? Also, any insight into why QNDF is failing while CVODE_BDF is succeeding in the more complex simulation would be appreciated.

Thanks!

1 Like

What’s the reason for omitting FBDF? That’s the one that’s most equivalent to CVODE_BDF.

FBDF and TRBDF2 seem good:

julia> @btime x, ceq, sol = FilmTheoryFullChemistry(;solver = TRBDF2());
  18.708 ms (184054 allocations: 24.27 MiB)

julia> @btime x, ceq, sol = FilmTheoryFullChemistry(;solver = FBDF());
  35.114 ms (207505 allocations: 23.90 MiB)

Could probably push it even further with something like KLUFactorization for your linear solver if you provide a sparse prototype.

1 Like

First of all, we do not claim “Sundials’ death”. We still say that IDA is the best method out there for solving fully implicit DAEs. Though of course, we have plenty of benchmarks to show you almost never want to use the fully implicit DAE form because using ModelingToolkit to transform into mass matrix form is almost always faster, the docs acknowledge that IDA stands above DFBDF in terms of performance right now. We need some more benchmarks to confirm Chemical Akzo Nobel Differential-Algebraic Equation (DAE) Work-Precision Diagrams · The SciML Benchmarks at larger scales, but there’s some very clear algorithmic reasons why mass matrix forms and the ODAEProblem for ends up performing better (i.e. nonlinear tearing reducing the n in the O(n^3) operations). So no, Sundials is not “dead”, and we say in the blog post that this is the main reason it’s still getting a lot of use, though in general it’s getting a lot less use now that the symbolic-numeric tooling is involved.

And note that the claim there was not that CVODE_BDF is not a good algorithm in comparison to QNDF. The claim there was that in every single benchmark we have found at least one method in pure Julia outperforming CVODE_BDF. Smaller ODEs being Rosenbrock methods, larger ones being QNDF, TRBDF2, KenCarp47, or ROCK2 etc. That was a major turning point because before there were some areas where no algorithms were tuned, but now there was at least something everywhere.

However, in the 2 years since the blog post of course much has changed again. The main thing because that FBDF is a complete Julia implementation of a fixed leading coefficient BDF. This kind of BDF improves the stability over QNDF and is the type of BDF implementation that CVODE_BDF is. CVODE_BDF is like FBDF and VODE, and QNDF is like ode15s and LSODE. FBDF has some advantages though in terms of stability (and I chatted with Carol Woodward today about this at SIAM CSE, if you’re here), and so we do see FBDF outperform CVODE quite handedly in most of the cases where CVODE was preferred. So these days, we would normally compare CVODE with FBDF, not QNDF. though NDF methods can have some advantages over fixed leading coefficient methods if the complex eigenvalues are not too large. With FBDF around, we definitely almost never see a case where CVODE outperforms a pure Julia solver.

I say “almost never” because we did find two the other day:

https://docs.sciml.ai/SciMLBenchmarksOutput/dev/Bio/BCR/#Plot-Work-Precision-Diagram

https://docs.sciml.ai/SciMLBenchmarksOutput/dev/Bio/fceri_gamma2/

Those plots are getting fixed to look nicer really soon (improve Bio benchmarks by TorkelE · Pull Request #581 · SciML/SciMLBenchmarks.jl · GitHub), this is hot off the press so you’re seeing this being formalized in real-time. But notice two things, one is that that is a tiny difference: in FCERI it’s a 2% difference and in BCR it’s about 5%-10%. And it only occurs in specific cases which seem to have large complex eigenvalues and with preconditioned GMRES.

In summary, the case that we have found presents a <10% improvement by CVODE, though its very specific. It’s only for a few specific oscillatory chemical reaction examples, and in all of those cases CVODE is slower with dense Jacobians, CVODE is slower with sparse Jacobians, and CVODE is slower with the default handling of GMRES. CVODE only sees the 10% better case when both are using the same preconditioner with GMRES (in those test cases, iLU). What this is pointing to is that we could probably improve the performance of something in our preconditioner handling code, so since we just found this very recently it hasn’t been fixed but we are looking into it.

P.S. Your benchmark is not valid since it’s not comparing work-precision. @btime with everything at the same adaptive tolerances does not guarantee that you get solutions at the same accuracy. In fact, if any repeated statement about accuracy can be gleamed from the benchmarks, it’s that the adaptivity choices of CVODE make it consistently have a higher solution error when using the same tolerances as other methods. You can see this in the benchmarks as its first dot is to the right of many other methods. If one method is computing at a lower accuracy but is faster, then that’s not a clear statement. The tolerances need to be adjusted so that the solutions of the two methods are computed to the same accuracy. This is why work-precision diagrams are used instead of @btime.

8 Likes

Thank you so much for this! The title was meant to be tongue-in-cheek; I hope it came across that way.

On my problem, FBDF() is comparable to CVODE_BDF(), while TRBDF2() is ~ twice as fast as CVODE_BDF, at least according to @btime, which as you say is probably not an apples-to-apples comparison. I’m planning ~10^5 solves, so this acceleration will be very helpful! CVODE_BDF() has been my default algorithm for MOL applications, precisely because in the past I haven’t found algorithms in DiffEq that were faster. I will make sure to keep these BDF codes in mind going forward. Thanks again!

1 Like

in general I think our docs should probably center fbdf a little more. it’s one of the better all round methods.

Yes it should. It’s just new so we’ve been giving it a year or so to really be battle tested before recommending it too much. But it has gotten that battle testing and by now it’s clear that it is a generally good method, with the limitations of BDFs in mind.