CVODE_BDF using an order of magnitude fewer function evals than Julia solvers

I recently ported a Matlab implementation over to Julia. My RHS function evaluates faster in Julia, which is good, but the vast majority of Julia solvers seem to require far more function evaluations. CVODE_BDF outperforms ode15s, but QNDF and others all require an order of magnitude more function evals, which I found surprising.

My problem is of the form

A * inv(B) * C * dX = K

where matrices A, B, C and RHS K are all functions of state X. I solve this using

dX = (A * (B \ C)) \ K

which is passed to the solver. It’s quite a costly RHS function.
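
For illustration, the RHS has roughly this shape (a minimal sketch; the assemble_* functions are hypothetical stand-ins for the problem-specific construction of A, B, C and K from the state):

# Sketch only: assemble_A etc. are placeholders for the non-trivial,
# state-dependent assembly of the matrices.
function dynamics_combined(X, p, t)
    A = assemble_A(X, p)
    B = assemble_B(X, p)
    C = assemble_C(X, p)
    K = assemble_K(X, p, t)
    return (A * (B \ C)) \ K   # solve A * inv(B) * C * dX = K for dX
end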

Function evals:
CVODE_BDF: 128
Matlab ode15s: 408
ddebdf (Fortran solver called from Julia): 1383
QNDF: 5604
RadauIIA5 (best native Julia solver): 2547

I have also tried with some other solvers:

[image: function-eval counts for additional solvers]

My question might be a bit too vague; I guess I was just surprised that an older, less feature-rich solver (CVODE_BDF) seems to be more efficient for my system than any of the newer solvers. Should I not worry about it and just go ahead and use CVODE_BDF? Or is there something else going on, and should I place more trust in the Julia solvers despite them being less efficient?

I was doubly surprised since QNDF is supposed to be quite similar to ode15s, but they seem to be handling my system very differently, judging by the number of function evals. (I have checked that the Matlab and Julia functions agree, and the solvers converge when using small tolerances.)

How many Jacobians are used? For big problems, the Jacobians often matter at least as much as the function calls. Also, have you checked that the error being achieved is the same? I.e. when you set reltol=1e-5, the accuracy won't be the same for all the solvers.

Comparing CVODE_BDF and QNDF with abstol=1e-6, reltol=1e-3 against QNDF with abstol=1e-12, reltol=1e-8, they both appear to be similar in accuracy: the maximum difference from the tight-tolerance solution is around 1e-5 for both. (Feel free to correct me if this is not a good way of measuring accuracy.)
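
Concretely, I'm measuring this roughly as follows (a sketch; comparing both runs on a shared output grid via saveat):

using OrdinaryDiffEq, Sundials

ts = range(prob.tspan[1], prob.tspan[2]; length = 200)  # common comparison grid
sol_ref = solve(prob, QNDF(autodiff=false), abstol=1e-12, reltol=1e-8, saveat=ts)
sol_low = solve(prob, CVODE_BDF(), abstol=1e-6, reltol=1e-3, saveat=ts)

# maximum difference from the tight-tolerance reference, over all times and components
err = maximum(maximum(abs.(u .- v)) for (u, v) in zip(sol_low.u, sol_ref.u))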

Edit: just checking with Matlab's ode15s, its error is also of order 1e-5 but slightly bigger, at 1.45e-5, as opposed to 1.06e-5 for QNDF and 1.27e-5 for CVODE_BDF.

[image: error comparison plot for the three solvers]

(percentage errors around 5 and 6 %)

CVODE_BDF creates 3 Jacobians, QNDF creates 80.

Sorry, I feel like my problem is likely too general to really tackle: I understand that solver performance is system dependent, and I haven't given much information about my system. It's not easily shareable because the matrices A, B and C are non-trivially calculated from the state vector X, rather than having any nice analytical form.

Solver stats if interesting:

CVODE_BDF()

Number of function 1 evaluations: 128
Number of function 2 evaluations: 0
Number of W matrix evaluations: 17
Number of linear solves: 0
Number of Jacobians created: 3
Number of nonlinear solver iterations: 125
Number of nonlinear solver convergence failures: 0
Number of rootfind condition calls: 0
Number of accepted steps: 106
Number of rejected steps: 0

QNDF()

Number of function 1 evaluations: 5604
Number of function 2 evaluations: 0
Number of W matrix evaluations: 80
Number of linear solves: 162
Number of Jacobians created: 80
Number of nonlinear solver iterations: 162
Number of nonlinear solver convergence failures: 0
Number of rootfind condition calls: 0
Number of accepted steps: 79
Number of rejected steps: 1

Can you use a work-precision diagram on the time series error? There are still some differences between the methods, such as how CVODE uses an I-controller while FBDF uses a PI-controller. This means that at the same tolerance FBDF is generally about an order of magnitude more accurate, though that is very dependent on the ODE. So to trust any statement of performance, it's best to take accuracy into account simultaneously, hence the work-precision diagrams in SciMLBenchmarks.jl: Benchmarks for Scientific Machine Learning (SciML) and Equation Solvers · The SciML Benchmarks.

There are all sorts of effects required to understand such a statement. For example, see POLLU Work-Precision Diagrams · The SciML Benchmarks
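
A minimal sketch of setting one up with DiffEqDevTools, following the pattern in the POLLU benchmark (error_estimate = :l2 measures the error over the whole time series, while the default :final only measures the end point):

using OrdinaryDiffEq, Sundials, DiffEqDevTools, Plots

# tight-tolerance reference solution
test_sol = TestSolution(solve(prob, FBDF(autodiff=false), abstol=1e-14, reltol=1e-14))

abstols = 1.0 ./ 10.0 .^ (6:9)
reltols = 1.0 ./ 10.0 .^ (3:6)
setups = [Dict(:alg => CVODE_BDF()),
          Dict(:alg => FBDF(autodiff=false)),
          Dict(:alg => QNDF(autodiff=false))]

# :l2 = time-series error against the reference; numruns averages the timings
wp = WorkPrecisionSet(prob, abstols, reltols, setups;
                      appxsol = test_sol, error_estimate = :l2, numruns = 10)
plot(wp)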

This is the error measured at the end point:

[image: POLLU work-precision diagram, error at the end point]

while the error measured over the whole time series:

[image: POLLU work-precision diagram, error over the whole time series]

Notice any differences? While in many cases the error measurement doesn’t make too much of a difference, in some cases we have noticed some of the older BDF methods (specifically lsoda and CVODE) can get quite different results given the implementation of the interpolation and other factors combined.

So without a WPD on the time series the answer is kind of 🤷.

It would be nice to get our hands on this case and see what is causing the difference though, and to see whether swapping the time stepping choice to a PController makes a noticeable difference here. If you could please share the ODE, we can play with it and see what's causing the difference.

QNDF is more like ode15s, and FBDF is more like CVODE_BDF. You should probably compare FBDF and CVODE in these cases, as those two should in theory be rather close.

Thanks for the reply Chris. I used a test_sol from FBDF with abstol and reltol set to 10^-14, and followed the code given in POLLU Work-Precision Diagrams · The SciML Benchmarks. I wasn't exactly sure how to specify end-point vs. over-time error, but here is what I got:

[image: work-precision diagram comparing CVODE_BDF and FBDF]

So it seems like CVODE_BDF is performing quite well here over all tolerances, and better than FBDF.

Running FBDF with the solve option controller = OrdinaryDiffEq.PredictiveController() doesn't seem to change anything (the destats are identical), so I'm not sure whether I'm specifying that option correctly.
I’m running the following to change the controller:

sol1 = solve(prob, FBDF(autodiff=false), abstol=1e-6, reltol=1e-3)
sol2 = solve(prob, FBDF(autodiff=false), controller=OrdinaryDiffEq.PredictiveController(), abstol=1e-6, reltol=1e-3)

Yup, looks like we're getting stomped in this case. It would be good to add it to SciMLBenchmarks so we can track it and understand it.


Hey, where did the code go? I think I have a fix, and I think it's related to finite differencing. First off, why was it using autodiff=false? But secondly, I am curious whether it's only different in the case where autodiff=false and diff_type === Val{:forward}.

Hi, sorry, I didn't realise you were working on this and I felt a bit weird having my crappy Julia code out in public!

It’s back here:
https://github.com/pfuchter/julia-cg-elastohydrodynamics-old

I am using autodiff=false partly because I didn't initially write my code to be compatible with dual numbers, and partly because a while ago I put in the effort to make some similar code autodiff-compatible and didn't notice any speedups. I should probably put some time in and change my code so I can test with autodiff on.

I just did some basic tests changing diff_type and don't seem to notice any difference between forward, central and backward.
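
For reference, this is roughly how I'm switching the finite-difference mode (a sketch; same call as before, just with the diff_type option):

# Same solve as above, but with central and backward finite differences
sol_c = solve(prob, FBDF(autodiff=false, diff_type=Val{:central}), abstol=1e-6, reltol=1e-3)
sol_b = solve(prob, FBDF(autodiff=false, diff_type=Val{:backward}), abstol=1e-6, reltol=1e-3)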

We assume the out-of-place form is only used for StaticArrays as an optimization. It lacks optimizations like Jacobian reuse that only make sense for larger problems. To get better performance, you should rewrite your RHS function to be in-place.
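
I.e. something along these lines (a sketch with placeholder names X0, tspan and p; note the wrapper below still allocates internally, while a fully in-place RHS would write into dX directly):

# Out-of-place: returns a freshly allocated dX on every call
f_oop(X, p, t) = dynamics_combined(X, p, t)

# In-place: fills a preallocated buffer, which lets the solver cache
# work arrays and reuse Jacobians
function f_iip!(dX, X, p, t)
    dX .= dynamics_combined(X, p, t)
    return nothing
end

prob_iip = ODEProblem(f_iip!, X0, tspan, p)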


Hi, thank you for your reply, this was very helpful and seems to be the main issue.

Simply creating a mock in-place function, fun = (du,X,p,t) -> du .= dynamics_combined(X, p, t), takes FBDF(autodiff=false) from 12 s to 0.7 s runtime, and from 7000 down to 413 function evals.

I might do another work-precision diagram just to see how well Julia algs do against CVODE using this interesting fix, but for now just some simple timings for a test case:

FBDF: 12.089 s
FBDF with mock in-place function: 735.099 ms
CVODE_BDF: 634.817 ms

Edit: and for a slightly more complex case, FBDF is now outperforming CVODE, so crisis averted, faith restored and thank you everyone for your help.


The original motivation was that Zygote in-solver support would require it. But Zygote is now so far off that it doesn't really matter. We should consider doing Jacobian reuse in the out-of-place solvers.