step! resulting in slightly different results for the same inputs

I am using the following code to initialize an MTK model:

using OrdinaryDiffEqCore, Sundials

dt = 0.05
tspan = (0.0, dt)
solver = CVODE_BDF()

# `prob` is the ODEProblem built from the MTK model
integrator = OrdinaryDiffEqCore.init(prob, solver; dt, abstol=0.0006, reltol=0.001, save_on=false, dtmin=1e-7)
OrdinaryDiffEqCore.step!(integrator, 2.0, true) # step 2 seconds forward

The resulting state is slightly different every time I run it, even after restarting julia.

pos[end] = [2.2989038153247816 -4.685813350571358e-9 54.00271363884023]

After restarting:

pos[end] = [2.3003361074197852 1.0097878204576373e-8 54.002974742010224]

Is this just floating point errors, or is there something else I can do to solve this?

Your results are within the specified tolerances. And why are you using Sundials in the first place? In most cases the pure Julia solvers give much better results.


Sundials is faster. And yes, the results are within the tolerances, but where is the randomness coming from? Does the solver cause randomness? I would expect the same results every time, as nothing in the code changes and I am not using any random numbers.

Ask the Sundials developers. This is a good place to ask about native Julia solvers, but not about third-party solvers.

Sundials mailing list: SUNDIALS: Mailing List | Computing


I doubt that. I rather think you are using different parameters.

Does it happen for other solvers? I remember a thread somewhere about a similar issue but can’t find it now…

Which parameters should I change? I tried some different linear solvers, but it did not change the speed much: Specifying (Non)Linear Solvers and Preconditioners · DifferentialEquations.jl
CVODE_BDF is twice as fast as KenCarp4 in my case, and KenCarp4 is the fastest of all the Julia solvers I tried.
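For reference, a minimal sketch of what swapping the linear solver looks like on both sides (prob stands for the ODEProblem from the model, the tolerances are the ones from the first post, and GMRES is just one example choice):

using OrdinaryDiffEq, Sundials, LinearSolve

# Sundials: select the linear solver with a keyword argument
sol = solve(prob, CVODE_BDF(linear_solver=:GMRES); abstol=0.0006, reltol=0.001)

# Julia solvers: pass any LinearSolve.jl algorithm via linsolve
sol = solve(prob, KenCarp4(linsolve=KrylovJL_GMRES()); abstol=0.0006, reltol=0.001)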

No, the Julia solvers don’t have this problem, at least not to the same degree. There is a slight difference in the resulting states, in the range of 1e-9, when using KenCarp4, but that should be floating point drift.

I used the example Tethers.jl/src/Tether_07.jl at main · ufechner7/Tethers.jl · GitHub and added using Sundials.

Getting the following results for tol = 1e-6:

  • CVODE_BDF()
    Elapsed time: 0.001612388 s, speed: 6202.0 times real-time
    Number of evaluations per step: 1.7
  • QNDF(autodiff=true)
    Elapsed time: 0.00189734 s, speed: 5271.0 times real-time
    Number of evaluations per step: 2.2
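A sketch of how such a timing comparison can be reproduced (prob stands for the problem from Tether_07.jl; the real-time factor is the simulated span divided by wall time):

using OrdinaryDiffEq, Sundials

tol = 1e-6
for solver in (CVODE_BDF(), QNDF(autodiff=true))
    solve(prob, solver; abstol=tol, reltol=tol)  # warm-up run to exclude compilation
    t = @elapsed solve(prob, solver; abstol=tol, reltol=tol)
    println(nameof(typeof(solver)), ": ", round(t, digits=6), " s, speed: ",
            round((prob.tspan[2] - prob.tspan[1]) / t, digits=1), " times real-time")
end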

@Bart_van_de_Lint So you are right: so far, CVODE_BDF() is the fastest solver, about 18% faster than the fastest Julia solver I have found.

But less accurate… There is always a trade-off: a fast solution, or a robust and accurate one. Your choice.

By the way, try:
?QNDF() and
?CVODE_BDF()

Both of them have a lot of options.
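For example, a few of the options I happen to know of (check the docstrings for the full lists):

using OrdinaryDiffEq, Sundials

CVODE_BDF(method=:Functional)          # functional iteration instead of Newton
CVODE_BDF(linear_solver=:LapackDense)  # LAPACK dense LU instead of the default dense solver
QNDF(autodiff=false)                   # finite differences instead of ForwardDiff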

@ChrisRackauckas Why don’t we have a Julia solver yet that is faster than CVODE_BDF() ?

Related: CVODE_BDF outperforms Julia solvers for stiff system biology model?


FBDF gives me slightly slower but more precise results than CVODE_BDF. Worth the tradeoff, I think. I will check the options to see if it can get any faster. SciMLBenchmarks.jl: Benchmarks for Scientific Machine Learning (SciML) and Equation Solvers · The SciML Benchmarks is really useful for finding the fastest Julia solver.

Some First Things

First things first. Most of the data I’m about to mention comes from

https://docs.sciml.ai/SciMLBenchmarksOutput/stable/

and we’re about to do another tag on it, so the current dev version is much more up to date. This is because, as mentioned in CVODE_BDF outperforms Julia solvers for stiff system biology model? - #5 by ChrisRackauckas, we had a pretty major improvement to the nonlinear solver heuristics, which improved a good chunk of the benchmarks.

And with this, when I talk about performance I mean work-precision diagrams, so that accuracy is aligned: we know that CVODE tends to give errors much higher than its tolerance, and that always needs to be accounted for.
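For anyone who hasn’t run one, a work-precision diagram can be generated with DiffEqDevTools.jl along these lines (a sketch; the reference solver and tolerance ranges are placeholder choices):

using OrdinaryDiffEq, Sundials, DiffEqDevTools, Plots

# Reference solution at very tight tolerances to measure the true error against
test_sol = TestSolution(solve(prob, Rodas5P(); abstol=1e-12, reltol=1e-12))

abstols = 1.0 ./ 10.0 .^ (5:8)
reltols = 1.0 ./ 10.0 .^ (2:5)
setups = [Dict(:alg => CVODE_BDF()), Dict(:alg => FBDF()), Dict(:alg => QNDF())]

wp = WorkPrecisionSet(prob, abstols, reltols, setups;
                      appxsol=test_sol, save_everystep=false, numruns=10)
plot(wp)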

Back to the main show: Where we do well

So okay, with that out of the way: performance. That depends on many factors. First of all, for smaller systems, around 20 equations or fewer, the Rosenbrock methods clearly outperform CVODE, so use those. In the 10-200 range or so, a well-tuned parallel implicit extrapolation usually does best, so use that. For DAEs, the mass matrix form almost always outperforms IDA, so use FBDF in mass matrix form or a Rosenbrock method.
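In code, that advice reads roughly as follows (a sketch; prob_small, prob_medium, f, M, u0, and tspan are placeholders):

using OrdinaryDiffEq

# < ~20 equations: Rosenbrock methods
solve(prob_small, Rodas5P())

# ~10-200 equations: parallel implicit extrapolation
solve(prob_medium, ImplicitHairerWannerExtrapolation(threading=true))

# DAEs: mass-matrix form with FBDF (or a Rosenbrock method) instead of IDA
f_mm = ODEFunction(f; mass_matrix=M)
prob_dae = ODEProblem(f_mm, u0, tspan)
solve(prob_dae, FBDF())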

Remaining cases

What that means is that there are two regimes where CVODE is identified as still doing well: high f call cost and “sufficiently large”.

When I say sufficiently large, I mean “ODEs larger than 200”, but there’s a bit of a catch. Until sometime this year we did have some hiccups with the dense linear solvers, due to OpenBLAS defaults. Long story short, Julia v1.12 improves the BLAS threading defaults, and we also have some nice improvements to the default BLAS chooser by incorporating MKL into the mix. To show this, here’s a slide from a workshop I recently gave:

You can see that the standard BLAS LUFactorization() is pretty bad in almost all cases, but how bad depends on the CPU you are using, and GenericLUFactorization() could beat it. Our Sundials wrapper always defaults to the equivalent of GenericLUFactorization() (you can set LapackDense, but that’s not the default), which ironically means it’s better in the cases where the BLAS defaults are just bad (such as right after the threading threshold). The changes we made improved the heuristics for the choices. I think for all CPUs we now at least always do better than the naive choice, though you can see, for example, that on AMD CPUs there is a spot around N=250 where the multithreading threshold kicks in too early; but it still beats OpenBLAS and no BLAS pretty handily, so that’s fine. With those improvements, the “large enough and dense” region should now be handled well on our end.
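If you would rather pin the choice than rely on the heuristics, the LinearSolve.jl algorithms can be passed directly (a sketch; which one wins depends on the CPU and problem size, as the slide shows):

using OrdinaryDiffEq, LinearSolve, Sundials

solve(prob, FBDF(linsolve=LUFactorization()))         # BLAS (OpenBLAS/MKL) LU
solve(prob, FBDF(linsolve=GenericLUFactorization()))  # pure-Julia LU, good at small N
solve(prob, CVODE_BDF(linear_solver=:LapackDense))    # opt Sundials into LAPACK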

But “sufficiently large” with a sparse matrix is the case that still has something missing. You can see it here on Bruss, before the nonlinear solver change:

[Work-precision diagram: Bruss benchmark, before the nonlinear solver change]

After the nonlinear solver change:

[Work-precision diagram: Bruss benchmark, after the nonlinear solver change]

The difference isn’t all that large and it’s only when the tolerances are sufficiently small, but it does exist. My guess is there’s an issue with the symbolic factorization reuse since that’s almost precisely the cost there, so I may try to track that down sometime in the near future.
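For context, the sparse path being benchmarked is set up roughly like this: a sparse Jacobian prototype makes the solver use a sparse factorization, and KLU does the symbolic factorization once and should reuse it across steps (a sketch; f, u0, tspan, and jac_sparsity are placeholders):

using OrdinaryDiffEq, LinearSolve, SparseArrays

# jac_sparsity: a SparseMatrixCSC holding the Jacobian's structural nonzeros
f_sparse = ODEFunction(f; jac_prototype=jac_sparsity)
prob_sparse = ODEProblem(f_sparse, u0, tspan)

# KLU: sparse LU whose symbolic factorization is reused between refactorizations
solve(prob_sparse, FBDF(linsolve=KLUFactorization()))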

For the high f call cost, that was something only recently isolated in Create astrochem benchmark by ChrisRackauckas · Pull Request #1046 · SciML/SciMLBenchmarks.jl · GitHub, so we’ll get those merged and up soon. Since these two benchmarks are only in the CI right now, let me share the results:

[Work-precision diagram: first astrochem benchmark]

[Work-precision diagram: second astrochem benchmark]

The first one isn’t bad, but the latter is curiously bad for FBDF. Since CVODE and FBDF are very similar algorithms, that shouldn’t happen. This seems to be something arising in chemical reaction networks with very high f call costs, where the ^ in Hill functions turns into the dominating factor. You can see this in the profile for the last one:

[Profile of the ODE right-hand side]

where >90% of the time is in ^ calls in the right hand side. In that case, the benchmark is “abnormal” from an implicit ODE solver standpoint, as such a solver is normally assumed to be somewhat LU-bound, but that’s clearly not the case here. I suspect this is the case with @sebapersson’s examples too, and I would like help turning them into reproducible examples on SciMLBenchmarks if anyone has a second. That would be the most useful thing here.

What’s next?

So then back to: “Why don’t we have a Julia solver yet that is faster than CVODE_BDF()?”

There’s another way to interpret this question, which is: why have we not focused on getting a solver that covers the CVODE cases? Because indeed, we haven’t really cared all that much about handling those cases, for a good reason: CVODE is a really good method, we have really good (I think you can argue the best) open-source wrappers of Sundials in a high-level language, and we have no plans to stop maintaining the wrappers. So why would we build more methods in the domain where CVODE is good? If CVODE is good, then call CVODE, and we can go work on other things. That’s especially true for DFBDF vs IDA: there just isn’t a strong incentive to work on it.

That said, years of poking have clearly gotten us pretty close. I’m sure we’ll find something with the high f cost investigations over the next few months, and @Oscar_Smith is already making some major changes to the linear solver wrappers which could close the remaining gap at the large end. But in general, it’s not a high priority to work on “things that have already been done” when we can just have a good wrapper.

Instead, there’s many other things to do, so the coming soon is:

  • Improved nonlinear solvers (i.e., replacing the internal nonlinear solvers of OrdinaryDiffEq with NonlinearSolve.jl; this is partially completed)
  • Some symbolic-numeric stuff… which will give massive performance improvements. More to come on this in the near future.
  • New parallel methods
  • Improved compilation and load times

which all take precedence. That said, the poke is getting close and I’m sure we’ll poke it closer over the next year as well.


In that case, the benchmark is “abnormal” from an implicit ODE solver standpoint, as such a solver is normally assumed to be somewhat LU-bound, but that’s clearly not the case here. I suspect this is the case with @sebapersson’s examples too, and I would like help turning them into reproducible examples on SciMLBenchmarks if anyone has a second.

I can add here that the examples in my use case are indeed high f call cost. As for turning them into reproducible examples for SciMLBenchmarks, it is on my todo list, and as soon as PEtab.jl (which I use to create the problems) is updated to MTK v9, I can set up the benchmarks.

If you’re f call bound, does it make sense to try RadauIIA5? That should decrease the number of f calls in exchange for more linear algebra.

Still f-bound. I want to try RadauIIA13 (which is why RadauIIA9 is in the benchmarks); higher order trades even more linear algebra against the f call cost. My guess is that AdaptiveRadau, when done, might be the best thing here, so I have a few benchmarks in mind for Shreyas to develop against.

This is also why I pinged you two weeks ago about alternative ^ implementations. I think it would be fairly helpful for Symbolics codegen to have a fastmath kind of version, specifically because using DiffEqBase.fastpow would be a good thing for most of these use cases. The astrochem benchmark is 2x faster when using DiffEqBase.fastpow and converges to an accuracy of 1e-7 IIRC, so having an easy swap for these is probably good for “most” use cases.
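For anyone wanting to try the swap by hand before codegen support lands, the idea is just replacing ^ in the rate laws (a sketch with a generic Hill function; fastpow trades accuracy, roughly the 1e-7 mentioned above, for speed):

using DiffEqBase

# Standard Hill function: the ^ calls dominate the right-hand-side cost
hill(x, k, n) = x^n / (k^n + x^n)

# Same expression with the fast approximate power
fasthill(x, k, n) = DiffEqBase.fastpow(x, n) /
                    (DiffEqBase.fastpow(k, n) + DiffEqBase.fastpow(x, n))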