step! resulting in slightly different results for the same inputs

I am using the following code to initialize an MTK model:

using OrdinaryDiffEqCore, Sundials

dt = 0.05
tspan = (0.0, dt)
solver = CVODE_BDF()

# `prob` is the ODEProblem built from the MTK model
integrator = OrdinaryDiffEqCore.init(prob, solver; dt, abstol=0.0006, reltol=0.001, save_on=false, dtmin=1e-7)
OrdinaryDiffEqCore.step!(integrator, 2.0, true) # step 2 seconds forward

The resulting state is slightly different every time I run it, even after restarting julia.

pos[end] = [2.2989038153247816 -4.685813350571358e-9 54.00271363884023]

After restarting:

pos[end] = [2.3003361074197852 1.0097878204576373e-8 54.002974742010224]

Is this just floating point errors, or is there something else I can do to solve this?

Your results are within the specified tolerances. And why are you using Sundials in the first place? In most cases the pure Julia solvers give much better results.


Sundials is faster. And yes, the results are within the tolerances, but where is the randomness coming from? Does the solver cause randomness? I would expect the same results every time, as nothing in the code changes and I am not using any random numbers.

Ask the Sundials developers. This is a good place to ask about native Julia solvers, but not about third-party solvers.

Sundials mailing list: SUNDIALS: Mailing List | Computing


I doubt that. I rather think you are using different parameters.

Does it happen for other solvers? I remember a thread somewhere about a similar issue but can’t find it now…

Which parameters should I change? I tried some different linear solvers, but it did not change the speed much: Specifying (Non)Linear Solvers and Preconditioners · DifferentialEquations.jl
CVODE_BDF is twice as fast as KenCarp4 in my case, and KenCarp4 is the fastest of all the Julia solvers I tried.
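For reference, a minimal sketch of what swapping the linear solver looks like on both sides (prob stands for the ODEProblem from the model, the tolerances are the ones from the first post, and GMRES is just one example choice):

using OrdinaryDiffEq, Sundials, LinearSolve

# Sundials: select the linear solver with a keyword argument
sol = solve(prob, CVODE_BDF(linear_solver=:GMRES); abstol=0.0006, reltol=0.001)

# Julia solvers: pass any LinearSolve.jl algorithm via linsolve
sol = solve(prob, KenCarp4(linsolve=KrylovJL_GMRES()); abstol=0.0006, reltol=0.001)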

No, the Julia solvers don’t have this problem, at least not to the same degree. There is a slight difference in the resulting states, in the range of 1e-9, when using KenCarp4, but that should be floating point drift.

I used the example Tethers.jl/src/Tether_07.jl at main · ufechner7/Tethers.jl · GitHub and added using Sundials.

Getting the following results for tol = 1e-6:

  • CVODE_BDF()
    Elapsed time: 0.001612388 s, speed: 6202.0 times real-time
    Number of evaluations per step: 1.7
  • QNDF(autodiff=true)
    Elapsed time: 0.00189734 s, speed: 5271.0 times real-time
    Number of evaluations per step: 2.2
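A sketch of how such a timing comparison can be reproduced (prob stands for the problem from Tether_07.jl; the real-time factor is the simulated span divided by wall time):

using OrdinaryDiffEq, Sundials

tol = 1e-6
for solver in (CVODE_BDF(), QNDF(autodiff=true))
    solve(prob, solver; abstol=tol, reltol=tol)  # warm-up run to exclude compilation
    t = @elapsed solve(prob, solver; abstol=tol, reltol=tol)
    println(nameof(typeof(solver)), ": ", round(t, digits=6), " s, speed: ",
            round((prob.tspan[2] - prob.tspan[1]) / t, digits=1), " times real-time")
end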

@Bart_van_de_Lint So you are right: so far, CVODE_BDF() is the fastest solver, about 18% faster than the fastest Julia solver I have found.

But less accurate… There is always a trade-off: a fast solution, or a robust and accurate one. Your choice.

By the way, try:
?QNDF() and
?CVODE_BDF()

Both of them have a lot of options.
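For example, a few of the options I happen to know of (check the docstrings for the full lists):

using OrdinaryDiffEq, Sundials

CVODE_BDF(method=:Functional)          # functional iteration instead of Newton
CVODE_BDF(linear_solver=:LapackDense)  # LAPACK dense LU instead of the default dense solver
QNDF(autodiff=false)                   # finite differences instead of ForwardDiff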

@ChrisRackauckas Why don’t we have a Julia solver yet that is faster than CVODE_BDF() ?

Related: CVODE_BDF outperforms Julia solvers for stiff system biology model?


FBDF gives me slightly slower but more precise results than CVODE_BDF. Worth the tradeoff, I think. I will check the options to see if it can get any faster. SciMLBenchmarks.jl: Benchmarks for Scientific Machine Learning (SciML) and Equation Solvers · The SciML Benchmarks is really useful for finding the fastest Julia solver.

Some First Things

First things first. Most of the data I’m about to mention comes from

https://docs.sciml.ai/SciMLBenchmarksOutput/stable/

and we’re about to do another tag on it, so the current dev version is much more up to date. This is because, as mentioned in CVODE_BDF outperforms Julia solvers for stiff system biology model? - #5 by ChrisRackauckas, we had a pretty major improvement to the nonlinear solver heuristics, which improved a good chunk of the benchmarks.

And with this, when I talk about performance I mean work-precision diagrams, so that accuracy is aligned: we know that CVODE tends to give errors much higher than its tolerance, and that always needs to be accounted for.
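For anyone who hasn’t run one, a work-precision diagram can be generated with DiffEqDevTools.jl along these lines (a sketch; the reference solver and tolerance ranges are placeholder choices):

using OrdinaryDiffEq, Sundials, DiffEqDevTools, Plots

# Reference solution at very tight tolerances to measure the true error against
test_sol = TestSolution(solve(prob, Rodas5P(); abstol=1e-12, reltol=1e-12))

abstols = 1.0 ./ 10.0 .^ (5:8)
reltols = 1.0 ./ 10.0 .^ (2:5)
setups = [Dict(:alg => CVODE_BDF()), Dict(:alg => FBDF()), Dict(:alg => QNDF())]

wp = WorkPrecisionSet(prob, abstols, reltols, setups;
                      appxsol=test_sol, save_everystep=false, numruns=10)
plot(wp)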

Back to the main show: Where we do well

So okay, with that out of the way: performance. That depends on many factors. First of all, for smaller systems, around 20 equations or fewer, the Rosenbrock methods clearly outperform CVODE, so use those. In the 10-200 range or so, a well-tuned parallel implicit extrapolation usually does best, so use that. For DAEs, the mass matrix form almost always outperforms IDA, so use FBDF in mass matrix form or a Rosenbrock method.
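In code, that advice reads roughly as follows (a sketch; prob_small, prob_medium, f, M, u0, and tspan are placeholders):

using OrdinaryDiffEq

# < ~20 equations: Rosenbrock methods
solve(prob_small, Rodas5P())

# ~10-200 equations: parallel implicit extrapolation
solve(prob_medium, ImplicitHairerWannerExtrapolation(threading=true))

# DAEs: mass-matrix form with FBDF (or a Rosenbrock method) instead of IDA
f_mm = ODEFunction(f; mass_matrix=M)
prob_dae = ODEProblem(f_mm, u0, tspan)
solve(prob_dae, FBDF())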

Remaining cases

What that means is that there are two regimes where CVODE is identified as still doing well: high f call cost and “sufficiently large”.

When I say sufficiently large, I mean “ODEs larger than 200”, but there’s a bit of a catch. Until sometime this year we did have some hiccups with the dense linear solvers, due to OpenBLAS defaults. Long story short, Julia v1.12 improves the BLAS threading defaults, and we also have some nice improvements to the default BLAS chooser by incorporating MKL into the mix. To show this, here’s a slide from a workshop I recently gave:

You can see that the standard BLAS LUFactorization() is pretty bad in almost all cases, but how bad depends on the CPU you are using, and GenericLUFactorization() could beat it. Our Sundials wrapper always defaults to the equivalent of GenericLUFactorization() (you can set LapackDense, but that’s not the default), which ironically means it’s better in the cases where the BLAS defaults are just bad (such as right after the threading threshold). The changes we made improved the heuristics for the choices. I think for all CPUs we now at least always do better than the naive choice, though you can see, for example, that on AMD CPUs there is a spot around N=250 where the multithreading threshold kicks in too early; but it still beats OpenBLAS and no BLAS pretty handily, so that’s fine. With those improvements, the “large enough and dense” region should now be handled well on our end.
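If you would rather pin the choice than rely on the heuristics, the LinearSolve.jl algorithms can be passed directly (a sketch; which one wins depends on the CPU and problem size, as the slide shows):

using OrdinaryDiffEq, LinearSolve, Sundials

solve(prob, FBDF(linsolve=LUFactorization()))         # BLAS (OpenBLAS/MKL) LU
solve(prob, FBDF(linsolve=GenericLUFactorization()))  # pure-Julia LU, good at small N
solve(prob, CVODE_BDF(linear_solver=:LapackDense))    # opt Sundials into LAPACK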

But “sufficiently large” with a sparse matrix is the case that still has something missing. You can see it here on Bruss, before the nonlinear solver change:

[Work-precision diagram: Bruss benchmark, before the nonlinear solver change]

After the nonlinear solver change:

[Work-precision diagram: Bruss benchmark, after the nonlinear solver change]

The difference isn’t all that large and it’s only when the tolerances are sufficiently small, but it does exist. My guess is there’s an issue with the symbolic factorization reuse since that’s almost precisely the cost there, so I may try to track that down sometime in the near future.
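For context, the sparse path being benchmarked is set up roughly like this: a sparse Jacobian prototype makes the solver use a sparse factorization, and KLU does the symbolic factorization once and should reuse it across steps (a sketch; f, u0, tspan, and jac_sparsity are placeholders):

using OrdinaryDiffEq, LinearSolve, SparseArrays

# jac_sparsity: a SparseMatrixCSC holding the Jacobian's structural nonzeros
f_sparse = ODEFunction(f; jac_prototype=jac_sparsity)
prob_sparse = ODEProblem(f_sparse, u0, tspan)

# KLU: sparse LU whose symbolic factorization is reused between refactorizations
solve(prob_sparse, FBDF(linsolve=KLUFactorization()))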

For the high f call cost, that was something only recently isolated in Create astrochem benchmark by ChrisRackauckas · Pull Request #1046 · SciML/SciMLBenchmarks.jl · GitHub, so we’ll get those merged and up soon. Since these two benchmarks are only in the CI right now, let me share the results:

[Work-precision diagram: first astrochem benchmark]

[Work-precision diagram: second astrochem benchmark]

The first one isn’t bad, but the latter is curiously bad for FBDF. Since CVODE and FBDF are very similar algorithms, that shouldn’t happen. This seems to be something arising in chemical reaction networks with very high f call costs, where the ^ in Hill functions turns into the dominating factor. You can see this in the profile for the last one:

[Profile of the ODE right-hand side]

where >90% of the time is in ^ calls in the right hand side. In that case, the benchmark is “abnormal” from an implicit ODE solver standpoint, as such a solver is normally assumed to be somewhat LU-bound, but that’s clearly not the case here. I suspect this is the case with @sebapersson’s examples too, and I would like help turning them into reproducible examples on SciMLBenchmarks if anyone has a second. That would be the most useful thing here.

What’s next?

So then back to: “Why don’t we have a Julia solver yet that is faster than CVODE_BDF()?”

There’s another way to interpret this question, which is: why have we not focused on getting a solver that covers the CVODE cases? Because indeed, we haven’t really cared all that much about handling those cases, for a good reason: CVODE is a really good method, we have really good (I think you can argue the best) open-source wrappers of Sundials in a high-level language, and we have no plans to stop maintaining the wrappers. So why would we build more methods in the domain where CVODE is good? If CVODE is good, then call CVODE, and we can go work on other things. That’s especially true for DFBDF vs IDA: there just isn’t a strong incentive to work on it.

That said, years of poking have clearly gotten us pretty close. I’m sure we’ll find something with the high f cost investigations over the next few months, and @Oscar_Smith is already making some major changes to the linear solver wrappers which could close the remaining gap at the large end. But in general, it’s not a high priority to work on “things that have already been done” when we can just have a good wrapper.

Instead, there’s many other things to do, so the coming soon is:

  • Improved nonlinear solvers (i.e., replacing the internal nonlinear solvers of OrdinaryDiffEq with NonlinearSolve.jl; this is partially completed)
  • Some symbolic-numeric stuff… which will give massive performance improvements. More to come on this in the near future.
  • New parallel methods
  • Improved compilation and load times

which all take precedence. That said, the poke is getting close and I’m sure we’ll poke it closer over the next year as well.


In that case, the benchmark is “abnormal” from an implicit ODE solver standpoint, as such a solver is normally assumed to be somewhat LU-bound, but that’s clearly not the case here. I suspect this is the case with @sebapersson’s examples too, and I would like help turning them into reproducible examples on SciMLBenchmarks if anyone has a second.

I can add here that the examples in my use case are indeed high f call cost. As for turning them into reproducible examples for SciMLBenchmarks, it is on my todo list, and as soon as PEtab.jl (which I use to create the problems) is updated to MTK v9, I can set up the benchmarks.

If you’re f call bound, does it make sense to try RadauIIA5? That should decrease the number of f calls in exchange for more linear algebra.

Still f-bound. I want to try RadauIIA13 (which is why RadauIIA9 is in the benchmarks); higher order trades even more linear algebra against the f call cost. My guess is that AdaptiveRadau, when done, might be the best thing here, so I have a few benchmarks in mind for Shreyas to develop against.

This is also why I pinged you two weeks ago about alternative ^ implementations. I think it would be fairly helpful for Symbolics codegen to have a fastmath kind of version, specifically because using DiffEqBase.fastpow would be a good thing for most of these use cases. The astrochem benchmark is 2x faster when using DiffEqBase.fastpow and converges to an accuracy of 1e-7 IIRC, so having an easy swap for these is probably good for “most” use cases.
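For anyone wanting to try the swap by hand before codegen support lands, the idea is just replacing ^ in the rate laws (a sketch with a generic Hill function; fastpow trades accuracy, roughly the 1e-7 mentioned above, for speed):

using DiffEqBase

# Standard Hill function: the ^ calls dominate the right-hand-side cost
hill(x, k, n) = x^n / (k^n + x^n)

# Same expression with the fast approximate power
fasthill(x, k, n) = DiffEqBase.fastpow(x, n) /
                    (DiffEqBase.fastpow(k, n) + DiffEqBase.fastpow(x, n))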