Comparing the Diffeq.jl Euler stepper with a by-hand Euler stepper

zornsllama · August 19, 2021, 8:34pm

I am trying to understand the behavior of Diffeq.jl in a toy example, with the eventual goal of implementing networks of integrate-and-fire neurons. These questions are probably very basic, but any clarification would be much appreciated! The vector field of interest (parameters are pretty much random) is

function vf!(du, u,p,t)
    du .= (2 .- u) ./ 3.0 .+ 0.2
end

I am comparing the behavior of two Euler stepping functions, one written by hand and one using the Diffeq.jl integrator interface.

function mystep!(du,u)
    vf!(du,u,nothing,nothing)
    u .= u .+ 0.01*du
end

versus

u0 = rand(10000)
prob = ODEProblem(vf!, u0, (0.0,1000.0))
integ = init(prob, Euler(),dt=.01,save_everystep=false)
step!(integ)

When I run @benchmark mystep!(du0,u0) and @benchmark step!(integ), I get mean results 21.107 μs ± 42.136 μs and 27.882 μs ± 2.594 μs respectively. The increased variability in the first case seems to be attributable to one step in every 500 or so taking on the order of ~1 ms. This can’t be chalked up to any function compilation, as far as I understand.

Question 1: why does the by-hand step function occasionally take much longer, and how does the integrator in Diffeq.jl avoid this?

Question 2: what exactly is the additional overhead introduced by the integrator interface such that each step! takes longer on average than each mystep!?

Oscar_Smith · August 19, 2021, 8:44pm

0.01*du is allocating a vector, so garbage collection gets called occasionally. You wanted 0.01.*du

zornsllama · August 19, 2021, 8:48pm

Ah, thanks! That makes sense. However, this seems to result in the difference between step! and mystep! becoming even more dramatic (30 microsecs per step for step! vs 10 microsecs for mystep!). So my remaining question is: what is Diffeq.jl doing with those extra 20 microsecs?

ChrisRackauckas · August 19, 2021, 10:03pm

Yeah, this is a fun one. If you want to know, just take a look at the profile:

using Profile
@profile for i in 1:100000 step!(integ) end
using ProfileView
ProfileView.view()

In the profile you will notice that OrdinaryDiffEq.Euler has 5 O(n) operations:

@muladd @.. u = uprev + dt*integrator.fsalfirst
f(fsallast,u,p,t+dt)
any(isnan,u)
recursivecopy!(uprev,integrator.u)
recursivecopy!(fsalfirst,fsallast)

That’s simplified a bit, but you catch the drift. So first of all, this computation is so cheap that any(isnan,u) is the most expensive thing in the list . You can always turn that off:

integ = init(prob, Euler(),dt=.01,save_everystep=false,unstable_check = (du,u,p,t)->false)

but you can guess why a production ODE solver has defaults for exiting on NaN. But there’s 2 other extra O(n) kernel calls, why? This is explained in a very old video:

Basically what’s going on is that if you do this inverted form, update then calculate f, then you can have the derivative at both sides which defines a smooth interpolation over the solve. Note that the video explains how for most RK methods that are used there is a separate trick that actually makes it free. It’s really just Euler (and a few other low order less used methods), that require this trick, which then results in 2 extra copyto! operations.

So yes, if your f is only some trivial O(n) calculation, you can handwrite an Euler method to be 2n operations per step, while the default here is 4n operations + error checking.

So oh no, why is this generally not a problem? Let’s take a look at the stepping code for the standard “simple” method: Tsit5

@muladd function perform_step!(integrator, cache::Tsit5Cache, repeat_step=false)
  @unpack t,dt,uprev,u,f,p = integrator
  @unpack c1,c2,c3,c4,c5,c6,a21,a31,a32,a41,a42,a43,a51,a52,a53,a54,a61,a62,a63,a64,a65,a71,a72,a73,a74,a75,a76,btilde1,btilde2,btilde3,btilde4,btilde5,btilde6,btilde7 = cache.tab
  @unpack k1,k2,k3,k4,k5,k6,k7,utilde,tmp,atmp = cache
  a = dt*a21
  @.. tmp = uprev+a*k1
  f(k2, tmp, p, t+c1*dt)
  @.. tmp = uprev+dt*(a31*k1+a32*k2)
  f(k3, tmp, p, t+c2*dt)
  @.. tmp = uprev+dt*(a41*k1+a42*k2+a43*k3)
  f(k4, tmp, p, t+c3*dt)
  @.. tmp = uprev+dt*(a51*k1+a52*k2+a53*k3+a54*k4)
  f(k5, tmp, p, t+c4*dt)
  @.. tmp = uprev+dt*(a61*k1+a62*k2+a63*k3+a64*k4+a65*k5)
  f(k6, tmp, p, t+dt)
  @.. u = uprev+dt*(a71*k1+a72*k2+a73*k3+a74*k4+a75*k5+a76*k6)
  f(k7, u, p, t+dt)
  integrator.destats.nf += 6
  if integrator.alg isa CompositeAlgorithm
    g7 = u
    g6 = tmp
    # Hairer II, page 22
    @.. utilde = k7 - k6
    ϱu = integrator.opts.internalnorm(utilde,t)
    @.. utilde = g7 - g6
    ϱd = integrator.opts.internalnorm(utilde,t)
    integrator.eigen_est = ϱu/ϱd
  end
  if integrator.opts.adaptive
    @.. utilde = dt*(btilde1*k1 + btilde2*k2 + btilde3*k3 + btilde4*k4 + btilde5*k5 + btilde6*k6 + btilde7*k7)
    calculate_residuals!(atmp, utilde, uprev, u, integrator.opts.abstol, integrator.opts.reltol,integrator.opts.internalnorm,t)
    integrator.EEst = integrator.opts.internalnorm(atmp,t)
  end
  return nothing
end

In this code you see that a single step of the 5th order RK method, with eigenvalue checks for stiffness detection, takes 13 O(n) operations and 6 f calls (6 instead of 7 due to FSAL). The FSAL gives one more O(n) operation by cutting out 1 f call. Then you have the O(n) operation for error checking / unstable check. That is how the other pieces usually get masked: the more complicated the method, and the complicated the f, the less the other details matter. Also (as mentioned in the video), here the FSAL handling gets rid of an f call so it’s cheaper than not doing it (as opposed to an extra O(n) operation), and the interpolant construction is free by design. Thus if you think about it like this, the worst case scenario is to minimize the number of internal stages and minimize the complexity of f, which is precisely Euler’s method and your kind of example.

But really what’s going on here is that it doesn’t make sense for the simplest of simplest cases to use all of the machinery, which is why for this edge case we created SimpleDiffEq.jl

There is no SimpleEuler(), but if you wanted to add one that would be nice (in fact, there’s actually one built into DiffEqBase itself: https://github.com/SciML/DiffEqBase.jl/blob/master/src/internal_euler.jl#L9-L91).

Now to finish off, other interesting things can come into play if n is really small, and that’s its own can of worms which is interesting too…

github.com/SciML/SimpleDiffEq.jl

Small overhead checking benchmarks

opened 10:22AM - 13 May 20 UTC

ChrisRackauckas

```julia using OrdinaryDiffEq, StochasticDiffEq, BenchmarkTools, SimpleDiffEq…, BridgeDiffEq, StaticArrays u0=1/2 f(u,p,t) = u g(u,p,t) = u dt = 1/2^(4) tspan = (0.0,1.0) prob = ODEProblem(f,u0,(0.0,1.0)) @btime sol = solve(prob,BridgeBS3(),dt=dt) @btime sol = solve(prob,BS3(),dt=dt,adaptive=false,dense=false) prob = ODEProblem(f,u0,(0.0,100.0)) @btime sol = solve(prob,BridgeBS3(),dt=dt) @btime sol = solve(prob,BS3(),dt=dt,adaptive=false,dense=false) prob = SDEProblem(f,g,u0,(0.0,1.0)) @btime sol = solve(prob,EM(),dt=dt) @btime sol = solve(prob,SimpleEM(),dt=dt) @btime sol = solve(prob,BridgeEuler(),dt=dt) prob = SDEProblem(f,g,u0,(0.0,100.0)) @btime sol = solve(prob,EM(),dt=dt) @btime sol = solve(prob,SimpleEM(),dt=dt) @btime sol = solve(prob,BridgeEuler(),dt=dt) u0 = @SVector [2.0,3.0] prob = ODEProblem(f,u0,(0.0,1.0)) @btime sol = solve(prob,BridgeBS3(),dt=dt) @btime sol = solve(prob,BS3(),dt=dt,adaptive=false,dense=false) prob = SDEProblem(f,g,u0,(0.0,1.0)) @btime sol = solve(prob,EM(),dt=dt) @btime sol = solve(prob,SimpleEM(),dt=dt) @btime sol = solve(prob,BridgeEuler(),dt=dt) prob = SDEProblem(f,g,u0,(0.0,100.0)) @btime sol = solve(prob,EM(),dt=dt) @btime sol = solve(prob,SimpleEM(),dt=dt) @btime sol = solve(prob,BridgeEuler(),dt=dt) function lorenzoop(u,p,t) SA[10.0(u[2]-u[1]),u[1]*(28.0-u[3]) - u[2],u[1]*u[2] - (8/3)*u[3]] end u0 = SA[1.0;0.0;0.0] tspan = (0.0,100.0) prob = ODEProblem(lorenzoop,u0,tspan) @btime sol = solve(prob,BridgeBS3(),dt=dt) @btime sol = solve(prob,BS3(),dt=dt,adaptive=false,dense=false) @btime sol = solve(prob,Tsit5(),dt=dt,adaptive=false,dense=false) @btime sol = solve(prob,SimpleTsit5(),dt=dt) function σ_lorenz(u,p,t) SA[3.0,3.0,3.0] end prob = SDEProblem(lorenzoop,σ_lorenz,u0,(0.0,100.0)) dt = 1/2^6 @btime sol = solve(prob,EM(),dt=dt) @btime sol = solve(prob,SimpleEM(),dt=dt) @btime sol = solve(prob,BridgeEuler(),dt=dt) ```

It’s sometimes fun to see if you can get the fully generic version down to match the most basic implementation when u is a scalar and f is identity (@Oscar_Smith do you know if there’s some way to do a faster isnan?). But yeah, I hope that gives a bit of insight into how the simplest ODEs are somewhat of a corner case and why SimpleDiffEq.jl exists.

https://diffeq.sciml.ai/stable/solvers/ode_solve/#SimpleDiffEq.jl

Oscar_Smith · August 19, 2021, 10:59pm

Don’t believe so. isnan(x) is just x!=x (ie a single vectorizable instruction). You can’t get much faster than that

ChrisRackauckas · August 20, 2021, 9:52am

I think any(isnan,u) isn’t SIMD’ing though. We should look into this a bit more.

Topic		Replies	Views
Time integration in OrdinaryDiffEq.jl in Julia is slower than calling it from Python Performance performance , python , ordinarydiffeq	12	296	September 16, 2024
Confusing memory allocations when using the integrator of DifferentialEquations.jl Performance question , diffeq , memory-allocation	10	448	January 12, 2022
DifferentialEquations.jl How to speed up the solver Performance	3	467	June 30, 2022
How does cache works in JuliaDiffEq? Numerics package , diffeq	1	634	August 29, 2018
Benchmarking a simple PDE algorithm in Julia, Python, Matlab, C++, and Fortran Numerics diffeq	41	8883	November 19, 2019

Comparing the Diffeq.jl Euler stepper with a by-hand Euler stepper

Related topics