The incorrect use of SVectors does exist here, but since the return is unused it just gets deleted by the compiler so that’s fine. Here’s a break down:
using OrdinaryDiffEq
function sigma(x)
return 1.0 / ( 1.0 + exp( -10.0 * ( x - ( - 0.25 ) ) ) )
end
function HR!(du, u, p, t)
a, b, c, d, s, xr, r, I, vs, k1, k2, el_link = p
x1, y1, z1, x2, y2, z2 = u
du[1] = y1 + b * x1 ^ 2 - a * x1 ^3 - z1 + I - k1 * ( x1 - vs ) * sigma(x2) + el_link * ( x2 - x1 )
du[2] = c - d * x1 ^2 - y1
du[3] = r * ( s * ( x1 - xr ) - z1 )
du[4] = y2 + b * x2 ^ 2 - a * x2 ^3 - z2 + I - k2 * ( x2 - vs ) * sigma(x1) + el_link * ( x1 - x2 )
du[5] = c - d * x2 ^2 - y2
du[6] = r * ( s * ( x2 - xr ) - z2 )
end
u0 = [ 0., 0.2, 0.3,
0., 0.5, 0.6]
tspan = (0., 100000.)
a = 1.
b = 3.
c = 1.
d = 5.
xr = -1.6
r = 0.01 # 0.01
s = 5.
I = 4.
xv = 2.
k1= -0.17
k2 = k1
k = 0.0
p = [a, b, c, d,
s, xr, r, I, xv, k1, k2, k]
prob = ODEProblem(HR!, [-1.5, 0.0, 0.0, -2.5, 0.0, 0.0], tspan, p)
using BenchmarkTools
@btime sol = solve(prob, AutoVern9(Rodas5()), abstol = 1e-11, reltol = 1e-11, maxiters = 10000000)
# 6.050 s (19333763 allocations: 1.86 GiB)
julia> sum(sol.alg_choice .== 2)
0
@btime sol = solve(prob, Vern9(), abstol = 1e-11, reltol = 1e-11, maxiters = 10000000)
# 3.773 s (16571721 allocations: 1.81 GiB)
@btime sol = solve(prob, Vern9(), abstol = 1e-11, reltol = 1e-11, dense = false, maxiters = 10000000)
# 2.032 s (1381039 allocations: 185.68 MiB)
@btime sol = solve(prob, Vern9(), abstol = 1e-11, reltol = 1e-11, save_everystep=false, maxiters = 10000000)
# 1.973 s (46 allocations: 14.42 KiB)
@profview sol = solve(prob, Vern9(), abstol = 1e-11, reltol = 1e-11, save_everystep=false, maxiters = 10000000)
function sigma(x)
return @fastmath 1.0 / ( 1.0 + exp( -10.0 * ( x - ( - 0.25 ) ) ) )
end
@btime sol = solve(prob, Vern9(), abstol = 1e-11, reltol = 1e-11, save_everystep=false, maxiters = 10000000)
# 1.797 s (46 allocations: 14.42 KiB)
What this is showing is that first of all, the biggest source of overhead is the existence of the possibility of switching, even though it’s not used. Since this equation is just a non-stiff equation (as shown by alg_choice
being 1
over the whole integration), cutting out the overhead of holding onto the stiff solver cuts time down by about 2 seconds. The next major contribution is the saving overhead, which you can see if you run a profile, or another way to see it is just by turning off saving. That’s the next 2 seconds. Then when this is all optimized, you can see there’s an opportunity to @fastmath
the exp
operation which ends up being the biggest cost in the calculation by far, though this 0.2 is only meaningful if the other pieces are cut down.
The final profile is then:
Which is then the sign of optimized, because what it means is that the steps (perform_step
) take up the vast majority of the time (so no loading or saving overhead), with the majority of the stepping time being due to a @fastmath exp
call. That then identifies that specific operation as your limiting factor.
So in the end, most of the cost seems to be that it’s just a very big time span and taking a lot of steps because the tolerance is very low. With that in mind, then the biggest improvement would come from just finding an explicit ODE solver that takes less steps. I’ll continue the benchmarks in the no-saving mode because the improvements for this would apply even if the saving is done, though you’d need to about double the time if saving is done every step.
@btime sol = solve(prob, Vern8(), abstol = 1e-11, reltol = 1e-11, save_everystep=false, maxiters = 10000000)
# 866.393 ms (45 allocations: 11.89 KiB)
@btime sol = solve(prob, Vern7(), abstol = 1e-11, reltol = 1e-11, save_everystep=false, maxiters = 10000000)
# 1.608 s (43 allocations: 9.39 KiB)
@btime sol = solve(prob, Tsit5(), abstol = 1e-11, reltol = 1e-11, save_everystep=false, maxiters = 10000000)
# 2.369 s (41 allocations: 5.67 KiB)
Of the simple Runge-Kutta methods, it seems Vern8
is the most optimized method on this equation (at least, without checking the precision), solving in about half the function calls. Exactly why is uncertain here, we’d have dig into a lot more detail, but in general whether to use Vern7 vs 8 vs 9 is somewhat a toss up: this one is an 8.
Lastly, what @IlianPihlajamaa was referring to is that you can use a static array form instead of mutation. If you look at the computation, this actually slows it down:
function HR(u, p, t)
a, b, c, d, s, xr, r, I, vs, k1, k2, el_link = p
x1, y1, z1, x2, y2, z2 = u
du1 = y1 + b * x1 ^ 2 - a * x1 ^3 - z1 + I - k1 * ( x1 - vs ) * sigma(x2) + el_link * ( x2 - x1 )
du2 = c - d * x1 ^2 - y1
du3 = r * ( s * ( x1 - xr ) - z1 )
du4 = y2 + b * x2 ^ 2 - a * x2 ^3 - z2 + I - k2 * ( x2 - vs ) * sigma(x1) + el_link * ( x1 - x2 )
du5 = c - d * x2 ^2 - y2
du6 = r * ( s * ( x2 - xr ) - z2 )
return SVector(du1, du2, du3,
du4, du5, du6)
end
p = SA[a, b, c, d,
s, xr, r, I, xv, k1, k2, k]
probsa = ODEProblem(HR, SA[-1.5, 0.0, 0.0, -2.5, 0.0, 0.0], tspan, p)
@btime sol = solve(probsa, Vern8(), abstol = 1e-11, reltol = 1e-11, save_everystep=false, maxiters = 10000000)
# 891.670 ms (22 allocations: 10.42 KiB)
But if you are then adding saving, then the probsa
static array form has better packing in the final result, and thus is beneficial:
@btime sol = solve(probsa, Vern8(), abstol = 1e-11, reltol = 1e-11, maxiters = 10000000)
# 1.566 s (1302581 allocations: 1.04 GiB)
@btime sol = solve(probsa, Vern8(), abstol = 1e-11, reltol = 1e-11, dense=false, maxiters = 10000000)
# 993.665 ms (46 allocations: 129.09 MiB)
@btime sol = solve(prob, Vern8(), abstol = 1e-11, reltol = 1e-11, maxiters = 10000000)
# 6.709 s (19532495 allocations: 2.15 GiB)
That gives us a best final time with saving as 1.5s, 870ms if not saving, and ~1s if you don’t need a dense interpolation.
I suggest then choosing saveat
appropriately to avoid the interpolation. Then the SVector
+ saveat
+ Vern8
approach gives you a significant boost.