StaticArrays + ArrayPartition + DiffEq = Allocations?

I’m attempting to convert the DiffEq example using ArrayPartition to also use StaticArrays to understand how to feed in an ArrayPartition of StaticArrays for my own problem. This is meant to be a toy example to help me understand things a bit better.

I’m also following the advice on the code optimization page about feeding StaticArrays into the problem.

Problem: My f function does not allocate (this is good!) and @code_warntype shows no type instabilities, but when I time the actual solve there are a ton of allocations. This seems counter to what the page on code optimization indicates.

Question: What is going on here? Am I calling @btime incorrectly? Is there something else I need to do in f to prevent allocations?

MWE:

using Unitful, RecursiveArrayTools, OrdinaryDiffEq
using LinearAlgebra
using StaticArrays

r0 = SA[1131.340, -2282.343, 6672.423]u"km"
v0 = SA[-5.64305, 4.30333, 2.42879]u"km/s"
Δt = 86400.0*365u"s"
μ = 398600.4418u"km^3/s^2"
rv0 = ArrayPartition(r0,v0)

function f(y, μ, t)
    r = norm(y.x[1])
    dy1 = y.x[2]
    dy2 = -μ * y.x[1] / r^3
    ArrayPartition(dy1, dy2)
end


prob = ODEProblem(f, rv0, (0.0u"s", Δt), μ)

using BenchmarkTools
@btime f($rv0, $μ, 0.0) # ~11ns, 0 allocations
alg = Vern8()
save_everystep = false
@btime solve($prob, $alg, save_everystep=$save_everystep) # ~25ms, ~70k allocations

I’d check this on v1.9 because the effects analysis improved and this may just be something where the compiler didn’t optimize it out on a given version.

Thanks Chris,

I took your advice and downloaded 1.9.0-beta3. What I see is that the total compute time is roughly flat compared with 1.8.5, but the allocations did decrease significantly.

Does this mean the code is butting up against the speed of the actual calculation? This seems pretty slow for what appears to be a simple solve, but there also appears to be a history of people complaining about the efficiency of norm. IDK if that’s still relevant though.
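One rough way to judge whether ~15 ms is slow is to estimate how many orbits the one-year time span covers. The sketch below uses the initial state from the MWE with units stripped (an assumption made to keep it dependency-free) and the standard two-body relations (vis-viva and Kepler's third law). It is a back-of-the-envelope check, not a benchmark.

```julia
using LinearAlgebra

# Initial state and constants from the MWE, units stripped by hand
# (km, km/s, km^3/s^2, s). Stripping units is an assumption of this sketch.
r0 = [1131.340, -2282.343, 6672.423]
v0 = [-5.64305, 4.30333, 2.42879]
μ  = 398600.4418
Δt = 86400.0 * 365

# Specific orbital energy (vis-viva) gives the semi-major axis.
energy = dot(v0, v0) / 2 - μ / norm(r0)
a = -μ / (2 * energy)

# Kepler's third law gives the period of the two-body orbit.
period = 2π * sqrt(a^3 / μ)
n_orbits = Δt / period

println("semi-major axis [km]: ", a)
println("period [s]:           ", period)
println("orbits in the span:   ", n_orbits)
```

The span works out to a few thousand revolutions of a low Earth orbit, so the integrator has to take many steps regardless of how cheap each call to f is.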

Timing
Running identical code from above in two different environments:

Version 1.8.5


@btime f($rv0, $μ, 0.0)
  7.800 ns (0 allocations: 0 bytes)

@btime solve($prob, $alg, save_everystep=$save_everystep)
  16.090 ms (70550 allocations: 3.24 MiB)

Version 1.9.0-beta3

@btime f($rv0, $μ, 0.0)
  7.500 ns (0 allocations: 0 bytes)

@btime solve($prob, $alg, save_everystep=$save_everystep)
  15.072 ms (121 allocations: 11.98 KiB)

I added everything manually with add packagename, so I’m not sure whether each environment has exactly the same package versions, and I’m not sure how to check that easily. IDK if that’s important, but I thought I’d mention it.
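For comparing package versions between the two environments, a minimal sketch using the Pkg standard library is shown below. `Pkg.status()` prints the same information interactively; the helper name `direct_versions` is hypothetical and only there for illustration. Run it once in each environment and compare the output.

```julia
using Pkg

# Hypothetical helper: name => version for packages added directly
# to the currently active environment.
function direct_versions()
    deps = Pkg.dependencies()
    pairs = [info.name => info.version for info in values(deps) if info.is_direct_dep]
    return sort(pairs; by = first)
end

for (name, version) in direct_versions()
    println(rpad(name, 24), version)
end
```

If the lists differ, the timing comparison between Julia versions is also a comparison between package versions.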

Yup, this looks about as expected.

The norm efficiency issue was fixed up.

What does the profile say? Share a flame graph. If the time is mostly spent in the parts that aren’t allocating, then the allocations are not the issue. Allocations can even improve performance in some cases.
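A minimal sketch of collecting a profile with the Profile standard library follows. The `workload` function is a hypothetical stand-in so the snippet runs on its own; in this thread the profiled call would be `solve(prob, alg; save_everystep=save_everystep)`. For an actual flame graph, a viewer package such as ProfileView.jl (its `@profview` macro) is one option.

```julia
using Profile

# Hypothetical stand-in for the expensive call. In the thread's setting,
# replace it with `solve(prob, alg; save_everystep=save_everystep)`.
function workload(n)
    s = 0.0
    for i in 1:n
        s += sin(i) / sqrt(i)
    end
    return s
end

workload(10)                      # run once so compilation is not profiled
Profile.clear()                   # drop samples from any earlier run
@profile workload(50_000_000)     # collect samples while the call runs

# Flat listing, sorted by sample count; entries with few samples are hidden.
Profile.print(format = :flat, sortedby = :count, mincount = 10)
```

The lines with the highest counts show where the time goes. If they sit inside the right-hand side evaluation and the stepping arithmetic rather than in allocation or garbage collection, the remaining allocations are not what limits the run time.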

For the lowest overhead case, try the Vern implementations in SimpleDiffEq.jl: GPUVern7 and GPUVern9. If it’s a dead simple ODE like this, then those should have essentially zero overhead since they are just the loop.

The big thing to ask is whether the Verner methods are the right ones for the job here. At the tolerances you’re choosing, the answer is probably no.
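As a sketch of what that comparison could look like: the MWE above never sets tolerances, so it runs Vern8 at the solver defaults. The snippet below runs a lower-order method (Tsit5) at the default tolerances and Vern8 at tight ones. The units are stripped, the time span is shortened to one day, and the tolerance values are illustrative; all three are assumptions of this sketch, not choices made in the thread.

```julia
using OrdinaryDiffEq, RecursiveArrayTools, StaticArrays, LinearAlgebra

# Same two-body problem as the MWE, units stripped (km, km/s, km^3/s^2, s).
r0 = SA[1131.340, -2282.343, 6672.423]
v0 = SA[-5.64305, 4.30333, 2.42879]
μ  = 398600.4418
rv0 = ArrayPartition(r0, v0)

# Out-of-place right-hand side returning a new ArrayPartition of SVectors.
function f(y, μ, t)
    r = norm(y.x[1])
    ArrayPartition(y.x[2], -μ * y.x[1] / r^3)
end

# One day instead of one year, to keep the sketch quick to run.
prob = ODEProblem(f, rv0, (0.0, 86400.0), μ)

# Lower-order method at the default tolerances.
sol_default = solve(prob, Tsit5(); save_everystep = false)

# Verner method at tight, illustrative tolerances.
sol_tight = solve(prob, Vern8(); abstol = 1e-12, reltol = 1e-12,
                  save_everystep = false)

# Difference in final position between the two runs, in km.
println(norm(sol_default.u[end].x[1] - sol_tight.u[end].x[1]))
```

Timing both calls with @btime and comparing the final states gives a first indication of which method suits the accuracy that is actually needed.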