I am considering these three implementations of the same ODE right-hand side
```julia
function minimalModelNonAlloc!(D, u, p, t)
    M = p
    v = 0.2 .* ones(length(u))
    D .= v - M * u.^3 - u.^5
    return nothing
end

function minimalModelAlloc(u, p, t)
    M = p
    v = 0.2 .* ones(length(u))
    return v - M * u.^3 - u.^5
end

function minimalModelAllocContainer(u, p, t)
    @unpack M = p
    v = 0.2 .* ones(length(u))
    return v - M * u.^3 - u.^5
end
```
and compare their performance when optimising some loss function via `M` (see below) for different dimensions of `u`. In the last model I use `p` as a container.
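As an aside, even the `minimalModelNonAlloc!` version above still allocates `v` (plus broadcast temporaries) on every call. A genuinely allocation-free variant could cache its buffers inside `p`; the following is only a sketch using a hypothetical `CachedParams` struct that is not part of my benchmark:

```julia
using LinearAlgebra

# Hypothetical cache-carrying parameter struct (illustration only, not from
# the benchmark): `M` is the matrix, `v` and `tmp` are preallocated buffers.
struct CachedParams{T}
    M::Matrix{T}
    v::Vector{T}
    tmp::Vector{T}
end

function minimalModelCached!(D, u, p, t)
    @. p.tmp = u^3         # elementwise cube into the cache, no allocation
    mul!(D, p.M, p.tmp)    # D = M * u.^3, in place
    @. D = p.v - D - u^5   # D = v - M*u.^3 - u.^5, fused broadcast
    return nothing
end
```

After the first (compiling) call, `@allocated minimalModelCached!(D, u, p, 0.0)` should report zero.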
I find the following result:
(There is also a similar result for the allocated memory, but I am only allowed to upload one file.)
Okay, unsurprisingly, the containerised version fares worse than `minimalModelAlloc`, which I expected, since the container adds some overhead. But the magnitude of it surprises me nevertheless. The biggest surprise, however, is that the non-allocating version is by far the worst: we are talking 1.9 seconds per run vs. ~80 ms for the same `length(u)`. These results are completely unintuitive to me. Could anyone please tell me what part of the equation I am missing?
The benchmarking code I use reads:
```julia
dims = [2, 5, 10, 20, 50, 100, 200]
memories = []
times = []

for dim ∈ dims
    @info "Processing $dim dimensions"

    function lossBenchmark(p)
        sol = solve(prob, Tsit5(), p = p)
        loss = sum(abs2, sol[end])
        return loss, sol
    end

    u0 = zeros(dim)
    # pTest = ones(dim, dim)
    pTest = ComponentArray(M = ones(dim, dim))
    prob = ODEProblem(minimalModelAllocContainer, u0, (0.0, 10.0), pTest)

    benchmark = @benchmark DiffEqFlux.sciml_train($lossBenchmark, $pTest, $ADAM(0.05), maxiters = 30) samples = 100 seconds = 30

    push!(memories, benchmark.memory)
    push!(times, benchmark.times |> mean)
end
```
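To separate the cost of a single RHS evaluation from the optimisation loop, per-call allocations can also be checked with Julia's built-in `@allocated` macro. This is a standalone sketch, independent of the benchmark above; a plain matrix `p` stands in for the `ComponentArray`:

```julia
# Out-of-place model from above, restated so this snippet is self-contained.
minimalModelAlloc(u, p, t) = 0.2 .* ones(length(u)) .- p * u .^ 3 .- u .^ 5

u = zeros(100)
M = ones(100, 100)

minimalModelAlloc(u, M, 0.0)                     # warm-up (compilation)
bytes = @allocated minimalModelAlloc(u, M, 0.0)  # allocations of one call
println("allocated per call: $bytes bytes")
```

A nonzero number here confirms that the "allocating" models pay an allocation on every RHS call, which the solver invokes many times per step.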
All models are tested with exactly the same `M`, so that cannot be the reason. Also, I know there is still much room for improvement; my goal here was just to make the three approaches comparable. Thanks in advance for any input on this puzzling finding.