Hello,
I am considering these three implementations of the same ODE right-hand side
function minimalModelNonAlloc!(D, u, p, t)
    M = p
    v = 0.2 .* ones(length(u))
    D .= v - M * u.^3 - u.^5
    return nothing
end
function minimalModelAlloc(u, p, t)
    M = p
    v = 0.2 .* ones(length(u))
    return v - M * u.^3 - u.^5
end
function minimalModelAllocContainer(u, p, t)
    @unpack M = p
    v = 0.2 .* ones(length(u))
    return v - M * u.^3 - u.^5
end
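For reference, the snippets here and the benchmark loop below assume roughly the following packages (my exact imports may differ slightly, e.g. @unpack could also come from Parameters.jl):

using DifferentialEquations       # ODEProblem, Tsit5, solve
using DiffEqFlux, Flux            # sciml_train, ADAM
using ComponentArrays, UnPack     # ComponentArray, @unpack
using BenchmarkTools              # @benchmark
using Statistics                  # mean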
and compare their performance when optimising some loss function (see below) with respect to M, for different dimensions of u. In the last model I use a ComponentArray as a container for p.
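As a small illustration of what I mean by the container (a hypothetical 2×2 example):

p = ComponentArray(M = ones(2, 2))   # wrap the matrix in a named container
@unpack M = p                        # retrieves the matrix stored under the name M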
I find the following result:
(There is also a similar result for the allocated memory, but I am only allowed to upload one file.)
Okay, unsurprisingly, the containerised version fares worse than minimalModelAlloc, which I expected, since the container adds some overhead. But the magnitude of it surprises me nevertheless. However, the biggest surprise is that the non-allocating version is by far the worst: we are talking 1.9 seconds per run vs. ~80 ms for minimalModelAlloc at length(u). These results are completely unintuitive to me. Could anyone please tell me what part of the equation I am missing?
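For completeness, a quick sanity check that all three right-hand sides compute the same derivative could look like this (a sketch, using dim = 2):

dim = 2
uTest = rand(dim)
Mtest = ones(dim, dim)
D = similar(uTest)
minimalModelNonAlloc!(D, uTest, Mtest, 0.0)                              # in-place version writes into D
d2 = minimalModelAlloc(uTest, Mtest, 0.0)                                # out-of-place, plain matrix parameters
d3 = minimalModelAllocContainer(uTest, ComponentArray(M = Mtest), 0.0)   # containerised parameters
@assert D ≈ d2 ≈ d3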
The code for benchmarking I use reads
dims = [2, 5, 10, 20, 50, 100, 200]
memories = []
times = []
for dim ∈ dims
    @info "Processing $dim dimensions"
    # prob is defined further down in this loop body and captured by the closure
    function lossBenchmark(p)
        sol = solve(prob, Tsit5(), p = p)
        loss = sum(abs2, sol[end])
        return loss, sol
    end
    u0 = zeros(dim)
    # pTest = ones(dim, dim)                     # raw matrix for the two plain models
    pTest = ComponentArray(M = ones(dim, dim))   # container for minimalModelAllocContainer
    prob = ODEProblem(minimalModelAllocContainer, u0, (0.0, 10.0), pTest)
    benchmark = @benchmark DiffEqFlux.sciml_train($lossBenchmark, $pTest, $ADAM(0.05), maxiters = 30) samples = 100 seconds = 30
    push!(memories, benchmark.memory)
    push!(times, benchmark.times |> mean)
end
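For the other two models I just swap the right-hand side and the parameters accordingly, roughly like this (a sketch):

pTest = ones(dim, dim)   # raw matrix instead of the ComponentArray
prob = ODEProblem(minimalModelAlloc, u0, (0.0, 10.0), pTest)
# or, for the in-place version:
prob = ODEProblem(minimalModelNonAlloc!, u0, (0.0, 10.0), pTest)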
All models are tested with exactly the same u0 and M, so that cannot be the reason. Also, I know there is still a lot of room for improvement; my goal here was just to make all three approaches comparable. Thanks in advance for any input on this puzzling finding.