Performance comparison for equivalent DiffEqFlux models

Hello,

I am considering these three implementations of the same ODE right-hand side:

# Packages used by the model definitions and the benchmark below
using OrdinaryDiffEq, DiffEqFlux, Flux, ComponentArrays, UnPack, BenchmarkTools, Statistics

# In-place version: the derivative is written into the preallocated D
function minimalModelNonAlloc!(D, u, p, t)
	M = p
	v = 0.2 .* ones(length(u))

	D .= v - M * u.^3 - u.^5
	return nothing
end

# Out-of-place version: the derivative is returned as a newly allocated vector
function minimalModelAlloc(u, p, t)
	M = p
	v = 0.2 .* ones(length(u))

	return v - M * u.^3 - u.^5
end

# Out-of-place version with the parameters stored in a ComponentArray container
function minimalModelAllocContainer(u, p, t)
	@unpack M = p
	v = 0.2 .* ones(length(u))

	return v - M * u.^3 - u.^5
end

and compare their performance when optimising a loss function with respect to M (see below) for different dimensions of u. In the last model I use a ComponentArray as the container for p.
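For reference, here is a minimal sketch of the container pattern the third model relies on (just an illustration with a toy 3×3 matrix; it assumes ComponentArrays.jl and UnPack.jl):

using ComponentArrays, UnPack

p = ComponentArray(M = ones(3, 3))   # labelled parameter container
@unpack M = p                        # equivalent to M = p.M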

I find the following result:
[Performance test plot]
(There is also a similar result for the allocated memory, but I am only allowed to upload one file :confused: )

Okay, unsurprisingly, the containerised version fares worse than minimalModelAlloc, which I expected, since the container adds some overhead. But the magnitude of the gap still surprises me. The biggest surprise, however, is that the non-allocating version is by far the worst: we are talking 1.9 seconds per run vs. ~80 ms for minimalModelAlloc at the same length(u). These results are completely unintuitive to me. Could anyone please tell me what part of the equation I am missing? :smiley:

The benchmarking code I use reads:

dims     = [2, 5, 10, 20, 50, 100, 200]
memories = []
times    = []
for dim ∈ dims
	@info "Processing $dim dimensions"

	u0    = zeros(dim)
	# pTest = ones(dim, dim)                       # plain-matrix parameters for the first two models
	pTest = ComponentArray(M = ones(dim, dim))     # container parameters for the third model
	prob  = ODEProblem(minimalModelAllocContainer, u0, (0.0, 10.0), pTest)

	function lossBenchmark(p)
		sol  = solve(prob, Tsit5(), p = p)
		loss = sum(abs2, sol[end])
		return loss, sol
	end

	benchmark = @benchmark DiffEqFlux.sciml_train($lossBenchmark, $pTest, $(ADAM(0.05)), maxiters = 30) samples = 100 seconds = 30

	push!(memories, benchmark.memory)
	push!(times,    benchmark.times |> mean)
end

All models are tested with exactly the same u0 and M, so that cannot be the reason. I also know there is still a lot of room for improvement; my goal here was just to make the three approaches comparable. Thanks in advance for any input on this puzzling finding :slight_smile:

It has to do with how reverse-mode AD is implemented. Reverse-mode AD requires knowing the primal values in order to reverse through them, and in order to handle mutation it drops into a fallback behavior based on scalar mode.
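As a quick standalone illustration (a toy example, not the actual DiffEqSensitivity code path): a plain reverse-mode tool such as Zygote refuses mutation outright, which is why a fallback path is needed for the in-place model at all.

using Zygote

f_mut(x)  = (y = zero(x); y .= x .^ 2; sum(y))   # mutates the buffer y
f_pure(x) = sum(x .^ 2)                          # same result, no mutation

Zygote.gradient(f_pure, rand(3))   # works
Zygote.gradient(f_mut,  rand(3))   # errors: "Mutating arrays is not supported"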

https://diffeq.sciml.ai/stable/analysis/sensitivity/#Internal-Automatic-Differentiation-Options-(ADKwargs)

https://diffeqflux.sciml.ai/dev/ControllingAdjoints/

Basically, you might want to try benchmarking with:

	function lossBenchmark(p)
		sol  = solve(prob, Tsit5(), p = p, sensealg = InterpolatingAdjoint(autojacvec = ReverseDiffVJP(true)))
		loss = sum(abs2, sol[end])
		return loss, sol
	end

That would compile the tape and improve the speed of scalar mode. You’ll probably see a smaller difference in that case (and all will be faster than before).
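To see what "compile the tape" means in isolation, here is a small self-contained ReverseDiff example with a toy loss (just an illustration, not the code path inside the adjoint):

using ReverseDiff

toyLoss(u) = sum(abs2, 0.2 .- u.^3 .- u.^5)     # stand-in for the real loss

u0    = rand(5)
tape  = ReverseDiff.GradientTape(toyLoss, u0)   # record the operations once
ctape = ReverseDiff.compile(tape)               # compile so later calls just replay the tape
grad  = similar(u0)
ReverseDiff.gradient!(grad, ctape, u0)          # fast replay, no re-tracing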

The choice of sensealg can matter a lot for this. But yes, reverse-mode can be quite unintuitive in terms of allocation handling, or at least it will look like it breaks the standard rules.
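If you want to see how much the choice matters at your problem sizes, a rough sweep like the following can help (just a sketch; the algorithm list is only a suggestion, all names come from DiffEqSensitivity, and prob / pTest are assumed to be set up as in your loop):

using OrdinaryDiffEq, DiffEqSensitivity, Zygote, BenchmarkTools, Statistics

sensealgs = (
	InterpolatingAdjoint(autojacvec = ReverseDiffVJP(true)),   # compiled-tape VJPs
	QuadratureAdjoint(autojacvec = ReverseDiffVJP(true)),
	InterpolatingAdjoint(autojacvec = ZygoteVJP()),
	ForwardDiffSensitivity(),                                   # forward mode, fine for few parameters
)

for sensealg in sensealgs
	b = @benchmark Zygote.gradient(
		p -> sum(abs2, solve($prob, Tsit5(), p = p, sensealg = $sensealg)[end]),
		$pTest) samples = 20 seconds = 30
	@info "benchmarked" sensealg mean_time = mean(b.times)
end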

Thanks for the answer! The performance of the allocating and non-allocating models is now indeed quite comparable. The model using ComponentArrays, however, now fails with type TrackedArray has no field M in the line @unpack M = p; full stack trace below.

This aligns with what I observed in earlier testing, so I assume ComponentArrays do not work properly with this AD path? I know this now goes a little off topic, but are there any containers that do work with AD, or how hard would it be to create a struct with the necessary functions for this?

type TrackedArray has no field M
in top-level scope at Performance Tests.jl:38
in macro expansion at BenchmarkTools\eCEpo\src\execution.jl:287
in warmup at BenchmarkTools\eCEpo\src\execution.jl:141
in #warmup#45 at BenchmarkTools\eCEpo\src\execution.jl:141 
in run##kw at BenchmarkTools\eCEpo\src\execution.jl:94 
in run##kw at BenchmarkTools\eCEpo\src\execution.jl:94 
in #run#40 at BenchmarkTools\eCEpo\src\execution.jl:94
in run_result##kw at BenchmarkTools\eCEpo\src\execution.jl:32 
in #run_result#37 at BenchmarkTools\eCEpo\src\execution.jl:32 
in invokelatest##kw at base\essentials.jl:711 
in #invokelatest#1 at base\essentials.jl:716 
in  at base\essentials.jl:715
in _run##kw at BenchmarkTools\eCEpo\src\execution.jl:399 
in #_run#38 at BenchmarkTools\eCEpo\src\execution.jl:405
in ##sample#564 at BenchmarkTools\eCEpo\src\execution.jl:377
in ##core#563 at BenchmarkTools\eCEpo\src\execution.jl:371
in sciml_train##kw at DiffEqFlux\FZMwP\src\train.jl:77 
in sciml_train##kw at DiffEqFlux\FZMwP\src\train.jl:77 
in #sciml_train#31 at DiffEqFlux\FZMwP\src\train.jl:42
in maybe_with_logger at DiffEqBase\V7P18\src\utils.jl:259
in  at DiffEqFlux\FZMwP\src\train.jl:43
in macro expansion at ProgressLogging\BBN0b\src\ProgressLogging.jl:328 
in macro expansion at DiffEqFlux\FZMwP\src\train.jl:98 
in gradient at Zygote\1GXzF\src\compiler\interface.jl:54
in  at Zygote\1GXzF\src\compiler\interface.jl:177
in  at Zygote\1GXzF\src\compiler\interface2.jl
in #33 at DiffEqFlux\FZMwP\src\train.jl:99 
in  at ZygoteRules\OjfTt\src\adjoint.jl:59
in #175 at Zygote\1GXzF\src\lib\lib.jl:182 
in  at Zygote\1GXzF\src\compiler\interface2.jl
in lossBenchmark at Performance Tests.jl:29 
in  at Zygote\1GXzF\src\compiler\interface2.jl
in solve##kw at DiffEqBase\V7P18\src\solve.jl:100 
in  at ZygoteRules\OjfTt\src\adjoint.jl:59
in  at Zygote\1GXzF\src\lib\lib.jl:182
in  at Zygote\1GXzF\src\compiler\interface2.jl
in #solve#460 at DiffEqBase\V7P18\src\solve.jl:102 
in  at ZygoteRules\OjfTt\src\adjoint.jl:59
in #175 at Zygote\1GXzF\src\lib\lib.jl:182 
in #498#back at ZygoteRules\OjfTt\src\adjoint.jl:59 
in  at DiffEqSensitivity\WiCRA\src\local_sensitivity\concrete_solve.jl:144
in  at DiffEqSensitivity\WiCRA\src\local_sensitivity\sensitivity_interface.jl:6
in #adjoint_sensitivities#42 at DiffEqSensitivity\WiCRA\src\local_sensitivity\sensitivity_interface.jl:6
in _adjoint_sensitivities at DiffEqSensitivity\WiCRA\src\local_sensitivity\sensitivity_interface.jl:13
in _adjoint_sensitivities at DiffEqSensitivity\WiCRA\src\local_sensitivity\sensitivity_interface.jl:13
in #_adjoint_sensitivities#43 at DiffEqSensitivity\WiCRA\src\local_sensitivity\sensitivity_interface.jl:17
in ODEAdjointProblem##kw at DiffEqSensitivity\WiCRA\src\local_sensitivity\interpolating_adjoint.jl:161 
in #ODEAdjointProblem#114 at DiffEqSensitivity\WiCRA\src\local_sensitivity\interpolating_adjoint.jl:173
in ODEInterpolatingAdjointSensitivityFunction at DiffEqSensitivity\WiCRA\src\local_sensitivity\interpolating_adjoint.jl:22 
in #ODEInterpolatingAdjointSensitivityFunction#110 at DiffEqSensitivity\WiCRA\src\local_sensitivity\interpolating_adjoint.jl:55
in  at DiffEqSensitivity\WiCRA\src\local_sensitivity\adjoint_common.jl:27
in #adjointdiffcache#73 at DiffEqSensitivity\WiCRA\src\local_sensitivity\adjoint_common.jl:131
in ReverseDiff.GradientTape at ReverseDiff\vScHI\src\api\tape.jl:204
in ReverseDiff.GradientTape at ReverseDiff\vScHI\src\api\tape.jl:207
in  at DiffEqSensitivity\WiCRA\src\local_sensitivity\adjoint_common.jl:132
in  at DiffEqBase\V7P18\src\diffeqfunction.jl:248
in minimalModelAllocContainer at Performance Tests.jl:17
in macro expansion at UnPack\EkESO\src\UnPack.jl:101 
in unpack at UnPack\EkESO\src\UnPack.jl:34 
in getproperty at base\Base.jl:33

ReverseDiff only supports a small set of array types, so it’s a ReverseDiff.jl issue; Zygote.jl is fine with this. Unless you want to get in the weeds for fun, I’d say just do what you can for now; Diffractor (or Enzyme) will solve a lot of these little issues in a better way.
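If you want to keep the ComponentArray container in the meantime, one thing worth trying (an untested sketch, based on the point above that Zygote handles these arrays) is to switch the internal VJP choice from ReverseDiff to Zygote, so that @unpack M = p never sees a ReverseDiff TrackedArray:

	function lossBenchmark(p)
		sol  = solve(prob, Tsit5(), p = p, sensealg = InterpolatingAdjoint(autojacvec = ZygoteVJP()))
		loss = sum(abs2, sol[end])
		return loss, sol
	end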