My problem is the following: I have an ODE with some parameters to estimate from data. In this example, however, I will generate the data from the ODE itself, so I can run some tests.

I take the Lotka-Volterra equations as an example:

```
using DifferentialEquations
function lotka_volterra!(du, u, p, t)
    x, y = u
    α, β, δ, γ = p
    du[1] = α*x - β*x*y
    du[2] = -δ*y + γ*x*y
end

u0 = [1.0, 1.0]
t_span = (0.0, 10.0)
t_data = 0.0:1.0:10.0
data_size = length(t_data)
true_p = [1.5, 1.0, 3.0, 1.0]
true_prob = ODEProblem(lotka_volterra!, u0, t_span)
true_sol = solve(true_prob, Tsit5(), p = true_p, saveat = t_data, abstol = 1e-12)
data = true_sol.u
```

So now I have the data to fit.

One option is to write a loss function that takes the parameters `p` as input and returns some kind of error (like an L2 norm) between the solution and the data I saved. However, I would also like to test the following alternative strategy:
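For concreteness, that first option could be sketched like this (the name `loss_single`, the choice of `Tsit5()`, and the plain sum of squared differences are all just illustrative, not the only way to do it):

```julia
# Sketch of the "single solve" loss: solve once over the full time span
# with candidate parameters p and compare against the saved data at the
# observation times.
function loss_single(p)
    tmp_prob = remake(true_prob, p = p)
    tmp_sol = solve(tmp_prob, Tsit5(), saveat = t_data)
    # L2-type error: sum of squared differences at each saved time
    return sum(sum(abs2, tmp_sol.u[i] .- data[i]) for i in 1:data_size)
end
```

Evaluating `loss_single(true_p)` should then give a value near zero.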

divide the time interval into sub-intervals, one between each pair of consecutive observations. On each sub-interval I can set up an ODE whose initial condition is the data at the start of that interval. Then I write a loss function that takes `p` as input, runs all the sub-problems, and computes the error by comparing each solution with the data at the end of its own sub-interval.

So before even writing the optimization, I test that using the true parameters `true_p` I get almost zero error. I set this up as an `EnsembleProblem`, since that looks like the best way to do it (it also enables parallelism; more on that at the end).

```
t_span_interval = [(t_data[i], t_data[i+1]) for i in 1:data_size-1]
prob_intervall = ODEProblem(lotka_volterra!, u0, t_span_interval[1], true_p)
function prob_func(prob, i, repeat)
    ODEProblem(prob.f, data[i], t_span_interval[i], prob.p)
end
ensemble_prob = EnsembleProblem(prob_intervall, prob_func = prob_func)
ensemble_sol = solve(ensemble_prob, Tsit5(), abstol = 1e-12, trajectories = data_size-1,
    save_everystep = false, save_start = true, save_end = true)
data_ensemble = [ensemble_sol.u[i][end] for i in 1:data_size-1]
data_ensemble .- data[2:data_size]
```

Running it, this is the output:

```
julia> data_ensemble .- data[2:data_size]
10-element Vector{Vector{Float64}}:
[2.7065844739659894e-5, -1.3964528888632532e-5]
[0.003105915036464957, -0.0024319585862495607]
[8.646330812744907e-5, 0.00036598671302412455]
[-6.986510828399517e-5, 4.878692306975463e-6]
[-0.0004903808993592662, 0.000827828489636806]
[-0.006222412625121798, 0.0055218332192099595]
[0.0011076821208162446, 0.00037821974241425416]
[-5.89828264159209e-5, -5.9286058622209925e-5]
[0.002522886083798692, 0.003895955430558651]
[-0.00044247516121154185, -0.0003218457894433868]
```

which looks way too big. I set `abstol` of the solvers to 1e-12 just to check whether it was a numerical error, but it doesn't look like it. So where is the issue?

Why do this? Just to give it a try. I am no expert in data fitting or optimization, so perhaps this is very wrong. However, I would like to see what happens, and also exploit the fact that this kind of method is highly parallelizable, since the ODEs on different intervals are independent.
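On the parallelism point: if I understand the ensemble interface correctly, the sub-interval solves can be distributed over threads just by passing an ensemble algorithm such as `EnsembleThreads()` as an extra positional argument to `solve` (this assumes Julia was started with multiple threads):

```julia
# Same ensemble solve as above, but with the trajectories explicitly
# distributed across the available Julia threads.
ensemble_sol_threaded = solve(ensemble_prob, Tsit5(), EnsembleThreads(),
    abstol = 1e-12, trajectories = data_size-1,
    save_everystep = false, save_start = true, save_end = true)
```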

Thanks in advance.

*edit, just in case there is something terribly wrong with doing this*

A possible error function would be

```
function loss(p)
    tmp_prob = remake(prob_intervall, p = p)
    tmp_ensemble_prob = EnsembleProblem(tmp_prob, prob_func = prob_func)
    tmp_sol = solve(tmp_ensemble_prob, trajectories = data_size-1,
        save_everystep = false, save_start = false, save_end = true)
    return sum(sum(abs2, tmp_sol.u[i][end] .- data[i+1]) for i in 1:data_size-1)
end
```
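With that, the same sanity check as before is just evaluating the loss at the true parameters (note that, as written, `loss` uses the default solver and tolerances rather than the `Tsit5()` with `abstol = 1e-12` from earlier, so it is not exactly comparable to the residuals above):

```julia
# Should be close to zero if the multiple-interval setup is correct.
err_true = loss(true_p)
```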