Using Generalized Likelihood with a one-dimensional time series as data in DifferentialEquations.jl

I anticipate that this question may come down to my ignorance of statistics.

I’m trying to apply the DifferentialEquations.jl tutorial on using Generalized Likelihood to a case where the data is a one-dimensional time series.

My model is an SIRD model with 4 age classes and three parameters [β, λ_R, λ_D] that I’m trying to calibrate against cumulative deaths. Here is the code:

# Basic Utilities
using StaticArrays, RecursiveArrayTools, ParameterizedFunctions
# Numerical Computation
using DifferentialEquations, LinearAlgebra, DiffEqBayes, OrdinaryDiffEq, DiffEqParamEstim, Optim
# Statistics 
using Random, Distributions
# Data Wrangling 
using Queryverse, DataFrames, HTTP, CSV


final_time = 20 # day up to which calibration will be performed

# Model parameters

β = 0.05 # infection rate
λ_R = 0.55 # inverse of transition time from infected to recovered
λ_D = 0.83 # inverse of transition time from infected to dead
i₀ = 0.075 # fraction of initial infected people in every age class
𝒫 = [β, λ_R, λ_D]

# regional contact matrix and regional population
## regional contact matrix

regional_all_contact_matrix = [3.45536   0.485314  0.506389  0.123002 ; 0.597721  2.11738   0.911374  0.323385 ; 0.906231  1.35041   1.60756   0.67411 ; 0.237902  0.432631  0.726488  0.979258] # 4x4 contact matrix

## regional population stratified by age

N = [723208, 874150, 1330993, 1411928] # array of 4 elements, each giving the absolute population of the corresponding age class

# Initial conditions 
I₀ = repeat([i₀], 4)
S₀ = N .- I₀
R₀ = zeros(length(N))
D₀ = zeros(length(N))
D_tot₀ = zeros(length(N))
ℬ = vcat(S₀, I₀, R₀, D₀, D_tot₀)

# Time 
𝒯 = (1.0,final_time); 

# data used for calibration
piedmont_cumulative_deaths = [0,2,4,5,5,13,17,21,26,46,59,81,111,133,154,175,209,238,283,315]


function SIRD!(du,u,p,t)  
    # Parameters to be calibrated
    β, λ_R, λ_D = p

    # death probability stratified by age (values taken from the literature)
    δ = [0.003/100, 0.004/100, (0.015+0.030+0.064+0.213+0.718)/(5*100), (2.384+8.466+12.497+1.117)/(4*100)]

    
    C = regional_all_contact_matrix 
    
    # State variables
    S = @view u[4*0+1:4*1]
    I = @view u[4*1+1:4*2]
    R = @view u[4*2+1:4*3]
    D = @view u[4*3+1:4*4]
    D_tot = @view u[4*4+1:4*5]

    # Differentials
    dS = @view du[4*0+1:4*1]
    dI = @view du[4*1+1:4*2]
    dR = @view du[4*2+1:4*3]
    dD = @view du[4*3+1:4*4]
    dD_tot = @view du[4*4+1:4*5]
    
    # Force of infection
    Λ = β*[sum(C[i,j]*I[j]/N[j] for j in 1:size(C,2)) for i in 1:size(C,1)]
    
    # System of equations
    @. dS = -Λ*S
    @. dI = Λ*S - ((1-δ)*λ_R + δ*λ_D)*I
    @. dR = λ_R*(1-δ)*I 
    @. dD = λ_D*δ*I
    dD_tot .= sum(dD)  # no @. here: under @., dD[1] becomes getindex.(dD, 1), i.e. the whole array, so the total would be wrong

end;

# solve the model once at the nominal parameter values
problem = ODEProblem(SIRD!, ℬ, 𝒯, 𝒫)
solution_all = @time solve(problem, saveat=1:final_time);


t = collect(1.:final_time);  

I would like to calibrate the parameters [β, λ_R, λ_D] against piedmont_cumulative_deaths, following the tutorial above. I tried the following two approaches:

Approach 1

In this approach, noting that the tutorial calls fit_mle for every time step, I replicate that here. The result is an array of distributions with zero variance, NaN fitness, a constant improve/step, and somewhat realistic parameter values.

using BlackBoxOptim

aggregate_data =  piedmont_cumulative_deaths
distributions = [fit_mle(Normal,[aggregate_data[i]]) for i in 1:length(t)] 
obj = build_loss_objective(problem,Tsit5(),LogLikeLoss(t,distributions),
                                     maxiters=10000,verbose=true, save_idxs= [4*4+1])
bound1 = Tuple{Float64, Float64}[(0.01, 1.0),(0.01, 1.0),(0.01, 1.0)]
result = bboptimize(obj;SearchRange = bound1, MaxSteps = 11e3)
Starting optimization with optimizer DiffEvoOpt{FitPopulation{Float64},RadiusLimitedSelector,BlackBoxOptim.AdaptiveDiffEvoRandBin{3},RandomBound{ContinuousRectSearchSpace}}
0.00 secs, 0 evals, 0 steps
0.52 secs, 536 evals, 268 steps, improv/step: 1.000 (last = 1.0000), fitness=NaN
[... every ~0.5 s progress line likewise reports improv/step ≈ 1.000 and fitness=NaN ...]
21.59 secs, 21904 evals, 10952 steps, improv/step: 1.000 (last = 1.0000), fitness=NaN

Optimization stopped after 11001 steps and 21.68 seconds
Termination reason: Max number of steps (11000) reached
Steps per second = 507.50
Function evals per second = 1014.99
Improvements/step = 0.99982
Total function evaluations = 22002


Best candidate found: [0.0609603, 0.0286277, 0.0136111]

Fitness: NaN
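The NaN fitness can be traced to the zero-variance fits: the MLE of a Normal from a single observation has σ = 0, and the log-density of a zero-variance Normal is not finite anywhere, so the log-likelihood loss degenerates. A minimal demonstration (the value 5.0 is arbitrary):

```julia
using Distributions

# MLE of a Normal from a single observation: μ = x, σ = 0 — a degenerate
# (zero-variance) distribution
d = fit_mle(Normal, [5.0])
d.σ                # 0.0

# With σ = 0 the log-density is not finite, so any log-likelihood built
# from such fits cannot be optimized
logpdf(d, 5.0)     # not finite (NaN or Inf depending on the version)
logpdf(d, 4.0)     # not finite
```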

Approach 2

To avoid the zero variance, I fit a single distribution to all the data. This still leads to unrealistic parameter values.

using BlackBoxOptim

aggregate_data =  piedmont_cumulative_deaths
distributions = [fit_mle(Normal,aggregate_data)] 
obj = build_loss_objective(problem,Tsit5(),LogLikeLoss([1],distributions),
                                     maxiters=10000,verbose=true, save_idxs= [4*4+1])
bound1 = Tuple{Float64, Float64}[(0.01, 1.0),(0.01, 1.0),(0.01, 1.0)]
result = bboptimize(obj;SearchRange = bound1, MaxSteps = 11e3)
Starting optimization with optimizer DiffEvoOpt{FitPopulation{Float64},RadiusLimitedSelector,BlackBoxOptim.AdaptiveDiffEvoRandBin{3},RandomBound{ContinuousRectSearchSpace}}
0.00 secs, 0 evals, 0 steps
0.50 secs, 550 evals, 501 steps, improv/step: 0.998 (last = 0.9980), fitness=5.973421797
[... every ~0.5 s progress line likewise reports improv/step ≈ 1.000 and fitness=5.973421797 ...]
10.53 secs, 10961 evals, 10915 steps, improv/step: 1.000 (last = 1.0000), fitness=5.973421797

Optimization stopped after 11001 steps and 10.63 seconds
Termination reason: Max number of steps (11000) reached
Steps per second = 1034.51
Function evals per second = 1038.84
Improvements/step = 0.99973
Total function evaluations = 11047


Best candidate found: [0.717621, 0.9427, 0.0494182]

Fitness: 5.973421797
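A way to keep one distribution per time point while avoiding the zero variance is to assume the observation noise instead of fitting it. In the sketch below, σ_obs = 5.0 is an illustrative modelling assumption (not a fitted quantity), and saveat = t is passed so the solver saves exactly at the data times:

```julia
using BlackBoxOptim, Distributions, DiffEqParamEstim, OrdinaryDiffEq

# One distribution per time point, centred on the datum, with an assumed
# (not fitted) observation noise — avoids the degenerate σ = 0 fits
σ_obs = 5.0  # assumed measurement noise; a modelling choice
distributions = [Normal(y, σ_obs) for y in piedmont_cumulative_deaths]

obj = build_loss_objective(problem, Tsit5(), LogLikeLoss(t, distributions),
                           maxiters = 10000, verbose = true,
                           save_idxs = [4*4+1], saveat = t)
bound1 = Tuple{Float64,Float64}[(0.01, 1.0), (0.01, 1.0), (0.01, 1.0)]
result = bboptimize(obj; SearchRange = bound1, MaxSteps = 11e3)
```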

So, does it make sense to use this feature when the data is a one-dimensional array?

If so, how does one use it properly?

Thank you


These days I would recommend just using DiffEqFlux and writing a loss function that uses Distributions.jl inside of it. That’s what DiffEqParamEstim.jl is doing under the hood, and it’ll be a lot more flexible.
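As a rough illustration of that suggestion for this model, a hand-written negative log-likelihood could look like the sketch below; the Normal noise with σ_obs = 5.0, the Tsit5 solver, and the Fminbox(NelderMead()) optimizer are all assumptions made for the example:

```julia
using DifferentialEquations, Distributions, Optim

# Hand-written negative log-likelihood: solve the ODE at the candidate
# parameters and score the cumulative-death component against the data
function negloglik(p)
    _prob = remake(problem, p = p)
    sol = solve(_prob, Tsit5(), saveat = t, save_idxs = 4*4+1)
    sol.retcode != :Success && return Inf  # penalize failed solves
    σ_obs = 5.0  # assumed observation noise; a modelling choice
    return -sum(logpdf(Normal(sol.u[i], σ_obs), piedmont_cumulative_deaths[i])
                for i in eachindex(t))
end

# box-constrained optimization over the same [0.01, 1.0]^3 search space
res = optimize(negloglik, fill(0.01, 3), fill(1.0, 3), [0.1, 0.5, 0.5],
               Fminbox(NelderMead()))
Optim.minimizer(res)
```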
