How can I improve code that estimates parameters using maximum likelihood, Optim, differential equations, and automatic differentiation?

I’m trying to estimate the parameters of a model given as a differential equation. I created three series of data, and I want the initial condition of the differential equation for each series to be an additional parameter to be estimated.

I wrote code that does this in Julia, and I want to use automatic differentiation because I’ll need it when I have many more series (~100 instead of 3). I’ve done this in R (without autodiff), and I was hoping Julia would be faster by providing the gradient. Nevertheless, I can still do this faster in R, even without using the gradient. Why is my code not efficient?

This is the reproducible example:

using DifferentialEquations 
using Optim 
using ForwardDiff
using DataFrames
using Distributions
using SparseArrays

#Data

AGE = [1.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0,
       1.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.2,
       1.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.5]
HD  = [ 0.04,  1.51,  3.19,  6.55,  7.92, 10.33, 12.66, 14.91, 15.41, 18.81,
        0.04,  1.81,  3.39,  6.05,  7.02, 10.63, 12.86, 15.51, 15.91, 18.02,
        0.04,  1.31,  3.09,  7.15,  8.20, 11.33, 12.36, 14.21, 15.01, 19.21]

df = DataFrame(Age = AGE, HDom = HD, Plot = repeat([1,2,3], inner=10))

I create the following dummy (indicator) matrix so I can assign the initial value to each series (called Plot) later:

function make_dummy(data::AbstractVector)
    udata = unique(data)
    inds  = map(x -> findfirst(==(x), udata), data)           # column index of each observation's group
    sparse(1:length(data), inds, ones(Int, length(data)))     # one row per observation, one column per group
end
nplot         = length(unique(df.Plot))
Mtx           = make_dummy(df.Plot)
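
For the three plots here, Mtx is a 30×3 indicator matrix, so multiplying it by a vector of three plot-level values repeats each value over the rows of its plot. A quick check with arbitrary values:

# each row of Mtx has a single 1 in the column of its plot
Mtx * [0.1, 0.2, 0.3]   # 30-element vector: 0.1 for plot 1, 0.2 for plot 2, 0.3 for plot 3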

Then, the differential equation:

function f(h,p,t)
    # dh/dt = p1 * h / t^(1 + p2)
    p[1]*h[1]*(1/t^(1+p[2]))
end

function DEsol_l(Age,hini,parms)
    time_int = (Age[1], Age[end])                   # integrate from the first to the last age
    prob     = ODEProblem(f, hini, time_int, parms)
    sol      = solve(prob, saveat = Age)            # save the solution at the observed ages
    return sol.u
end
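
As a standalone check, the solver can be called for a single plot, using that plot's smallest observed height as the initial condition and the same starting guesses for the two rate parameters that I use below (these are just guesses, not fitted values):

age1 = df.Age[df.Plot .== 1]        # ages of plot 1
DEsol_l(age1, 0.04, [8.0, 0.9])     # predicted heights at each age of plot 1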

Then, the likelihood function:

function loglik3l(θ)

    sigma         = exp(θ[nplot+3])                 # residual standard deviation (estimated on the log scale)
    df[!, "hini"] = Mtx*θ[1:nplot]                  # expand the per-plot initial heights to every row
    hpred         = combine(groupby(df, [:Plot]), df -> DEsol_l(df.Age, df.hini[1], θ[(nplot+1):(nplot+2)]))
    residual      = hpred[:,2] .- df.HDom
    result        = sum(logpdf.(Normal(0, sigma), residual))

    return -result

end

Then I create a vector of initial values from the data:

inits3l = combine(groupby(df, :Plot), :HDom => minimum)
inits3l = inits3l[!,2]
inits3l = [inits3l; 8; 0.9; -0.5]
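
A quick sanity check (not part of the fit) is to evaluate the objective once at the starting values and confirm it returns a finite number:

loglik3l(inits3l)   # negative log-likelihood at the starting values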

Use automatic differentiation to provide the gradient of the likelihood function:

od3l             = OnceDifferentiable(loglik3l, inits3l; autodiff= :forward);
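
Since ForwardDiff has to push dual numbers through the entire ODE solve, it can also be worth evaluating the gradient once at the start; if this call errors or returns NaNs, the optimizer will struggle (just a check, not needed for the fit):

ForwardDiff.gradient(loglik3l, inits3l)   # gradient of the negative log-likelihood at the starting values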

And run it:

result_ad3l = optimize(od3l, inits3l, Newton())

I get good results here:

0.344979 seconds (1.20 M allocations: 93.444 MiB)
 * Status: success

 * Candidate solution
    Final objective value:     2.606498e+01

 * Found with
    Algorithm:     Newton's Method

But when I use more data (my real data, with ~80 series), it takes 1.3 hours to run, while I can do the same in R in ~20 minutes.

What can be improved?

#----R Code below for comparison

library(data.table)   # setDT() and data.table syntax
library(deSolve)      # rk4()
library(optimx)       # optimr()

AGE <- c(1.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0,
       1.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.2,
       1.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.5)

HD  <- c( 0.04,  1.51,  3.19,  6.55,  7.92,10.33, 12.66, 14.91, 15.41, 18.81,
        0.04,  1.81,  3.39,  6.05,  7.02, 10.63, 12.86, 15.51, 15.91, 18.02,
        0.04,  1.31,  3.09,  7.15,  8.2, 11.33, 12.36, 14.21, 15.01, 19.21)

df <- data.frame(Age = AGE, HDom = HD, Plot =rep(c(1,2,3), each=10))
setDT(df)

DummyMtxT <- function(TRT)
{
  uplot <- unique(TRT)
  do.call("rbind", lapply(TRT, function(x) as.numeric(uplot == x)))
}

nplot <- length(unique(df$Plot))
Mtx   <- DummyMtxT(df$Plot)

fx1 <- function(t,h,p)
{
  p1  <- p['p1'] 
  p2  <- p['p2']

  hdot = p1*h*(1/t^(1+p2))
    
  return(list(H = hdot))
}

solveDE1 <- function(y, times, parms)
{
  res <- rk4(y     = y,
             times = times,
             parms = parms,
             func  = fx1)
  return(list(time = res[,1], Height = res[,2]))
}


log.lik1 <- function(params, data)
{
  data$hinits <- Mtx %*% params[1:ncol(Mtx)]

  model <- data[, solveDE1(y     = min(hinits),
                           times = Age,
                           parms = params),
                by = Plot]

  value <- sum(dnorm(model[[3]], data$HDom, exp(params["sds"]), log = TRUE))

  return(-value)
}


h0        <- df[,min(HDom), by =  Plot]
inits     <- c(h0 = h0[[2]],p1=8,p2=0.9,sds=-0.5)


mod1      <- with(df,
                  optimr(par      = inits,
                         fn       = log.lik1,
                         data     = df,
                         method   = "Nelder-Mead",
                         control  = list(maxit=10000)))

mod1
mod1$par

Can you post the R code? It’s often much easier to see what’s going wrong when you have the faster code to compare.

I added the R code for the same example; I hadn’t included it before. I also noticed that I get slightly different values for the parameters. I know I changed the optimization method, but that doesn’t seem to be the problem.

Hessian-based optimization is expensive on large systems, especially Newton. Did you try BFGS instead?

The R code is using a derivative-free method, so its convergence is slow, but running 10,000 iterations is cheap since there are no derivatives to calculate. You may want to start with Nelder-Mead in Julia as well, and then switch to BFGS.
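
On your toy example that could look roughly like this (an untested sketch; the variable names are just placeholders):

res_nm   = optimize(loglik3l, inits3l, NelderMead(), Optim.Options(iterations = 10_000))
res_bfgs = optimize(od3l, Optim.minimizer(res_nm), BFGS())   # refine the Nelder-Mead solution using the gradient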

I’ve tried several methods in both languages; I often run several of them to see whether I get consistent estimates of the parameters. Doesn’t BFGS also use the Hessian? I used Nelder-Mead for the real data, and that was when it took 1.3 hours.

If I use BFGS I get this error message:

“Warning: Automatic dt set the starting dt as NaN, causing instability. Exiting.”

It uses an approximate inverse Hessian, which isn’t as expensive to compute as the full Hessian.

That’s a warning, so it shouldn’t be an issue in itself; it just means that the current parameter values cause problems for the differential equation solver. Does the optimization not continue after that?
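
If it does stop, one common workaround (just a sketch, not tested on your data) is to return a large penalty from the solver wrapper whenever the solve fails, so the optimizer backs away from those parameter values instead of crashing:

# hypothetical replacement for DEsol_l that guards against failed solves
function DEsol_safe(Age, hini, parms)
    prob = ODEProblem(f, hini, (Age[1], Age[end]), parms)
    sol  = solve(prob, saveat = Age)
    if sol.retcode != :Success
        # solver bailed out for these parameters: return huge "predictions"
        # so the log-likelihood is terrible and the optimizer moves away
        return fill(oftype(hini, 1e6), length(Age))
    end
    return sol.u
end

and then call DEsol_safe instead of DEsol_l inside loglik3l.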

No, it doesn’t continue.

I just created another question. I modified the code and it is now more efficient in both languages, although I still have the same issue of faster convergence in R.

See:
Similar question