# Improve code that estimates parameters using max likelihood, optim, differential equations and automatic differentiation?

I’m trying to find parameters for a model that is in a differential equation form. I created three series of data, and I want the starting point of the differential equation for each series to be an additional parameter to be estimated.

I created a code that does this in Julia and I want to use automatic differentiation because I’ll needed for when I have more than 3 series (~100). I’ve done this in R (without autodiff) and I was hoping Julia would be faster by providing the gradient. Nevertheless, I still can do this faster in R, even without using the gradient. Why my code is not efficient?

This is the reproducible example:

``````using DifferentialEquations
using Optim
using ForwardDiff
using DataFrames
using Distributions
using SparseArrays
``````

#Data

``````AGE = [1.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0,
1.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.2,
1.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.5]
``````
``````HD  = [ 0.04,  1.51,  3.19,  6.55,  7.92,10.33, 12.66, 14.91, 15.41, 18.81,
0.04,  1.81,  3.39,  6.05,  7.02, 10.63, 12.86, 15.51, 15.91, 18.02,
0.04,  1.31,  3.09,  7.15,  8.2, 11.33, 12.36, 14.21, 15.01, 19.21]
``````

`df = DataFrame(Age = AGE, HDom = HD, Plot =repeat([1,2,3], inner=10))`

I create the following to find the initial value per series (called Plot) later:

``````function make_dummy(data::AbstractVector)
udata = unique(data)
inds  = map(x -> findfirst(==(x), udata), data)
sparse(1:length(data), inds, ones(Int, length(data)))
end
``````
``````nplot         = length(unique(df.Plot))
Mtx           = make_dummy(df.Plot)
``````

Then, the differential equation:

``````function f(h,p,t)
p[1]*h[1]*(1/t^(1+p[2]))
end
``````
``````function DEsol_l(Age,hini,parms)

time_int      = (Age[1],  Age[length(Age)])
prob          = ODEProblem(f, hini, time_int, parms)
sol           = solve(prob, saveat = Age)
return sol.u
end
``````

Then, the likelihood function

``````function loglik3l(θ)

sigma         = exp(θ[nplot+3])
df[!, "hini"] = Mtx*θ[1:nplot]
hpred         = combine(groupby(df, [:Plot]), df -> DEsol_l(df.Age,df.hini[1],θ[(nplot+1):(nplot+2)]))
residual      = hpred[:,2] .- df.HDom
result        = sum(logpdf.(Normal(0, sigma), residual))

return -result

end
``````

Then I create a vector of initial values from the data:

``````inits3l = combine(groupby(df, :Plot), :HDom => minimum)
inits3l = inits3l[!,2]
inits3l = [inits3l; 8; 0.9; -0.5]
``````

Use automatic differentiation for finding the gradient of the likelihood function:

``````od3l             = OnceDifferentiable(loglik3l, inits3l; autodiff= :forward);
``````

And run it:

``````result_ad3l = optimize(od3l, inits3l,Newton())
``````

I get good results here:

``````0.344979 seconds (1.20 M allocations: 93.444 MiB)
* Status: success

* Candidate solution
Final objective value:     2.606498e+01

* Found with
Algorithm:     Newton's Method

``````

But when I use more data (real data), with ~80 series it takes 1.3 hours! to run. And I can do the same in R in ~20 minutes.

What can be improved?

#----R Code below for comparison

``````AGE <- c(1.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0,
1.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.2,
1.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.5)

HD  <- c( 0.04,  1.51,  3.19,  6.55,  7.92,10.33, 12.66, 14.91, 15.41, 18.81,
0.04,  1.81,  3.39,  6.05,  7.02, 10.63, 12.86, 15.51, 15.91, 18.02,
0.04,  1.31,  3.09,  7.15,  8.2, 11.33, 12.36, 14.21, 15.01, 19.21)

df <- data.frame(Age = AGE, HDom = HD, Plot =rep(c(1,2,3), each=10))
setDT(df)

DummyMtxT<-function(TRT)
{ uplot <- unique(TRT)
do.call("rbind", lapply(TRT, function(x)
as.numeric(uplot==x)))
}

nplot <- length(unique(df\$Plot))
Mtx   <- DummyMtxT(df\$Plot)

fx1 <- function(t,h,p)
{
p1  <- p['p1']
p2  <- p['p2']

hdot = p1*h*(1/t^(1+p2))

return(list(H = hdot))
}

solveDE1 <- function(y, times, parms)
{
res <- rk4(y      = y,
times    = times,
p          = parms,
func     = fx1)
return(list(time = res[,1], Height = res[,2]))
}

log.lik1 <- function(params, data)
{

data\$hinits <- Mtx %*% params[1:ncol(Mtx)]

model <- data[,solveDE1(            y      = min(hinits),
times  = Age,
parms  = params),
by  = Plot]

value <- sum(dnorm(model[[3]], data\$HDom, exp(params["sds"]), log = TRUE ))

return(-value)

}

h0        <- df[,min(HDom), by =  Plot]
inits     <- c(h0 = h0[[2]],p1=8,p2=0.9,sds=-0.5)

mod1      <- with(df,
optimr(par      = inits,
fn       = log.lik1,
data     = df,
control  = list(maxit=10000)))

mod1
mod1\$par
``````

Can you post the R code? It’s often much easier to see what’s going wrong when you have the faster code to compare.

I added the code in R for the same example. I hadn’t done it before, but I noticed that I even get slightly different values for the parameters. I know I changed the optimization method. But I noticed that this is not the problem.

Hessian-based optimization is expensive on large systems, especially Newton. Did you try BFGS instead?

The R code is using a derivative-free method so its convergence is slow but doing 10000 iterations is pretty easy (since there are no derivatives to calculate). You may want to start with Nelder-Mead in Julia as well, and then go to BFGS.

I’ve tried several methods in both languages. I often run several of them to see if I get consistent estimates of the parameters. Doesn’t BFGS also use the hessian? I used Nelder-Mead for the real data, and that was when I took 1.3 hours.

If I use BFGS I get this error message:

“Warning: Automatic dt set the starting dt as NaN, causing instability. Exiting.”

It uses an approximate inverse hessian calculation, which isn’t as intensive as the hessian

If I use BFGS I get this error message:
“Warning: Automatic dt set the starting dt as NaN, causing instability. Exiting.”

That’s a warning, so it shouldn’t be an issue, it is just because the parameters there might cause some issues for the differential equation to solve. Does the optimization not proceed on after that?

No, it doesn’t continue.

I just created another question. I modified the code and it is more efficient in both languages, although I still have the same issue of faster convergence in R.

See:
Similar question