I’ve been trying to estimate parameters (using max likelihood) of an ODE using the DifferentialEquations and Optim packages in Julia aiming to get a faster convergence compared to R. Problem is that I still can do it faster in R. The only advantage so far is that I get a lower value of likelihood with Julia, which is better. But I was really hoping to decrease computational time with Julia.
This is a reproducible example. I generated the data in R first and then used the exact same data in Julia. This data is pretty similar to the real data I’m really using. The code is pretty similar in both languages, so I don’t know what else I can change to make it faster in Julia. So far, I’m using the NelderMead method in both languages to make a fair comparison. I hope I can use gradient methods later and introduce automatic differentiation which can be done in Julia but not in R, but I don’t know how to do that either.
Running time for in R: 26.95 seconds
Running time in Julia: 34.76 seconds (103.59 M allocations: 22.318 GiB, 13.91% gc time, 0.04% compilation time)
Running time in Julia (with edit): 21.47seconds (64.50 M allocations: 22.087 GiB, 20.61% gc time, 0.02% compilation time)
data (if you want to download it): Link to drive with data
Julia code:
#Loading the packages
using DifferentialEquations
using Plots
using Optim
using DataFrames
using Distributions
using BenchmarkTools
using LinearAlgebra
using StatsBase
using CSV
data_s = DataFrame(CSV.File("Data/data_sim.csv"));
const nplot = length(unique(data_s.ID))
function f(u,p,t)
A1 = I(50) #This is not constant but let's assume it for now
du = p[1].*(A1*u).*(1/t.^(1+p[2]))
return du
end
function DEsol_l(Age,parms,n)
u0 = parms[1:n]
time_int = (minimum(Age),maximum(Age))
prob = ODEProblem(f, u0, time_int, parms[(n+1):(n+2)])
sol = solve(prob, saveat = unique(Age))
sol_v = vec(transpose(sol[:,:]))
return sol_v
end
function loglik1(time,var,ID,θ)
nplot = length(unique(ID))
sigma = exp(θ[nplot+3])
hpred = DEsol_l(time,θ,nplot)
residual = hpred .- var
result = sum(logpdf.(Normal(0, sigma), residual))
return -result
end
inits = combine(groupby(data_s, :ID), :HDOM => minimum)
inits = [inits[!,2]; 5; 0.5; 1 ];
@time result1 = optimize(b->loglik1(data_s.AGE,data_s.HDOM,data_s.ID,b), inits,NelderMead(),Optim.Options(iterations = 1000000))
R code
#Data generation
set.seed(123)
nseries <- 50
AGE <- rep(seq(5,30,3),nseries)
HDOM <- 15*exp(-3.5/(0.9*AGE^(0.9))) + abs(rnorm(length(AGE)))*AGE*0.05
ID <- rep(1:nseries,each=length(unique(AGE)))
data_s <- data.frame(AGE,HDOM,ID)
setDT(data_s)
plot(AGE,HDOM)
write.csv(data_s,"data_sim.csv")
#DE function
fx2 <- function(t,h,p)
{
p1 <- p['p1']
p2 <- p['p2']
hdot = p1*(A%*%h)*(1/t^(1+p2))
return(list(c(hdot)))
}
#DE Solver
solveDE2 <- function(times, parms,nplot)
{
res <- rk4(y = parms[1:nplot],
times = times,
p = parms,
func = fx2)
return(list(time = rep(res[,1],nplot), Height = c(res[,-1])))
}
#Loglik function
log.lik2 <- function(params, data)
{
model <- solveDE2(times = unique(data$AGE),
parms = params,
nplot = length(unique(data$ID)))
value <- sum(dnorm(model[[2]], data$HDOM, exp(params["sds"]), log = TRUE ))
return(-value)
}
A <- diag(nseries)
h0 <- data_s[,min(HDOM), by = ID]
inits <- c(h0 = h0[[2]],p1=5,p2=0.5,sds=1)
system.time(
mod2 <- with(data_s,
optimr(par = inits,
fn = log.lik2,
data = data_s,
method = "Nelder-Mead",
control = list(maxit=100000)))
)
mod2
Thanks!