# [ANN] ProfileLikelihood.jl: Computing profile likelihoods

Hi all,

I have just released my package ProfileLikelihood.jl that will be fully registered in a few days. The aim of this package is to provide an easy interface for computing maximum likelihood estimates (MLEs), profile likelihoods, and confidence and prediction intervals. By building on top of the great interface from Optimization.jl, it is very easy to switch between different optimisers, use different types of automatic differentiation, to implement constraints.

I define what profile likelihoods, etc. are in the README of the package, but to quickly summarise: Suppose we have a likelihood function \mathcal L(\boldsymbol \theta) (basically a joint probability density function for some sample), and assume that we have some interest parameter \psi so that \boldsymbol \theta is decomposed as \boldsymbol \theta = (\psi, \boldsymbol \omega), where \boldsymbol \omega are the remaining ‘nuisance’ parameters. Currently I only support scalar \psi (edit: as of v0.2, bivariate profiles are now also supported). The profile likelihood of \psi is defined by \mathcal L_p(\psi_0) = \sup_{\boldsymbol \omega \in \Omega \mid \psi = \psi_0} \mathcal L(\psi, \boldsymbol \omega), i.e. the maximum of the likelihood when we fix \psi at a given value. This profile likelihood makes it easy to compute confidence intervals and approximate prediction intervals. Again, more detail is in the README, and a nice very recent resource is Simpson and Maclaren (2022).

To illustrate the package, I give one of the five examples from the docs below, where I estimate parameters for a logistic ODE, reproducing the first case study of Simpson and Maclaren (2022). The main idea is to first define the likelihood function so that a LikelihoodProblem can then be built. Computing the results is as easy as then using mle, profile, and then compute_prediction_intervals (if you have a prediction function you are interested in). Other examples for multiple linear regression, a simpler ODE \mathrm du/\mathrm dt = \lambda u, and estimating parameters for a PDE (the heat equation) are given in the README.

Suppose we have \mathrm du/\mathrm dt = \lambda u(1-u/K), u(0)=u_0, and our interest is in estimating (\lambda, K, u_0), fixing the standard deviation of the noise, \sigma, at \sigma=10 for this example. First, we need some data for this synthetic study:

using OrdinaryDiffEq, Random
λ = 0.01
K = 100.0
u₀ = 10.0
t = 0:100:1000
σ = 10.0
@inline function ode_fnc(u, p, t)
λ, K = p
du = λ * u * (1 - u / K)
return du
end
# Initial data is obtained by solving the ODE
tspan = extrema(t)
p = (λ, K)
prob = ODEProblem(ode_fnc, u₀, tspan, p)
sol = solve(prob, Rosenbrock23(), saveat=t)
Random.seed!(2828)
uᵒ = sol.u + σ * randn(length(t))


Next, a likelihood function is defined. I define the log-likelihood function, and use a normal likelihood since I added Gaussian noise to the data:

using ProfileLikelihood
function loglik_fnc2(θ, data, integrator)
λ, K, u₀ = θ
uᵒ, σ = data
integrator.p = λ
integrator.p = K
reinit!(integrator, u₀)
solve!(integrator)
return gaussian_loglikelihood(uᵒ, integrator.sol.u, σ, length(uᵒ))
end


When solving ODEs, it is assumed that the third argument is an integrator, coming from the integrator interface from DifferentialEquations.jl. In other cases, there are only the first two arguments (see the regression example in the README). You could re-define an ODEProblem many times, but this is slow. To now build the LikelihoodProblem from all this, and constraining 0 \leq \lambda \leq 0.05, 50 \leq K \leq 150, and 0 \leq u_0 \leq 50:

using Optimization
lb = [0.0, 50.0, 0.0] # λ, K, u₀
ub = [0.05, 150.0, 50.0]
θ₀ = [λ, K, u₀]
syms = [:λ, :K, :u₀]
prob = LikelihoodProblem(
loglik_fnc2, θ₀, ode_fnc, u₀, maximum(t);
syms=syms,
data=(uᵒ, σ),
ode_parameters=[1.0, 1.0], # temp values for [λ, K]
ode_kwargs=(verbose=false, saveat=t),
prob_kwargs=(lb=lb, ub=ub),
ode_alg=Rosenbrock23()
)


This function will construct the LikelihoodProblem, a function of two arguments that in this case is a closure that contains the integrator for the problem. From this point, it is very easy to find the MLEs and the profile likelihoods:

using OptimizationNLopt
sol = mle(prob, NLopt.LD_LBFGS)

LikelihoodSolution. retcode: Failure
Maximum likelihood: -38.99053694428977
Maximum likelihood estimates: 3-element Vector{Float64}
λ: 0.010438031266786045
K: 99.59921873132551
u₀: 8.098422110755225

prof = profile(prob, sol; alg=NLopt.LN_NELDERMEAD, parallel=false)

ProfileLikelihoodSolution. MLE retcode: Failure
Confidence intervals:
95.0% CI for λ: (0.006400992274213644, 0.01786032876226762)
95.0% CI for K: (90.81154862835605, 109.54214763511888)
95.0% CI for u₀: (1.5919805025139593, 19.070831536649305)


We now have estimates for our parameters, along with quantified uncertainty from the confidence intervals. Using the plot_profiles function, you can also visualise the profile likelihoods to assess e.g. smoothness and identifiability of the individual parameters:

using CairoMakie, LaTeXStrings
fig = plot_profiles(prof; latex_names=[L"\lambda", L"K", L"u_0"],
true_vals=[λ, K, u₀], # optional, gives the black lines below
fig_kwargs=(fontsize=30, resolution=(2109.644f0, 444.242f0)),
axis_kwargs=(width=600, height=300))


A last feature to show, again following Simpson and Maclaren (2022), is the construction of profile-wise profile likelihoods, which give profile likelihoods for some prediction function q(\boldsymbol \theta) conditional on an interest parameter \psi. In this case, suppose we are interested in quantifying the uncertainty in the function u(t) over many times. First, we define a prediction function that will give us our vector of interest:

function prediction_function(θ, data)
λ, K, u₀ = θ
t = data
prob = ODEProblem(ode_fnc, u₀, extrema(t), (λ, K))
sol = solve(prob, Rosenbrock23(), saveat=t)
return sol.u
end


Computing prediction intervals from this is simple using get_prediction_intervals:

t_many_pts = LinRange(extrema(t)..., 1000)
parameter_wise, union_intervals, all_curves, param_range =
get_prediction_intervals(prediction_function, prof,
t_many_pts; parallel = true)


The parameter_wise output gives prediction intervals based on varying one parameter at a time, and union_intervals gives the union of the prediction intervals at each time; see the docs for more detail. Plotting these prediction intervals together with the true curve in red + the curve from the MLE in blue, we obtain the figure below (code for making this plot is in the docs):

In (a), (b), and (c), the black curves show 95% profile-wise prediction intervals for the respective parameters. These views allow us to identify how each parameter contributes to the uncertainty in our prediction, e.g. the uncertainty from \lambda is largest for early time, which makes sense since only K matters for large time, and this is what we see in (b) where the interval becomes large only for long time. In (d) I show the union of all these intervals in the black curve, compared to the interval coming from the full likelihood (which is more accurate, but takes a significantly longer time to compute) in magenta. We see that we have fully enclosed the true solution, and this sensitivity analysis of the individual contributions to the total uncertainty is sensible.

All feedback welcome!
Thanks,
Daniel.

11 Likes

Thanks! I literally stumbled upon the mentioned paper two days ago and wanted to implement this This looks awesome and well done 1 Like

The package README.md is very impressive. The documentation is great.

3 Likes

An update as of v0.2.0: Bivariate profiles are now supported, with a complete example given in Example V: Lotka-Volterra ODE, GeneralLazyBufferCache, and computing bivarate profile likelihoods · ProfileLikelihood.jl, where I demonstrate how to compute bivariate profile likelihoods for a Lotka-Volterra ODE problem, how to support automatic differentiation, and how to compute prediction intervals from a bivariate profile (this is the second case study of Simpson and Maclaren (2022)).