Does Optim.jl calculate confidence interval?

Hello community,

Does anyone know how to get confidence intervals from the result of Optim.jl?

Thanks in advance!

In statistics, extremum estimators minimize or maximize criterion functions, and Optim will do that part. To get confidence intervals for the estimates, you need to use theory to find the (usually asymptotic) distribution of the estimator; you can then estimate the covariance of that asymptotic distribution to get estimated standard errors, which can be used to form confidence intervals. So, Optim gets you the estimator, but getting confidence intervals requires more steps, and the specific steps depend on the criterion function you optimize and on the assumptions you are willing to make. There’s no simple general answer to this question.
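
For example, here is a rough sketch of what Optim itself gives you (toy criterion, all names made up):

using Optim

# toy criterion: the sample mean as an extremum estimator
data = randn(100) .+ 3.0
crit(θ) = sum(abs2, data .- θ[1])

res  = optimize(crit, [0.0], BFGS())
θhat = Optim.minimizer(res)    # point estimate only
# Optim stops here: it returns no standard errors or confidence intervals;
# those require the distribution theory discussed in the rest of this thread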


Depending on what you want to achieve and how computationally expensive your computations are, bootstrapping might be an option. But as @mcreel noted, you really need to provide some context for your question.

If you’re doing (unregularized) maximum likelihood estimation, you can use the Fisher Information matrix.
See lines 57-65 of this code, and line 61 in particular, which computes the covariance matrix of the estimated parameters of a multinomial regression. The standard errors, p-values, etc. can be derived from there.
You may also be interested in the src/diagnostics.jl code of the same package, which is not specific to multinomial regression, and which provides an interface for constructing a regression table and other similar diagnostics for statistical models.
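
A rough sketch of that recipe (the exponential example and all names below are made up, not taken from that package): invert the observed Fisher information, i.e. the Hessian of the negative log-likelihood at the MLE, to estimate the covariance of the estimates.

using Optim, ForwardDiff, LinearAlgebra

# made-up example: MLE of the rate λ of an exponential distribution
x = -log.(rand(500)) ./ 2.0                        # draws with true rate λ = 2
negloglik(θ) = -(length(x) * log(θ[1]) - θ[1] * sum(x))

res  = optimize(λ -> negloglik([λ]), 1e-6, 10.0)   # univariate (Brent) minimization
λhat = [Optim.minimizer(res)]

# observed Fisher information = Hessian of the negative log-likelihood at the MLE
H    = ForwardDiff.hessian(negloglik, λhat)
vcov = inv(H)                                      # estimated covariance of the estimates
se   = sqrt.(diag(vcov))                           # standard errors
ci   = (λhat .- 1.96 .* se, λhat .+ 1.96 .* se)    # asymptotic 95% confidence interval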

Thank you @mcreel! also thank you @marcpabst @jocklawrie

Here is my problem:
I got some experimental results, which are the step response of the O2 concentration at a reactor outlet, yO2_exp. I simulated the reactor with a model that predicts the O2 concentration for the parameters I give it, yO2_simu(para) (1D plug flow + axial dispersion + reactions, not important). Then I tried to minimize the function:
f(para) = sum(abs2, yO2_exp - yO2_simu(para))
So basically this is a least-squares problem.

Tutorial · LsqFit.jl looks like it does just what you need. Note that there are some assumptions involved in getting estimated standard errors: "Assume the errors in each sample are independent, normal distributed with zero mean and same variance, i.e. ϵ ∼ N(0, σ²I), the covariance matrix from the linear approximation is therefore:"

If your data is assumed to be measured without error, and the differences between the data and the model are due to model misspecification, then that assumption above would not be correct. If the model is assumed to be correct, and the errors are due to measurement error, then the assumption could be reasonable.
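
A sketch of the LsqFit route, with the model and data below as placeholders for your reactor model and measurements (depending on your LsqFit version the interval function is confidence_interval or confint):

using LsqFit

# placeholder step-response model standing in for yO2_simu(para):
# para[1] is an asymptotic level, para[2] a time constant
model(t, para) = para[1] .* (1 .- exp.(-t ./ para[2]))

# synthetic "experimental" data, only so the sketch runs on its own
t_exp   = collect(0.0:1.0:60.0)
yO2_exp = model(t_exp, [0.21, 8.0]) .+ 0.005 .* randn(length(t_exp))

p0  = [0.1, 5.0]                       # starting values (placeholders)
fit = curve_fit(model, t_exp, yO2_exp, p0)

se  = stderror(fit)                    # standard errors under the iid normal-error assumption
ci  = confidence_interval(fit, 0.05)   # 95% confidence intervals for the parameters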

Hi Creel, I also noticed that LsqFit.jl can calculate confidence intervals but was wondering why Optim.jl doesn’t. Your first reply gave me the answer (although I don’t fully understand it :smiley:)
Thank you again!


Thank you @mcreel for this clarification.

I am also trying to get standard errors for my problem, where I use NLopt (a derivative-free method) to maximise a log-likelihood function. The estimator given by the optimization is the parameter of an exponential distribution. On top of that, each observation used for the MLE corresponds to the maximum of 2 objects that are themselves functions of the parameter of interest. I don’t think it is possible to derive the Hessian analytically in my case. The literature on which my thesis is based uses bootstrapping at this stage, without giving more details.

Would someone have a tip on how to get an estimate for standard errors?
Can the optimization algorithm provide me with an approximate empirical Hessian from which I could recover the Fisher information and therefore the asymptotic variance?

thank you

EDIT: NLopt-based or Optim-based solutions are all welcome

I don’t think Optim (which isn’t exclusively statistics-focused) should be focused on generating CIs, but I documented one way to do what you’re suggesting here: julia_tutorials/Statistics in Julia - Maximum Likelihood Estimation.ipynb at master · johnmyleswhite/julia_tutorials · GitHub

There is some question of having Optim make it easier to get the Hessian back, but this is a bit of a forced hack since many algorithms never use Hessians.


Similarly to the previous answer, an example of code to get standard errors for ML estimates computed from numeric optimization is here: Econometrics/15-MaximumLikelihood.jl at main · mcreel/Econometrics · GitHub

In particular, once you have the ML estimates, you can get estimates of the Hessian and the information matrix using automatic differentiation, and those can be used to compute estimators of standard errors:

## Now get the t-stats, from Ihat and J hat.
using ForwardDiff
sc =  ForwardDiff.jacobian(logℒᵢ, θhat) # get the score contributions
Ihat = mean(sc.^2.)
Jhat = -ForwardDiff.hessian(s, θhat)
# when you set n large (above), you should see that Ihat ⩬ -Jhat  
# also, note that Jhat = 1/θhat², which is a result we can get analytically

# three forms of estimated variance
V1 = inv(Ihat)/n
se1 = sqrt(V1)
V2 = inv(-Jhat)/n
se2 = sqrt(V2)
V3 =  inv(Jhat)*Ihat*inv(Jhat)/n
se3 = sqrt(V3)
[se1 se2 se3]   # note that the estimators are a little different from one another
                # the last one, sandwich, is what's reported in mleresults.
##
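
To go from one of those standard errors back to a confidence interval (what the original question asked for), the usual asymptotic-normal form would be, for example:

# e.g., a 95% confidence interval from the point estimate and the sandwich standard error
ci = (θhat - 1.96*se3, θhat + 1.96*se3)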

Thank you both for the tutorials, this indeed looks like what I am trying to do. Now my problem is that I get an error when I call automatic differentiation at the point found by Optim, ForwardDiff.gradient(fun, ML_estimates); typically the error is:
MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_grid_cost), Int64}, Float64, 1})

I understand that trying to circumvent this with .value is pointless, as I would drop the dual information and get zeros as the gradient. I explored Limitations of ForwardDiff · ForwardDiff but I am still stuck.

This cryptic sentence translates to this in equation:

So the error is thrown at the computation of the 2 V objects; in particular, theta determines the grid points in the calculation of V: the grid adjusts so that each point gets a 1% probability slice (a fixed grid solves nothing, since the probability slices would then have to adjust). Do you have a clue?

If I don’t have any hint to move forward, I’ll consider V^A and V^B as fixed given the ML estimates and compute the gradient only for the random variable X. It’s not perfect but ForwardDiff should do that.

My guess from your error is that you have excessively typed the inputs to your functions, but it’s hard to tell without seeing your code.

As long as the two Vs are continuous functions of theta (?), you could use finite differences to compute derivatives. The Calculus package has Calculus.jacobian and Calculus.hessian.

I got this dual-number error when solving some differential equations. The solution is to add
autodiff = false
at a suitable place.

Yes, either that or some intermediate arrays are initialized as concrete Float64 arrays (e.g. zeros(N) or Array{Float64}(undef, N)) and then filled with the results of the computation, which clashes with ForwardDiff’s dual numbers. Perhaps easier to use FiniteDiff.jl instead of ForwardDiff.jl.
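
A rough sketch of the FiniteDiff route (toy log-likelihood and made-up names; in practice negloglik and θhat would come from your own model and from NLopt):

using FiniteDiff, LinearAlgebra

# toy stand-in for a log-likelihood that ForwardDiff chokes on
x = -log.(rand(200)) ./ 1.5                        # exponential data, true rate 1.5
negloglik(θ) = -(length(x) * log(θ[1]) - θ[1] * sum(x))
θhat = [length(x) / sum(x)]                        # MLE (closed form for this toy case)

H    = FiniteDiff.finite_difference_hessian(negloglik, θhat)   # no dual numbers involved
vcov = inv(Matrix(H))                              # estimated covariance
se   = sqrt.(diag(vcov))                           # standard errors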

Here I provide a piece of code that throws the error. The code computes the average of all percentiles of an exponential distribution, where I use lambda instead of theta (the computation itself is a bit pointless, but at least it shows the problem).

using ForwardDiff, Statistics
function quantile_exponential!(Q, i, λ, p)
    if p < 0 || p > 1
        error("$p is not a probability")
    else
        Q[i] = - log(1-p)/λ
    end
end
function fun_percentile!(borns_c, λ, nc)
    quantile_exponential!(borns_c, nc+1, λ[1], 0.999)
    for i in 2:nc 
        quantile_exponential!(borns_c, i, λ[1], (i-1)/nc)
    end
end
function average_percentile(λ, nc)
    borns_c = Array{Float64}(undef, nc+1)
    fun_percentile!(borns_c, λ, nc)
    mean(borns_c)
end
make_clos(nc) = λ -> average_percentile(λ, nc)
fun = make_clos(100)

julia> ForwardDiff.derivative(fun, 1.)
ERROR: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#7#8"{Int64}, Float64}, Float64, 1})

In this example it is!

That was indeed the case in my original code; I thought that using mutating functions, as I did in the example above, would solve the problem, but it did not.

EDIT: after @Paul_Soderlind’s remark

function average_percentile(λ, nc)
    borns_c = zeros(typeof(λ), nc+1) # replaced: the element type now follows λ, so dual numbers can be stored
    fun_percentile!(borns_c, λ, nc)
    mean(borns_c)
end
julia> ForwardDiff.derivative(fun, 1.)
-1.026588102200275

Almost sure this line is your problem, since you’re trying to downcast dual numbers to Float64. I don’t have time to fully debug, but others may solve this for you; it’s 99.9% likely the culprit.


Something like this perhaps: zeros(typeof(λ), nc+1), assuming λ is a number (scalar). Otherwise use eltype(λ).
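
In other words, a fully generic version of the initialization (a sketch; works whether λ carries Float64 or ForwardDiff.Dual numbers, and whether it is a scalar or a vector) would be:

# element type follows λ, so plain calls and ForwardDiff calls both work
function average_percentile(λ, nc)
    borns_c = zeros(eltype(λ), nc + 1)
    fun_percentile!(borns_c, λ, nc)
    mean(borns_c)
end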
