Optim: user-supplied gradient and Hessian for MLE

I’m updating my MLE class notes to incorporate Julia, so I’m exploring how to supply the analytical gradient and Hessian to an optimization problem with Optim. I created a toy model to check the commands first.
However, although the code works well when I use only the objective function and ForwardDiff, it fails when I supply the analytical gradient and/or Hessian.
Does anyone know what might be going on?
Thanks!

Note: I have already read about the potentially limited gains from incorporating the Hessian, but I still want to try it.

using Optim, ForwardDiff
using LinearAlgebra, Random
Random.seed!(12);                       # Fix random seed generator for reproducibility

n    = 500                             # Number of observations
nvar = 2                               # Number of variables
β    = ones(nvar) * 3.71               # True coefficients
x    = [ones(n) randn(n, nvar - 1)]    # X matrix of explanatory variables plus constant
ε    = randn(n) * 2                    # Error term (std. dev. 2)
y    = x * β + ε;                      # Generate Data

function Loglik(N,Y,X,Γ)
    σ² = Γ[end,1]                          # Last element of Γ is the error variance
    ep = (Y - X*Γ[1:(length(Γ)-1)])        # Residuals
    LL = -(N/2)*log(2π) - (N/2)*log(σ²) - (ep'*ep)/(2*σ²)
    return -LL                             # Negative log-likelihood (Optim minimizes)
end
function ▽f(N,Y,X,▽,Γ)
    σ²  = Γ[end,1]
    k   = (length(Γ)-1)
    ep  = (Y - X*Γ[1:k])
    T   = promote_type(eltype(X),eltype(Γ))
    ▽  .= T(0)
    ▽[1:k] = (1/σ²)*(X'*ep)
    ▽[end] = - N/(2*σ²) + (ep'*ep)/(2*(√σ²)^4)
    return ▽
end
function ▽²f(N,Y,X,H,Γ)
    σ² = Γ[end,1]
    k   = (length(Γ)-1)
    ep  = (Y - X*Γ[1:k])
    H[1:k,1:k] = -(1/σ²)*(X'X);
    H[1:k,end] = -(1/(σ²)^2)*(X'ep);
    H[end,1:k] = -((1/(σ²)^2)*(X'ep))';
    H[end,end] = (N/(2*(σ²)^2)) - (1/(σ²)^3)*(ep'*ep)
    return H
end

f(α)    = Loglik(n,y,x,α)
g!(G,α) = ▽f(n,y,x,G,α)
h!(H,α) = ▽²f(n,y,x,H,α)
Guess   = ones(size(x)[2]+1);

# prelim check that grad and hess are ok:
g!(zeros(3,1),Guess)
ForwardDiff.gradient(f,Guess)

h!(zeros(3,3),Guess)
ForwardDiff.hessian(f,Guess)

# ---------------------------#
# Exploring Optim            #

# (1) Optimization using only the objective function: (working well)
Optim.minimizer(optimize(f, Guess)) # Same result as: (x'x)\(x'y)

# (2) Optimization using analytical gradient: (fail)
Optim.minimizer(optimize(f, g!, Guess)) # Approach 1

td   = TwiceDifferentiable( f, g!, Guess; autodiff=:forward)  # Approach 2
Optim.minimizer(optimize(td, Guess, LBFGS()))

# (3) Optimization using analytical gradient and Hessian: (Issue: returns the initial guess!)
Optim.minimizer(optimize(f, g!, h!, Guess, Newton()))

If I do

# prelim check that grad and hess are ok:
@show g!(zeros(3,1),Guess)
@show ForwardDiff.gradient(f,Guess)

@show h!(zeros(3,3),Guess)
@show ForwardDiff.hessian(f,Guess)

I see

g!(zeros(3, 1), Guess) = [1330.8310178125712; 1304.0590964682765; 4429.092663725574;;]
ForwardDiff.gradient(f, Guess) = [-1330.8310178125712, -1304.059096468278, -4429.092663725576]
h!(zeros(3, 3), Guess) = [-500.0 18.65451374962747 -1330.8310178125712; 18.65451374962747 -496.5116016197911 -1304.0590964682765; -1330.8310178125712 -1304.0590964682765 -9108.185327451149]
ForwardDiff.hessian(f, Guess) = [500.0 -18.654513749627476 1330.8310178125712; -18.654513749627476 496.5116016197912 1304.059096468278; 1330.8310178125712 1304.059096468278 9108.185327451152]

which look like they differ only by a sign? Furthermore, Optim.minimizer(optimize(f, g!, Guess)) fails with

ERROR: LoadError: DomainError with -4428.092663725574:

which can be directly generated with

g!(zeros(3), [-1329.8310178125712, -1303.0590964682765, -4428.092663725574])
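
A quick check along these lines (just a sketch run against the f, g!, and Guess from your post; the commented values are the ones shown above):

g!(zeros(3), Guess) ≈ -ForwardDiff.gradient(f, Guess)   # true: g! fills in the gradient of +LL, while f returns -LL

Guess .- g!(zeros(3), Guess)   # ≈ [-1329.83, -1303.06, -4428.09], exactly the failing point above,
                               # where σ² = Γ[end] is negative and √σ² inside ▽f throws the DomainError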

Thanks for pointing that out.
The switch in the sign is related to the objective function returning -LL rather than LL, so changing the sign of what the analytical gradient and Hessian functions return should be enough to make them comparable with the ForwardDiff results.
On the other hand, your last point does suggest an issue with the specification of g!() - or, specifically, with ▽f: at the failing point σ² is negative, so the √σ² term throws the DomainError. I will check that! (Perhaps parametrizing the variance as exp(·) of the last parameter, rather than using σ² directly; a sketch of the corresponding derivatives is below.)
In any case, I would like to explore the use of the Hessian with Optim for the normal-distribution MLE case, so I would appreciate any advice.
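
For reference, this is the hand derivation I am working from for that reparametrization (please double-check it): write the last parameter as γ, so σ² = exp(γ), let ℓ be the log-likelihood, and let e = Y - Xβ be the residual. The chain rule then gives

$$
\frac{\partial \ell}{\partial \beta} = \frac{X'e}{\sigma^2}, \qquad
\frac{\partial \ell}{\partial \gamma} = \sigma^2\,\frac{\partial \ell}{\partial \sigma^2} = -\frac{N}{2} + \frac{e'e}{2\sigma^2},
$$

$$
\frac{\partial^2 \ell}{\partial \beta\,\partial \beta'} = -\frac{X'X}{\sigma^2}, \qquad
\frac{\partial^2 \ell}{\partial \beta\,\partial \gamma} = -\frac{X'e}{\sigma^2}, \qquad
\frac{\partial^2 \ell}{\partial \gamma^2} = -\frac{e'e}{2\sigma^2}.
$$

Since the objective is -ℓ, the gradient and Hessian passed to Optim should be the negatives of these.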

To fix

DomainError with -4428.092663725574:
sqrt will only return a complex result if called with a complex argument.

I updated the functions to:

function Loglik(N,Y,X,Γ)
    σ² = exp(Γ[end])
    ep = (Y - X*Γ[1:(length(Γ)-1)])
    LL = -(N/2)*log(2π) - (N/2)*log(σ²) - (ep'*ep)/(2*σ²)
    return -LL
end
function ▽f(N,Y,X,▽,Γ)
    σ² = exp(Γ[end,1])
    k  = (length(Γ)-1)
    ep = (Y - X*Γ[1:k])
    ▽[1:k] = (1/σ²)*(X'*ep)
    ▽[end] = - N/2 + (ep'*ep)/(2*σ²)
    return -▽
end
function ▽²f(N,Y,X,H,Γ)
    σ²         = exp(Γ[end,1])
    k          = (length(Γ)-1)
    ep         = (Y - X*Γ[1:k])
    H[1:k,1:k] = -(1/σ²)*(X'X);
    H[1:k,end] = -(1/σ²)*(X'ep);
    H[end,1:k] = H[1:k,end]';
    H[end,end] = - (1/(2*σ²))*(ep'*ep)
    return -H
end

f(α)    = Loglik(n,y,x,α)
g!(G,α) = ▽f(n,y,x,G,α)
h!(H,α) = ▽²f(n,y,x,H,α)
Guess   = ones(size(x)[2]+1);

# prelim check that grad and hess are ok:
g!(zeros(3,1),Guess)
ForwardDiff.gradient(f,Guess)

h!(zeros(3,3),Guess)
ForwardDiff.hessian(f,Guess)

but if I do

Optim.minimizer(optimize(f, g!, Guess, LBFGS()))

I just get back (essentially) the initial guess

3-element Vector{Float64}:
 0.9995485513918926
 0.9995576333640301
 0.9986430247618775

Likewise, if I do

Optim.minimizer(optimize(f, g!, h!, Guess, Newton()))

I see

3-element Vector{Float64}:
 0.9963506666680584
 0.996394580539055
 1.0007370702641363
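
I think the remaining issue is how the buffers are filled. As far as I understand Optim's mutating interface, optimize uses whatever g! and h! write into G and H, and the return value is ignored; return -▽ and return -H build new, negated arrays, so the buffers themselves still hold the derivatives of +LL while f returns -LL. Below is a minimal sketch of what I mean (it reuses n, y, x, f, and Guess from above, with σ² = exp(Γ[end]) as before; g2! and h2! are just names I made up for this version, and the results are worth cross-checking against ForwardDiff):

# Gradient and Hessian of the NEGATIVE log-likelihood, written directly into the buffers
function g2!(G, Γ)
    σ² = exp(Γ[end])
    k  = length(Γ) - 1
    ep = y - x*Γ[1:k]
    G[1:k] .= -(x'*ep) ./ σ²               # -∂LL/∂β
    G[end]  = n/2 - (ep'*ep)/(2σ²)         # -∂LL/∂γ, with γ = log(σ²)
    return G                               # the buffer itself now holds ∇(-LL)
end
function h2!(H, Γ)
    σ² = exp(Γ[end])
    k  = length(Γ) - 1
    ep = y - x*Γ[1:k]
    H[1:k, 1:k] .= (x'x) ./ σ²             # -∂²LL/∂β∂β'
    H[1:k, end] .= (x'*ep) ./ σ²           # -∂²LL/∂β∂γ
    H[end, 1:k] .= H[1:k, end]
    H[end, end]  = (ep'*ep)/(2σ²)          # -∂²LL/∂γ²
    return H
end

# Quick check: the in-place buffer should now match ForwardDiff on the objective f
G = zeros(3); g2!(G, Guess)
G ≈ ForwardDiff.gradient(f, Guess)         # expect true

res = optimize(f, g2!, h2!, Guess, Newton())
Optim.minimizer(res)                       # β̂ should be close to (x'x)\(x'y), and exp of the
                                           # last entry close to the ML variance estimate ep'ep/n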