Optim: user-supplied gradient and Hessian for MLE

I’m updating my MLE class notes to incorporate Julia, so I’m exploring how to supply the analytical gradient and Hessian to an optimization problem with Optim. I created a toy model to check the commands first.
However, although the code works well when I use only the objective function and ForwardDiff, it fails when I supply the analytical gradient and/or Hessian.
Does anyone know what might be going on?
Thanks!

Note: I have already read about the potentially limited gains from incorporating the Hessian, but I still want to try it.

using Optim, ForwardDiff
using LinearAlgebra, Random
Random.seed!(12);                       # Fix random seed generator for reproducibility

n    = 500                             # Number of observations
nvar = 2                               # Number of variables
β    = ones(nvar) * 3.71               # True coefficients
x    = [ones(n) randn(n, nvar - 1)]    # X matrix of explanatory variables plus constant
ε    = randn(n) * 2                    # Error term (std. dev. 2)
y    = x * β + ε;                      # Generate Data

function Loglik(N,Y,X,Γ)
    σ² = Γ[end,1]                          # Last element of Γ is the error variance
    ep = (Y - X*Γ[1:(length(Γ)-1)])        # Residuals
    LL = -(N/2)*log(2π) - (N/2)*log(σ²) - (ep'*ep)/(2*σ²)
    return -LL                             # Negative log-likelihood (Optim minimizes)
end
function ▽f(N,Y,X,▽,Γ)
    σ²  = Γ[end,1]
    k   = (length(Γ)-1)
    ep  = (Y - X*Γ[1:k])
    T   = promote_type(eltype(X),eltype(Γ))
    ▽  .= T(0)
    ▽[1:k] = (1/σ²)*(X'*ep)
    ▽[end] = - N/(2*σ²) + (ep'*ep)/(2*(√σ²)^4)
    return ▽
end
function ▽²f(N,Y,X,H,Γ)
    σ² = Γ[end,1]
    k   = (length(Γ)-1)
    ep  = (Y - X*Γ[1:k])
    H[1:k,1:k] = -(1/σ²)*(X'X);
    H[1:k,end] = -(1/(σ²)^2)*(X'ep);
    H[end,1:k] = -((1/(σ²)^2)*(X'ep))';
    H[end,end] = (N/(2*(σ²)^2)) - (1/(σ²)^3)*(ep'*ep)
    return H
end

f(α)    = Loglik(n,y,x,α)
g!(G,α) = ▽f(n,y,x,G,α)
h!(H,α) = ▽²f(n,y,x,H,α)
Guess   = ones(size(x)[2]+1);

# prelim check that grad and hess are ok:
g!(zeros(3,1),Guess)
ForwardDiff.gradient(f,Guess)

h!(zeros(3,3),Guess)
ForwardDiff.hessian(f,Guess)

# ---------------------------#
# Exploring Optim            #

# (1) Optimization using only the objective function: (working well)
Optim.minimizer(optimize(f, Guess)) # Same result as: (x'x)\(x'y)

# (2) Optimization using analytical gradient: (fail)
Optim.minimizer(optimize(f, g!, Guess)) # Approach 1

td   = TwiceDifferentiable( f, g!, Guess; autodiff=:forward)  # Approach 2
Optim.minimizer(optimize(td, Guess, LBFGS()))

# (3) Optimization using analytical gradient and Hessian: (Issue: returns the initial guess!)
Optim.minimizer(optimize(f, g!, h!, Guess, Newton()))

If I do

# prelim check that grad and hess are ok:
@show g!(zeros(3,1),Guess)
@show ForwardDiff.gradient(f,Guess)

@show h!(zeros(3,3),Guess)
@show ForwardDiff.hessian(f,Guess)

I see

g!(zeros(3, 1), Guess) = [1330.8310178125712; 1304.0590964682765; 4429.092663725574;;]
ForwardDiff.gradient(f, Guess) = [-1330.8310178125712, -1304.059096468278, -4429.092663725576]
h!(zeros(3, 3), Guess) = [-500.0 18.65451374962747 -1330.8310178125712; 18.65451374962747 -496.5116016197911 -1304.0590964682765; -1330.8310178125712 -1304.0590964682765 -9108.185327451149]
ForwardDiff.hessian(f, Guess) = [500.0 -18.654513749627476 1330.8310178125712; -18.654513749627476 496.5116016197912 1304.059096468278; 1330.8310178125712 1304.059096468278 9108.185327451152]

which look like they differ only by a sign? Furthermore, Optim.minimizer(optimize(f, g!, Guess)) fails with

ERROR: LoadError: DomainError with -4428.092663725574:

which can be directly generated with

g!(zeros(3), [-1329.8310178125712, -1303.0590964682765, -4428.092663725574])
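
A quick check along these lines (just a sketch run against the f, g!, and Guess from your post; the commented values are the ones shown above):

g!(zeros(3), Guess) ≈ -ForwardDiff.gradient(f, Guess)   # true: g! fills in the gradient of +LL, while f returns -LL

Guess .- g!(zeros(3), Guess)   # ≈ [-1329.83, -1303.06, -4428.09], exactly the failing point above,
                               # where σ² = Γ[end] is negative and √σ² inside ▽f throws the DomainError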

Thanks for pointing that out.
The switch in the sign is related to the objective function returning -LL rather than LL, so changing the sign of what the analytical gradient and Hessian functions return should be enough to make them comparable with the ForwardDiff results.
On the other hand, your last point does suggest an issue with the specification of g!() - or, specifically, with ▽f: at the failing point σ² is negative, so the √σ² term throws the DomainError. I will check that! (Perhaps parametrizing the variance as exp(·) of the last parameter, rather than using σ² directly; a sketch of the corresponding derivatives is below.)
In any case, I would like to explore the use of the Hessian with Optim for the normal-distribution MLE case, so I would appreciate any advice.
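
For reference, this is the hand derivation I am working from for that reparametrization (please double-check it): write the last parameter as γ, so σ² = exp(γ), let ℓ be the log-likelihood, and let e = Y - Xβ be the residual. The chain rule then gives

$$
\frac{\partial \ell}{\partial \beta} = \frac{X'e}{\sigma^2}, \qquad
\frac{\partial \ell}{\partial \gamma} = \sigma^2\,\frac{\partial \ell}{\partial \sigma^2} = -\frac{N}{2} + \frac{e'e}{2\sigma^2},
$$

$$
\frac{\partial^2 \ell}{\partial \beta\,\partial \beta'} = -\frac{X'X}{\sigma^2}, \qquad
\frac{\partial^2 \ell}{\partial \beta\,\partial \gamma} = -\frac{X'e}{\sigma^2}, \qquad
\frac{\partial^2 \ell}{\partial \gamma^2} = -\frac{e'e}{2\sigma^2}.
$$

Since the objective is -ℓ, the gradient and Hessian passed to Optim should be the negatives of these.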

To fix

DomainError with -4428.092663725574:
sqrt will only return a complex result if called with a complex argument.

I updated the functions to:

function Loglik(N,Y,X,Γ)
    σ² = exp(Γ[end])
    ep = (Y - X*Γ[1:(length(Γ)-1)])
    LL = -(N/2)*log(2π) - (N/2)*log(σ²) - (ep'*ep)/(2*σ²)
    return -LL
end
function ▽f(N,Y,X,▽,Γ)
    σ² = exp(Γ[end,1])
    k  = (length(Γ)-1)
    ep = (Y - X*Γ[1:k])
    ▽[1:k] = (1/σ²)*(X'*ep)
    ▽[end] = - N/2 + (ep'*ep)/(2*σ²)
    return -▽
end
function ▽²f(N,Y,X,H,Γ)
    σ²         = exp(Γ[end,1])
    k          = (length(Γ)-1)
    ep         = (Y - X*Γ[1:k])
    H[1:k,1:k] = -(1/σ²)*(X'X);
    H[1:k,end] = -(1/σ²)*(X'ep);
    H[end,1:k] = H[1:k,end]';
    H[end,end] = - (1/(2*σ²))*(ep'*ep)
    return -H
end

f(α)    = Loglik(n,y,x,α)
g!(G,α) = ▽f(n,y,x,G,α)
h!(H,α) = ▽²f(n,y,x,H,α)
Guess   = ones(size(x)[2]+1);

# prelim check that grad and hess are ok:
g!(zeros(3,1),Guess)
ForwardDiff.gradient(f,Guess)

h!(zeros(3,3),Guess)
ForwardDiff.hessian(f,Guess)

but if I do

Optim.minimizer(optimize(f, g!, Guess, LBFGS()))

I just get back (essentially) the initial guess

3-element Vector{Float64}:
 0.9995485513918926
 0.9995576333640301
 0.9986430247618775

Likewise, if I do

Optim.minimizer(optimize(f, g!, h!, Guess, Newton()))

I see

3-element Vector{Float64}:
 0.9963506666680584
 0.996394580539055
 1.0007370702641363
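
I think the remaining issue is how the buffers are filled. As far as I understand Optim's mutating interface, optimize uses whatever g! and h! write into G and H, and the return value is ignored; return -▽ and return -H build new, negated arrays, so the buffers themselves still hold the derivatives of +LL while f returns -LL. Below is a minimal sketch of what I mean (it reuses n, y, x, f, and Guess from above, with σ² = exp(Γ[end]) as before; g2! and h2! are just names I made up for this version, and the results are worth cross-checking against ForwardDiff):

# Gradient and Hessian of the NEGATIVE log-likelihood, written directly into the buffers
function g2!(G, Γ)
    σ² = exp(Γ[end])
    k  = length(Γ) - 1
    ep = y - x*Γ[1:k]
    G[1:k] .= -(x'*ep) ./ σ²               # -∂LL/∂β
    G[end]  = n/2 - (ep'*ep)/(2σ²)         # -∂LL/∂γ, with γ = log(σ²)
    return G                               # the buffer itself now holds ∇(-LL)
end
function h2!(H, Γ)
    σ² = exp(Γ[end])
    k  = length(Γ) - 1
    ep = y - x*Γ[1:k]
    H[1:k, 1:k] .= (x'x) ./ σ²             # -∂²LL/∂β∂β'
    H[1:k, end] .= (x'*ep) ./ σ²           # -∂²LL/∂β∂γ
    H[end, 1:k] .= H[1:k, end]
    H[end, end]  = (ep'*ep)/(2σ²)          # -∂²LL/∂γ²
    return H
end

# Quick check: the in-place buffer should now match ForwardDiff on the objective f
G = zeros(3); g2!(G, Guess)
G ≈ ForwardDiff.gradient(f, Guess)         # expect true

res = optimize(f, g2!, h2!, Guess, Newton())
Optim.minimizer(res)                       # β̂ should be close to (x'x)\(x'y), and exp of the
                                           # last entry close to the ML variance estimate ep'ep/n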