I’m moving my MLE class notes over to Julia, so I’m exploring how to supply an analytical gradient and Hessian to an optimization problem with Optim. I created a toy model (a linear regression estimated by maximum likelihood) to check the commands first.
However, while the code works well when I use only the objective function (letting ForwardDiff handle the derivatives), it fails when I supply the analytical gradient and/or Hessian.
Does anyone know what might be going on?
Thanks!
Note: I have already read about the potentially limited gains from supplying the Hessian, but I still want to try it.
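For reference, the model is y = Xβ + ε with ε ~ N(0, σ²I), and the functions below code up the log-likelihood ℓ and its derivatives with respect to (β, σ²); Loglik returns −ℓ so that Optim can minimize it:

$$\ell(\beta,\sigma^2) = -\tfrac{N}{2}\log(2\pi) - \tfrac{N}{2}\log\sigma^2 - \frac{\varepsilon'\varepsilon}{2\sigma^2},\qquad \varepsilon = y - X\beta$$

$$\frac{\partial \ell}{\partial \beta} = \frac{X'\varepsilon}{\sigma^2},\qquad \frac{\partial \ell}{\partial \sigma^2} = -\frac{N}{2\sigma^2} + \frac{\varepsilon'\varepsilon}{2\sigma^4}$$

$$\frac{\partial^2 \ell}{\partial \beta\,\partial \beta'} = -\frac{X'X}{\sigma^2},\qquad \frac{\partial^2 \ell}{\partial \beta\,\partial \sigma^2} = -\frac{X'\varepsilon}{\sigma^4},\qquad \frac{\partial^2 \ell}{\partial (\sigma^2)^2} = \frac{N}{2\sigma^4} - \frac{\varepsilon'\varepsilon}{\sigma^6}$$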
using Optim, ForwardDiff
using LinearAlgebra, Random
Random.seed!(12); # Fix the random seed for reproducibility
n = 500 # Number of observations
nvar = 2 # Number of variables
β = ones(nvar) * 3.71 # True coefficients
x = [ones(n) randn(n, nvar - 1)] # X matrix of explanatory variables plus constant
ε = randn(n) * 2 # Errors with standard deviation 2 (so σ² = 4)
y = x * β + ε; # Generate Data
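For later comparison, these are the closed-form estimates I expect the optimizer to reproduce (a quick sketch; β̂_ols and σ̂²_ml are just names introduced here):
β̂_ols = (x'x) \ (x'y)                # OLS = ML estimate of β
σ̂²_ml = sum(abs2, y - x*β̂_ols) / n   # ML estimate of σ² (no degrees-of-freedom correction)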
# Negative log-likelihood; parameter vector Γ = [β; σ²]
function Loglik(N,Y,X,Γ)
σ² = Γ[end,1]
ep = (Y - X*Γ[1:(length(Γ)-1)])
LL = -(N/2)*log(2π) - (N/2)*log(σ²) - (ep'*ep)/(2*σ²)
return -LL
end
# Analytical gradient, filled in place into ▽
function ▽f(N,Y,X,▽,Γ)
σ² = Γ[end,1]
k = (length(Γ)-1)
ep = (Y - X*Γ[1:k])
T = promote_type(eltype(X),eltype(Γ))
▽ .= T(0)
▽[1:k] = (1/σ²)*(X'*ep)
▽[end] = - N/(2*σ²) + (ep'*ep)/(2*(√σ²)^4)
return ▽
end
# Analytical Hessian, filled in place into H
function ▽²f(N,Y,X,H,Γ)
σ² = Γ[end,1]
k = (length(Γ)-1)
ep = (Y - X*Γ[1:k])
H[1:k,1:k] = -(1/σ²)*(X'X);
H[1:k,end] = -(1/(σ²)^2)*(X'ep);
H[end,1:k] = -((1/(σ²)^2)*(X'ep))';
H[end,end] = (N/(2*(σ²)^2)) - (1/(σ²)^3)*(ep'*ep)
return H
end
# Closures over the data, in the form Optim expects
f(α) = Loglik(n,y,x,α)
g!(G,α) = ▽f(n,y,x,G,α)
h!(H,α) = ▽²f(n,y,x,H,α)
Guess = ones(size(x, 2) + 1); # Initial values for [β; σ²]
# Prelim check that the gradient and Hessian are OK (see the comparison sketch after this block):
g!(zeros(3), Guess)
ForwardDiff.gradient(f,Guess)
h!(zeros(3,3),Guess)
ForwardDiff.hessian(f,Guess)
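To make the check concrete, this sketch compares the analytical derivatives with ForwardDiff entrywise (G and H are just scratch arrays introduced here):
G = zeros(length(Guess)); g!(G, Guess)
H = zeros(length(Guess), length(Guess)); h!(H, Guess)
maximum(abs.(G - ForwardDiff.gradient(f, Guess)))  # ≈ 0 if the analytical gradient matches
maximum(abs.(H - ForwardDiff.hessian(f, Guess)))   # ≈ 0 if the analytical Hessian matches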
# ---------------------------#
# Exploring Optim #
# (1) Optimization using only the objective function: (works well)
Optim.minimizer(optimize(f, Guess)) # Same result as: (x'x)\(x'y)
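To be explicit about what I mean by "using ForwardDiff": a call along these lines also works for me (a sketch; LBFGS is just one gradient-based choice):
Optim.minimizer(optimize(f, Guess, LBFGS(); autodiff = :forward))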
# (2) Optimization using the analytical gradient: (fails)
Optim.minimizer(optimize(f, g!, Guess)) # Approach 1
td = TwiceDifferentiable(f, g!, Guess; autodiff=:forward) # Approach 2
Optim.minimizer(optimize(td, Guess, LBFGS()))
# (3) Optimization using the analytical gradient and Hessian: (issue: it just returns the initial guess!)
Optim.minimizer(optimize(f, g!, h!, Guess, Newton()))
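In case it matters, I’d expect the equivalent TwiceDifferentiable formulation to behave the same way (a sketch with the same f, g!, h! as above):
td2 = TwiceDifferentiable(f, g!, h!, Guess)
Optim.minimizer(optimize(td2, Guess, Newton()))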