Fairly new to Julia, but I am loving it. This forum is a GREAT support but I can’t get my head around a problem I am experiencing right now. I am trying to perform a Monte Carlo for a variation of the standard multinomial logit model. I am using Optim
to perform non-linear optimization. The latter works with automatic differentiation, but does not work when supplying a user-written gradient and Hessian, despite many attempts. Let me reproduce a MWE.
julia> using LinearAlgebra, Optim, StatsFuns
# Write the data
Y = [[-20.0, -30.0, 9.0, -20.0, 19.0, 22.0],
[7.0, -62.0, -16.0, -13.0, 27.0, -8.0],
[58.0, 32.0, 6.0, -2.0, 26.0, 48.0],
[53.0, 13.0, 41.0, 59.0, 6.0, 29.0],
[-13.0, 8.0, -20.0, 2.0, -6.0, -7.0]]
# Express the log-likelihood function
function ℓ(β::Vector)
return sum(logsumexp.(β .* Y) - β .* map(y -> y[1], Y))
end
function ∇ℓ!(β::Vector, s)
s[1] = sum(dot.(softmax.(β .* Y), Y) - map(y -> y[1], Y))
end
function ∇²ℓ!(β::Vector, h)
h[1] = sum(dot.(softmax.(β .* Y), map(y -> y .^ 2, Y)) - dot.(softmax.(β .* Y), Y) .^ 2)
end
This sets up the data, log-likelihood function, gradient and Hessian. Plotting these over an appropriate interval around zero reveals that they are well behaved and that together, they can help identify a minimum. Codewise, I am trying to do everything by the book. However, if I try to perform optimization I get this.
julia> O = optimize(ℓ, [1.0], method = LBFGS(); autodiff=:forward, show_trace=true)
Iter Function value Gradient norm
0 8.905359e+01 8.882755e+01
* time: 0.0
1 1.316398e+01 7.149078e+01
* time: 0.0010001659393310547
2 8.891946e+00 3.215174e+01
* time: 0.002000093460083008
3 8.492614e+00 8.647194e+00
* time: 0.002000093460083008
4 8.467783e+00 9.217495e-02
* time: 0.003000020980834961
5 8.467780e+00 1.786547e-06
* time: 0.003000020980834961
6 8.467780e+00 8.881784e-15
* time: 0.003999948501586914
* Status: success
* Candidate solution
Final objective value: 8.467780e+00
* Found with
Algorithm: L-BFGS
* Convergence measures
|x - x'| = 1.12e-09 ≰ 0.0e+00
|x - x'|/|x'| = 4.76e-08 ≰ 0.0e+00
|f(x) - f(x')| = 3.55e-15 ≰ 0.0e+00
|f(x) - f(x')|/|f(x')| = 4.20e-16 ≰ 0.0e+00
|g(x)| = 8.88e-15 ≤ 1.0e-08
* Work counters
Seconds run: 0 (vs limit Inf)
Iterations: 6
f(x) calls: 18
∇f(x) calls: 18
julia> Q = optimize(ℓ, ∇ℓ!, ∇²ℓ!, [1.0], method=LBFGS(), show_trace=true)
Iter Function value Gradient norm
0 NaN NaN
* time: 0.0
* Status: failure
* Candidate solution
Final objective value: NaN
* Found with
Algorithm: L-BFGS
* Convergence measures
|x - x'| = NaN ≰ 0.0e+00
|x - x'|/|x'| = NaN ≰ 0.0e+00
|f(x) - f(x')| = NaN ≰ 0.0e+00
|f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00
|g(x)| = NaN ≰ 1.0e-08
* Work counters
Seconds run: 0 (vs limit Inf)
Iterations: 0
f(x) calls: 1
∇f(x) calls: 1
I have no idea re: what I am doing wrong and any help would be greatly appreciated.