Fairly new to Julia, but I am loving it. This forum is a GREAT support, but I can't get my head around a problem I am experiencing right now. I am trying to run a Monte Carlo exercise for a variation of the standard multinomial logit model, using Optim for the non-linear optimization. Optimization works with automatic differentiation, but fails when I supply a hand-written gradient and Hessian, despite many attempts. Here is an MWE.
julia> using LinearAlgebra, Optim, StatsFuns

# Write the data
Y = [[-20.0, -30.0, 9.0, -20.0, 19.0, 22.0],
     [7.0, -62.0, -16.0, -13.0, 27.0, -8.0],
     [58.0, 32.0, 6.0, -2.0, 26.0, 48.0],
     [53.0, 13.0, 41.0, 59.0, 6.0, 29.0],
     [-13.0, 8.0, -20.0, 2.0, -6.0, -7.0]]

# Express the log-likelihood function
function ℓ(β::Vector)
    return sum(logsumexp.(β .* Y) - β .* map(y -> y, Y))
end

function ∇ℓ!(β::Vector, s)
    s = sum(dot.(softmax.(β .* Y), Y) - map(y -> y, Y))
end

function ∇²ℓ!(β::Vector, h)
    h = sum(dot.(softmax.(β .* Y), map(y -> y .^ 2, Y)) - dot.(softmax.(β .* Y), Y) .^ 2)
end
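One detail I am unsure about: inside `∇ℓ!` and `∇²ℓ!` I rebind `s` and `h` with `=` rather than writing into them. Here is a tiny self-contained sketch (plain Julia, nothing Optim-specific, with made-up function names) of the difference between rebinding an argument and mutating it, in case it is relevant:

```julia
# Rebinding vs. mutating a function argument in Julia.
function rebind!(G, x)
    G = [2 * x[1]]      # rebinds the local name G; the caller's array is untouched
    return nothing
end

function mutate!(G, x)
    G[1] = 2 * x[1]     # writes into the caller's array
    return nothing
end

G = [0.0]
rebind!(G, [3.0])
@show G[1]              # still 0.0

mutate!(G, [3.0])
@show G[1]              # now 6.0
```

If Optim reads the gradient out of the array it passes in, only the second pattern would ever reach it, but I am not certain this is what is biting me here.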
This sets up the data, the log-likelihood function, the gradient, and the Hessian. Plotting them over an appropriate interval around zero shows that they are well behaved and that, together, they identify a minimum. Code-wise, I am trying to do everything by the book. However, when I run the optimization I get this:
julia> O = optimize(ℓ, [1.0], method = LBFGS(); autodiff = :forward, show_trace = true)
Iter     Function value   Gradient norm
     0     8.905359e+01     8.882755e+01
 * time: 0.0
     1     1.316398e+01     7.149078e+01
 * time: 0.0010001659393310547
     2     8.891946e+00     3.215174e+01
 * time: 0.002000093460083008
     3     8.492614e+00     8.647194e+00
 * time: 0.002000093460083008
     4     8.467783e+00     9.217495e-02
 * time: 0.003000020980834961
     5     8.467780e+00     1.786547e-06
 * time: 0.003000020980834961
     6     8.467780e+00     8.881784e-15
 * time: 0.003999948501586914
 * Status: success

 * Candidate solution
    Final objective value:     8.467780e+00

 * Found with
    Algorithm:     L-BFGS

 * Convergence measures
    |x - x'|               = 1.12e-09 ≰ 0.0e+00
    |x - x'|/|x'|          = 4.76e-08 ≰ 0.0e+00
    |f(x) - f(x')|         = 3.55e-15 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 4.20e-16 ≰ 0.0e+00
    |g(x)|                 = 8.88e-15 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    6
    f(x) calls:    18
    ∇f(x) calls:   18

julia> Q = optimize(ℓ, ∇ℓ!, ∇²ℓ!, [1.0], method = LBFGS(), show_trace = true)
Iter     Function value   Gradient norm
     0              NaN              NaN
 * time: 0.0
 * Status: failure

 * Candidate solution
    Final objective value:     NaN

 * Found with
    Algorithm:     L-BFGS

 * Convergence measures
    |x - x'|               = NaN ≰ 0.0e+00
    |x - x'|/|x'|          = NaN ≰ 0.0e+00
    |f(x) - f(x')|         = NaN ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00
    |g(x)|                 = NaN ≰ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    0
    f(x) calls:    1
    ∇f(x) calls:   1
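For contrast, the same three-function call does converge for me on a toy quadratic when I put the storage array first and mutate it in place, which is what I understand Optim to expect (the argument order and the toy names `q`, `∇q!`, `∇²q!` are my own, not from my model):

```julia
using Optim

# Toy problem: minimize (x - 2)^2, with hand-written derivatives.
q(x) = (x[1] - 2.0)^2

function ∇q!(G, x)
    G[1] = 2.0 * (x[1] - 2.0)   # storage first, mutated in place
end

function ∇²q!(H, x)
    H[1, 1] = 2.0
end

res = optimize(q, ∇q!, ∇²q!, [0.0], LBFGS())
Optim.minimizer(res)   # ≈ [2.0]
```

So I suspect my `∇ℓ!(β, s)` / `∇²ℓ!(β, h)` signatures differ from this convention in argument order, in the rebinding, or in both, but I cannot pin it down.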
I have no idea what I am doing wrong, and any help would be greatly appreciated.