I’m having some issues optimizing an objective with Newton's method from Optim. I started a brief discussion on Slack, which has led me to isolate the problem:

Okay, I am getting closer to the issue. Consider the following gradient and Hessian:

```
st0 = [160.2, -54.8, -210.8, -0.8, 106.2, -0.0];   # gradient
st1 = [137.6  -34.4  -34.4  -34.4  -34.4  -1.0;    # Hessian
       -34.4  137.6  -34.4  -34.4  -34.4  -1.0;
       -34.4  -34.4  137.6  -34.4  -34.4  -1.0;
       -34.4  -34.4  -34.4  137.6  -34.4  -1.0;
       -34.4  -34.4  -34.4  -34.4  137.6  -1.0;
        -1.0   -1.0   -1.0   -1.0   -1.0   0.0]
```
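As an aside, st1 is indefinite: the zero in the last diagonal entry combined with the nonzero off-diagonals in that row rules out positive definiteness. A quick check (these functions are in Base on Julia 0.6 and in the LinearAlgebra stdlib on 0.7+):

```
isposdef(st1)                     # false
minimum(eigvals(Symmetric(st1)))  # negative: st1 has a negative eigenvalue
```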

The Newton step I am expecting is

```
Main> st1\st0
6-element Array{Float64,1}:
0.931395
-0.318605
-1.22558
-0.00465116
0.617442
4.90803e-14
```
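To be explicit about the sign convention: the full update is `x_new = x - H \ g`, so the step above enters with a minus sign. Written out (a minimal sketch):

```
x0 = zeros(6)          # the starting point used in the run below
x1 = x0 - st1 \ st0    # first Newton iterate; st0/st1 are g(x0)/h(x0)
```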

Doing two iterations of this update gets me close to my optimum. The Optim package uses the PositiveFactorizations approach (because st1 is not positive definite) and does (basically) this:

```
Main> F=cholfact(Positive, LowerTriangular(st1))
Base.LinAlg.Cholesky{Float64,Array{Float64,2}} with factor:
[11.7303 0.0 … 0.0 0.0; -2.93258 11.3578 … 0.0 0.0; … ; -2.93258 -3.78594 … 1.0 0.0; -0.0852493 -0.110056 … 0.0 0.340997]
Main> F\st0
6-element Array{Float64,1}:
-0.457849
-1.70785
-2.61483
-1.3939
-4.12115e-13
-26.55
```

This corresponds to the values I see in the trace of the Newton output, and it does not converge to the right solution. **What is the problem here?**
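For what it's worth, the step that comes out of this solves the modified system exactly but leaves a sizeable residual in the original one. A quick check (`norm` is in Base on Julia 0.6 and in LinearAlgebra on 0.7+):

```
d = F \ st0
norm(F.factors * F.factors' * d - st0)  # ~ 0: d solves the modified system
norm(st1 * d - st0)                     # clearly nonzero: d is not the Newton step for st1
```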

These two approaches boil down to small differences in the Hessian that is used in the Newton step:

```
Main> F.factors*ctranspose(F.factors)
6×6 Array{Float64,2}:
137.6 -34.4 -34.4 -34.4 -34.4 -1.0
-34.4 137.6 -34.4 -34.4 -34.4 -1.0
-34.4 -34.4 137.6 -34.4 -34.4 -1.0
-34.4 -34.4 -34.4 137.6 -34.4 -1.0
-34.4 -34.4 -34.4 -34.4 138.6 4.0
-1.0 -1.0 -1.0 -1.0 4.0 0.232558
Main> st1
6×6 Array{Float64,2}:
137.6 -34.4 -34.4 -34.4 -34.4 -1.0
-34.4 137.6 -34.4 -34.4 -34.4 -1.0
-34.4 -34.4 137.6 -34.4 -34.4 -1.0
-34.4 -34.4 -34.4 137.6 -34.4 -1.0
-34.4 -34.4 -34.4 -34.4 137.6 -1.0
-1.0 -1.0 -1.0 -1.0 -1.0 0.0
```
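Reading off the two printouts, the modification PositiveFactorizations makes is confined to the trailing 2×2 block (rows/columns 5 and 6):

```
# Difference between the matrix actually factorized and the true Hessian.
# The only nonzero entries are [5,5] = 1.0, [5,6] = [6,5] = 5.0
# and [6,6] = 0.232558; everything else matches st1.
F.factors * F.factors' - st1
```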

Below is one iteration using Optim.Newton (showing values that correspond to the PositiveFactorizations step):

```
Main> res=Optim.optimize(logLC, g!, h!, zeros(m+1), show_trace=true, extended_trace=true, method=Optim.Newton(linesearch=LineSearches.Static(alpha=1.0)), iterations=1)
Iter Function value Gradient norm
0 1.348190e+03 2.108000e+02
* Current step size: 1.0
* g(x): [160.2, -54.8, -210.8, -0.8, 106.2, -0.0]
* h(x): [137.6 -34.4 -34.4 -34.4 -34.4 -1.0; -34.4 137.6 -34.4 -34.4 -34.4 -1.0; -34.4 -34.4 137.6 -34.4 -34.4 -1.0; -34.4 -34.4 -34.4 137.6 -34.4 -1.0; -34.4 -34.4 -34.4 -34.4 137.6 -1.0; -1.0 -1.0 -1.0 -1.0 -1.0 0.0]
* x: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
1 9.554691e+02 7.972837e+01
* Current step size: 1.0
* g(x): [10.9235, -30.7894, -2.10531, -31.0505, -79.7284, -6.17442]
* h(x): [79.4839 -23.51 -23.3 -21.8481 -10.8258 -1.0; -23.51 135.839 -54.0743 -41.7406 -16.5137 -1.0; -23.3 -54.0743 138.641 -45.3491 -15.9179 -1.0; -21.8481 -41.7406 -45.3491 124.487 -15.5493 -1.0; -10.8258 -16.5137 -15.9179 -15.5493 58.8067 -1.0; -1.0 -1.0 -1.0 -1.0 -1.0 0.0]
* x: [0.457849, 1.70785, 2.61483, 1.3939, 4.12115e-13, 26.55]
Results of Optimization Algorithm
* Algorithm: Newton's Method
* Starting Point: [0.0,0.0,0.0,0.0,0.0,0.0]
* Minimizer: [0.45784883720971925,1.7078488372097196, ...]
* Minimum: 9.554691e+02
* Iterations: 1
* Convergence: false
* |x - x'| ≤ 1.0e-32: false
|x - x'| = 2.66e+01
* |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
|f(x) - f(x')| = 4.11e-01 |f(x)|
* |g(x)| ≤ 1.0e-08: false
|g(x)| = 7.97e+01
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: true
* Objective Calls: 2
* Gradient Calls: 2
* Hessian Calls: 2
```
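For comparison, a hand-rolled Newton loop with a plain `\` solve reproduces the step I expect. This is only a sketch, with no line search or safeguards against indefinite Hessians, and it assumes the `g!(storage, x)` / `h!(storage, x)` calling convention of the optimize call above:

```
# Minimal plain Newton iteration (a sketch). g! and h! are the in-place
# gradient/Hessian callbacks passed to Optim.optimize above; they are not
# shown in this post.
function plain_newton(g!, h!, x0; iters = 2)
    x = copy(x0)
    g = similar(x)
    H = zeros(eltype(x), length(x), length(x))
    for _ in 1:iters
        g!(g, x)      # gradient at the current iterate
        h!(H, x)      # Hessian at the current iterate
        x -= H \ g    # plain Newton step, no positive-definiteness repair
    end
    return x
end

# On the first iteration this reproduces the st1\st0 step from the top of
# the post: plain_newton(g!, h!, zeros(m + 1), iters = 1)
```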