2nd Order Newton's Method with Sparse Hessian

I have a function that I wish to minimize via a 2nd-order Newton’s method with line search. I have an explicit gradient and Hessian. Moreover, the Hessian is sparse, and I care about performance (e.g., if possible, re-using the symbolic factorization). How can I do this? Is there a way to get the negative eigenvalues out of a failed sparse Cholesky factorization? Or is there a package that implements a sparse modified Cholesky factorization algorithm that ensures the result is positive definite?
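
For concreteness, here is a minimal sketch of the outer loop I have in mind (all names are illustrative): a damped Newton step with Armijo backtracking. The open question is what goes inside solve_newton_system.

using LinearAlgebra

# Damped Newton with Armijo backtracking (illustrative sketch).
# `solve_newton_system` stands in for whatever sparse factorization strategy
# ends up producing a descent direction from the current point and gradient.
function backtracking_newton(f, grad, solve_newton_system, x; tol=1e-8, maxiter=100)
    for _ in 1:maxiter
        g = grad(x)
        norm(g) < tol && break
        d = solve_newton_system(x, g)        # e.g. -(H + c*I) \ g with H sparse
        t, fx = 1.0, f(x)
        # Armijo sufficient-decrease backtracking, with a floor on the step size
        while f(x + t * d) > fx + 1e-4 * t * dot(g, d) && t > 1e-12
            t /= 2
        end
        x += t * d
    end
    return x
end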

2 options that I’ve looked into so far:

  1. Add a scalar multiple of the identity: for this, I’d need to know the eigenvalues. SparseArrays.jl does not have a “diagonalize sparse symmetric matrix [with pivoting]” routine. I could try a sparse Cholesky factorization of H + cI, increasing c each time it fails (see the sketch after this list). I could even binary-search for the minimal c that works. But all those failed factorizations will hurt performance.

  2. Use PositiveFactorizations.jl: this is what Optim.jl does. However, I suspect this package is aimed at dense matrices, not sparse ones.
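
For option 1, here is roughly what I have in mind, assuming the CHOLMOD wrappers in SparseArrays: cholesky takes a shift keyword (so it factors H + c*I directly), check=false suppresses the PosDefException so the result can be tested with issuccess, and cholesky! reuses an existing symbolic factorization. The helper name newton_direction! is just illustrative.

using LinearAlgebra, SparseArrays

# One-time symbolic analysis (fill-reducing ordering) on the sparsity pattern:
#     F = cholesky(H; check=false)
# After that, each call below is only a numeric refactorization plus triangular solves.
function newton_direction!(F, H::SparseMatrixCSC, g::AbstractVector; c0=0.0, grow=10.0)
    c = c0
    while true
        cholesky!(F, H; shift=c, check=false)   # factor H + c*I, reusing F's symbolic analysis
        issuccess(F) && return -(F \ g), c      # (damped-)Newton direction and the shift used
        c = max(grow * c, 1e-8)                 # geometric bump after a failed factorization
    end
end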

Aside: the Apple Accelerate libSparse library does have LDLT factorizations. One could write a Julia wrapper like the one I was working on here, and get this feature into AppleAccelerate.jl…but that’s a different project.

Higham’s work seems relevant here, though I don’t know if there is a sparse implementation. Maybe you were already aware of it, since you mentioned “sparse modified Cholesky”.

It seems like there should be a way to adaptively adjust this penalty by geometric factors as the algorithm proceeds, similar to Levenberg–Marquardt’s damping factor. That way you hopefully don’t have too many failed factorizations on a typical step. (But there are lots of “reasonable sounding” optimization methods that turn out not to converge, so it might be tricky to get this right.)
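
Something along these lines, say (a rough sketch building on the newton_direction! helper sketched in the first post; grad and hess are assumed to be your callbacks and F the factor from the one-time symbolic analysis):

using LinearAlgebra

# Carry the shift c across iterations, LM-style: shrink it after a successful
# step, and let newton_direction! grow it again only when a factorization fails,
# so a typical step wastes few (often zero) factorizations.
function damped_newton!(x, grad, hess, F; maxiter=100, shrink=4.0, tol=1e-8)
    c = 0.0
    for _ in 1:maxiter
        g = grad(x)
        norm(g) < tol && break
        H = hess(x)
        d, c = newton_direction!(F, H, g; c0 = c / shrink)
        x .+= d              # placeholder for a proper line search along d
    end
    return x
end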

Just adding a link to Optim.jl in case you didn’t see it yet: https://github.com/JuliaNLSolvers/Optim.jl

I’ve never tried it with sparse Hessians, but you should be able to define the Hessian in place to avoid allocations. These updates don’t need to know anything about array types, so I am assuming Optim.jl has potential to work out of the box for you.
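
For reference, a minimal sketch of the in-place interface on a toy quadratic (g! and h! mutate their first argument):

using Optim

f(x) = 0.5 * (x[1]^2 + 10 * x[2]^2)
g!(G, x) = (G[1] = x[1]; G[2] = 10 * x[2]; G)
function h!(H, x)
    H[1, 1] = 1.0; H[1, 2] = 0.0
    H[2, 1] = 0.0; H[2, 2] = 10.0
    return H
end
res = optimize(f, g!, h!, [1.0, 1.0], Newton())

Note that this convenience form allocates its own (dense) Hessian storage; seeding a sparse matrix needs the explicit TwiceDifferentiable constructor, which comes up below.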

Try Trunk from JSOSolvers.jl (GitHub: JuliaSmoothOptimizers/JSOSolvers.jl)

Higham’s work seems relevant here, though I don’t know if there is a sparse implementation. Maybe you were already aware of it, since you mentioned “sparse modified Cholesky”.

Haven’t heard of that guy in particular. I found the “modified Cholesky” strategy from reading Nocedal & Wright’s “Numerical Optimization” book.

I am assuming Optim.jl has potential to work out of the box for you

Yeah, I thought I’d start there. No luck so far getting it to work, though. I tried doing the following

using Optim

lx = Float64[]; ux = Float64[]                     # empty bounds, i.e. unconstrained
dfc = TwiceDifferentiableConstraints(lx, ux)
# also tried in-place versions: f!, g!, h!
df = TwiceDifferentiable(f, g, h, x0, f0, g0, h0)  # f0, g0, h0 are initial value/storage seeds
res = optimize(df, dfc, x0, NewtonTrustRegion())

but I get a StackOverflowError from this code. The interface isn’t terribly well documented, so it’s probably something I’m doing wrong on my end that’s messing with their promote_objtype calls. If I instead do

res = optimize(f, g, h, x0, NewtonTrustRegion())

then the package allocates a dense matrix for the Hessian storage h0.
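
Presumably the next thing to try is seeding that storage with a sparse matrix and skipping the constraints object entirely. Something like the following (untested, using the in-place g! and h! and the same constructor as above):

using Optim, SparseArrays

h0 = spzeros(length(x0), length(x0))   # sparse Hessian storage (in practice, seed the actual pattern)
df = TwiceDifferentiable(f, g!, h!, x0, f(x0), similar(x0), h0)
res = optimize(df, x0, NewtonTrustRegion())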

Also, I’m not big on Optim.jl: my technical mentor had me reimplement a few methods from NLsolve.jl to eliminate that dependency and get better performance. I’m wary of adding another large, generic dependency if there’s something small and specific that does the job. But I’ll worry about that once I have things working.

Try Trunk from JSOSolvers.jl

Preliminary reading seems promising: there’s a hess_structure function for sparse Hessians. I’ll give that a try.

To be clear, Trunk will not need the Hessian structure. It will only need Hessian-vector products. If your Hessian is sparse, the latter should be fast. Negative curvature directions, if any are detected, will be exploited. That’s the first thing I would try. If you would like to try a factorization-based method, Ipopt will do the right thing on your unconstrained problem. Both approaches are very similar and have similar properties.
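
For example, a minimal way to try it (assuming ADNLPModels.jl to build the model, which supplies gradients and Hessian-vector products via AD; for hand-coded derivatives you would define your own AbstractNLPModel instead):

using ADNLPModels, JSOSolvers

# Wrap the objective and starting point as an NLPModel; trunk only ever asks the
# model for objective/gradient values and Hessian-vector products.
nlp = ADNLPModel(f, x0)
stats = trunk(nlp)
println(stats.status, "  f* = ", stats.objective)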
