2nd Order Newton's Method with Sparse Hessian

I have a function that I wish to minimize via a 2nd-order Newton’s method with line search. I have an explicit gradient and Hessian. Moreover, the Hessian is sparse, and I care about performance (e.g., if possible, re-using the symbolic factorization). How can I do this? Is there a way to get the negative eigenvalues out of a failed sparse Cholesky factorization? Or is there a package that implements a sparse modified Cholesky factorization algorithm that ensures the result is positive definite?
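
For concreteness, here is a minimal sketch of the outer loop I have in mind (all names are illustrative): a damped Newton step with Armijo backtracking. The open question is what goes inside solve_newton_system.

using LinearAlgebra

# Damped Newton with Armijo backtracking (illustrative sketch).
# `solve_newton_system` stands in for whatever sparse factorization strategy
# ends up producing a descent direction from the current point and gradient.
function backtracking_newton(f, grad, solve_newton_system, x; tol=1e-8, maxiter=100)
    for _ in 1:maxiter
        g = grad(x)
        norm(g) < tol && break
        d = solve_newton_system(x, g)        # e.g. -(H + c*I) \ g with H sparse
        t, fx = 1.0, f(x)
        # Armijo sufficient-decrease backtracking, with a floor on the step size
        while f(x + t * d) > fx + 1e-4 * t * dot(g, d) && t > 1e-12
            t /= 2
        end
        x += t * d
    end
    return x
end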

2 options that I’ve looked into so far:

  1. Add a scalar multiple of the identity: for this, I’d need to know the eigenvalues. SparseArrays.jl does not have a “diagonalize sparse symmetric matrix [with pivoting]” routine. I could try a sparse Cholesky factorization of H + cI, increasing c each time it fails (see the sketch after this list). I could even binary-search for the minimal c that works. But all those failed factorizations will hurt performance.

  2. Use PositiveFactorizations.jl: this is what Optim.jl does. However, I suspect this package is aimed at dense matrices, not sparse ones.
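
For option 1, here is roughly what I have in mind, assuming the CHOLMOD wrappers in SparseArrays: cholesky takes a shift keyword (so it factors H + c*I directly), check=false suppresses the PosDefException so the result can be tested with issuccess, and cholesky! reuses an existing symbolic factorization. The helper name newton_direction! is just illustrative.

using LinearAlgebra, SparseArrays

# One-time symbolic analysis (fill-reducing ordering) on the sparsity pattern:
#     F = cholesky(H; check=false)
# After that, each call below is only a numeric refactorization plus triangular solves.
function newton_direction!(F, H::SparseMatrixCSC, g::AbstractVector; c0=0.0, grow=10.0)
    c = c0
    while true
        cholesky!(F, H; shift=c, check=false)   # factor H + c*I, reusing F's symbolic analysis
        issuccess(F) && return -(F \ g), c      # (damped-)Newton direction and the shift used
        c = max(grow * c, 1e-8)                 # geometric bump after a failed factorization
    end
end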

Aside: the Apple Accelerate libSparse library does have LDLT factorizations. One could write a Julia wrapper like the one I was working on here, and get this feature into AppleAccelerate.jl…but that’s a different project.

Higham’s work seems relevant here, though I don’t know if there is a sparse implementation. Maybe you were already aware of it, since you mentioned “sparse modified Cholesky”.

It seems like there should be a way to adaptively adjust this penalty by geometric factors as the algorithm proceeds, similar to Levenberg–Marquardt’s damping factor. That way you hopefully don’t have too many failed factorizations on a typical step. (But there are lots of “reasonable sounding” optimization methods that turn out not to converge, so it might be tricky to get this right.)
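
Something along these lines, say (a rough sketch building on the newton_direction! helper sketched in the first post; grad and hess are assumed to be your callbacks and F the factor from the one-time symbolic analysis):

using LinearAlgebra

# Carry the shift c across iterations, LM-style: shrink it after a successful
# step, and let newton_direction! grow it again only when a factorization fails,
# so a typical step wastes few (often zero) factorizations.
function damped_newton!(x, grad, hess, F; maxiter=100, shrink=4.0, tol=1e-8)
    c = 0.0
    for _ in 1:maxiter
        g = grad(x)
        norm(g) < tol && break
        H = hess(x)
        d, c = newton_direction!(F, H, g; c0 = c / shrink)
        x .+= d              # placeholder for a proper line search along d
    end
    return x
end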

Just adding a link to Optim.jl in case you didn’t see it yet: https://github.com/JuliaNLSolvers/Optim.jl

I’ve never tried it with sparse Hessians, but you should be able to define the Hessian in place to avoid allocations. These updates don’t need to know anything about array types, so I am assuming Optim.jl has potential to work out of the box for you.
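
For reference, a minimal sketch of the in-place interface on a toy quadratic (g! and h! mutate their first argument):

using Optim

f(x) = 0.5 * (x[1]^2 + 10 * x[2]^2)
g!(G, x) = (G[1] = x[1]; G[2] = 10 * x[2]; G)
function h!(H, x)
    H[1, 1] = 1.0; H[1, 2] = 0.0
    H[2, 1] = 0.0; H[2, 2] = 10.0
    return H
end
res = optimize(f, g!, h!, [1.0, 1.0], Newton())

Note that this convenience form allocates its own (dense) Hessian storage; seeding a sparse matrix needs the explicit TwiceDifferentiable constructor, which comes up below.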

Try Trunk from JSOSolvers.jl (GitHub: JuliaSmoothOptimizers/JSOSolvers.jl)

Higham’s work seems relevant here, though I don’t know if there is a sparse implementation. Maybe you were already aware of it, since you mentioned “sparse modified Cholesky”.

Haven’t heard of that guy in particular. I found the “modified Cholesky” strategy from reading Nocedal & Wright’s “Numerical Optimization” book.

I am assuming Optim.jl has potential to work out of the box for you

Yeah, I thought I’d start there. No luck so far getting it to work, though. I tried doing the following

using Optim

lx = Float64[]; ux = Float64[]                     # empty bounds, i.e. unconstrained
dfc = TwiceDifferentiableConstraints(lx, ux)
# also tried in-place versions: f!, g!, h!
df = TwiceDifferentiable(f, g, h, x0, f0, g0, h0)  # f0, g0, h0 are initial value/storage seeds
res = optimize(df, dfc, x0, NewtonTrustRegion())

but I get a StackOverflowError from this code. The interface isn’t terribly well documented, so it’s probably something I’m doing wrong on my end that’s messing with their promote_objtype calls. If I instead do

res = optimize(f, g, h, x0, NewtonTrustRegion())

then the package allocates a dense matrix for the Hessian storage h0.
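
Presumably the next thing to try is seeding that storage with a sparse matrix and skipping the constraints object entirely. Something like the following (untested, using the in-place g! and h! and the same constructor as above):

using Optim, SparseArrays

h0 = spzeros(length(x0), length(x0))   # sparse Hessian storage (in practice, seed the actual pattern)
df = TwiceDifferentiable(f, g!, h!, x0, f(x0), similar(x0), h0)
res = optimize(df, x0, NewtonTrustRegion())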

Also, I’m not big on Optim.jl: my technical mentor had me reimplement a few methods from NLsolve.jl to eliminate that dependency and get better performance. I’m wary of adding another large, generic dependency if there’s something small and specific that does the job. But I’ll worry about that once I have things working.

Try Trunk from JSOSolvers.jl

Preliminary reading seems promising: there’s a hess_structure function for sparse Hessians. I’ll give that a try.

To be clear, Trunk will not need the Hessian structure. It will only need Hessian-vector products. If your Hessian is sparse, the latter should be fast. Negative curvature directions, if any are detected, will be exploited. That’s the first thing I would try. If you would like to try a factorization-based method, Ipopt will do the right thing on your unconstrained problem. Both approaches are very similar and have similar properties.
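
For example, a minimal way to try it (assuming ADNLPModels.jl to build the model, which supplies gradients and Hessian-vector products via AD; for hand-coded derivatives you would define your own AbstractNLPModel instead):

using ADNLPModels, JSOSolvers

# Wrap the objective and starting point as an NLPModel; trunk only ever asks the
# model for objective/gradient values and Hessian-vector products.
nlp = ADNLPModel(f, x0)
stats = trunk(nlp)
println(stats.status, "  f* = ", stats.objective)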
