Quasi-Newton nonlinear solver for a sparse Jacobian

Hi community,

I was wondering if there is a sparse-optimized quasi-Newton solver in the Julia ecosystem. This question was motivated by the undocumented Broyden solver in NLsolve.jl. That solver requires the inverse of the Jacobian, which is clearly not optimal when the Jacobian is sparse.

I wouldn’t mind implementing my own if there isn’t, but could anyone point me to the right algorithm either in math or from other languages?



BFGS? The Nocedal and Wright book is a great reference. And we have C. T. Kelley in this forum (@ctkelley); I rather think he may have some good advice. Btw: isn't this already in Optim.jl?


In theory, you can use Percival.jl, which is matrix-free, so you just need the J' * v and sometimes J * v operators. The tricky part is to define a model subtype of AbstractNLPModel from GitHub - JuliaSmoothOptimizers/NLPModels.jl: Data Structures for Optimization Models. Also, if you wait a little, you should be able to use ChainRules to specify your J' * v operator, which will be picked up by Zygote and used by Percival. But this requires a PR (Allow use of Zygote or ReverseDiff by mohamed82008 · Pull Request #4 · JuliaSmoothOptimizers/ADNLPModels.jl · GitHub) to be merged (hopefully soon) to add Zygote support to ADNLPModels.jl.


IMHO the easy thing to do is restarted BFGS. That's different from limited-memory BFGS and does need a good preconditioner to perform best. Nocedal-Wright explains the issue well, as do I in (shameless sales pitch)

author = "C. T. Kelley",
title = "{Iterative Methods for Optimization}",
series = "Frontiers in Applied Mathematics",

The bad news is that it is not trivial to implement this on your own. My MATLAB codes are pretty decent, but somewhat (aka extremely) out of date (vintage 1999). You really want something with an inverse Hessian-vector product to start. Then BFGS (either limited memory or restarted) or any other quasi-Newton method (take SR-1, please) can update the inverse Hessian with the identity as a starting point and the Sherman-Morrison formula.
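To make the inverse-update idea concrete, here is a rough sketch (not Prof. Kelley's code; `broyden_inverse_update` is a made-up helper) of Broyden's "good" update applied directly to an approximate inverse Jacobian via the Sherman-Morrison formula:

```julia
using LinearAlgebra

# Broyden's "good" update applied directly to an approximate inverse
# Jacobian B ≈ J(x)⁻¹ via the Sherman-Morrison formula, where
# s = x_new - x_old and y = F(x_new) - F(x_old).
function broyden_inverse_update(B, s, y)
    By = B * y
    return B + ((s - By) / (s' * By)) * (s' * B)   # rank-one correction
end

# Sanity check on a linear problem F(x) = A*x, where y = A*s exactly.
A = [2.0 1.0; 0.0 3.0]
B = Matrix(0.4I, 2, 2)      # rough initial inverse approximation
s = [1.0, 0.5]
y = A * s
B2 = broyden_inverse_update(B, s, y)
# The secant condition B2 * y == s holds by construction:
@show B2 * y ≈ s
```

The point is that the update never forms or factors the Jacobian itself, only a dense approximation to its inverse, which is exactly why it does nothing for you when the true Jacobian is sparse.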

I’m busy getting my nonlinear solvers to work in Julia and am not likely to live long enough to even think about optimization. Much of the stuff you can get to in Julia (eg IPOPT) is really good.

I will have Broyden in my nonlinear solver package, but Broyden is for nonlinear equations and not for optimization. Broyden does not know (1) that Hessians are symmetric and (2) that a Hessian must be SPD for a line search to have a prayer of working. Do not use Broyden for optimization, use BFGS, DFP, or (at your own risk) SR-1 instead. There are perfectly OK Broyden codes for nonlinear equations out there (mine, the Argonne code by Jorge More’, …) that use Broyden’s method with a line search for equations, but they are not supported by any theory that I know about and are not designed for optimization.

For unconstrained problems (which is all I really understand) Steihaug's trust region CG approach (sparse Hessian + really good preconditioner) is my personal favorite for artistic reasons. But BFGS may do better, and you should really do the experiment on the application you care about (ie not some idiotic suite of artificial test problems).

@PetrKryslUCSD said my name in his post. I assume he wanted an opinionated opinion. I think I’ve delivered.


Thank you all for suggestions!

@ctkelley @PetrKryslUCSD Probably I didn't state it clearly, but I'm looking for a nonlinear solver instead of optimization (is there another domain specific to nonlinear solvers?). That's why I'm considering Broyden and the like, and didn't pay too much attention to Optim.jl.

In this case is BFGS still a good option?

Thanks again!

There is nothing like an opinionated statement from a true expert. Thanks!


To a large degree, solving a system of nonlinear equations may be equivalent to an optimization problem. Especially when unconstrained. BFGS is well known in computational mechanics where it is used to solve systems of nonlinear algebraic equations.


BFGS is not a nonlinear solver method. It seeks to maintain an SPD Hessian for optimization. The thing to remember is that optimization and nonlinear solvers are not the same thing. Using an algorithm designed for one of them to solve the other is a bad, bad, bad idea.

Broyden looks for a general low-rank change to an approximate Jacobian and is designed for nonlinear solver applications. Broyden needs a good preconditioner to perform well and has fallen out of favor in the last few decades. Newton-Krylov methods do as well and are more common in modern solvers. The central issue is that quasi-Newton methods, when implemented correctly, consume storage during the nonlinear iteration. Newton-GMRES, for example, consumes storage in the linear iteration and that is far easier to manage and makes nonlinear convergence a simpler problem.

My package (SIAMFANLEquations.jl) has Newton-GMRES working, but the documentation and examples are not really ready for prime time. I'll have something decent in a month; everything in the master branch works correctly. The latest released version (0.3.0) works, but the memory allocations are better by far in the master branch.


Thanks a lot! That's tons of information and really helpful. I'll check your package out.


For people who come to this post later, below is a very relevant discussion

Thanks @briochemc for the reference.


Actually, it may be a good idea to outline what it is you are looking at. That might change what people recommend.


I think NLsolve.jl might already have what you are looking for. Indeed, you can pass the linear solver to the solver as shown in the tests.

So you can pass for example linsolve = (x, A, b) -> x .= A \ b, which works for sparse matrices. You could also use IterativeSolvers.jl or KrylovKit.jl for linsolve.
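To see that closure in isolation (outside NLsolve, using only the SparseArrays and LinearAlgebra standard libraries), here's a minimal sketch; the `(x, A, b)` signature is the one shown in the NLsolve tests mentioned above:

```julia
using SparseArrays, LinearAlgebra

# The shape NLsolve expects: (x, A, b) -> overwrite x with the solution.
# For a SparseMatrixCSC, `\` dispatches to a sparse LU factorization,
# so the sparsity of the Jacobian is actually exploited.
linsolve = (x, A, b) -> x .= A \ b

A = sparse([2.0 0.0 1.0; 0.0 3.0 0.0; 1.0 0.0 2.0])
b = [1.0, 2.0, 3.0]
x = similar(b)
linsolve(x, A, b)
@show A * x ≈ b
```

Swapping in a Krylov method from IterativeSolvers.jl or KrylovKit.jl only means changing the body of the closure; the solver never sees the difference.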

The thing I don't understand in your question is whether:

  • you don't know how to pass the sparse Jacobian
  • you don't know how to pass the sparse linear solver

In any case, you can check the tutorials of my package on bifurcations: you will find many examples related to your question with preconditioned linear solvers for Krylov-Newton (on GPU…)


Does anybody know of a flowchart for making choices between nonlinear/optimization solvers? I remember seeing one for linear solvers (where the situation is of course simpler).

My personal flowchart for optimization is: is the problem small with an available Hessian? If yes, use (a globalization of) Newton. If not, use L-BFGS. If storage is really an issue, use CG.

For nonlinear solvers, it's: is the problem small with an available Jacobian? If yes, use (a globalization of) Newton. If not, is there a natural fixed-point iteration for the problem with good convergence properties (at least in some cases)? If yes, use Anderson. If not, use Broyden, unless storage is really an issue, in which case use Newton-Krylov.
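For what it's worth, the nonlinear-solver branch of that flowchart can be written down as a tiny decision function (purely illustrative; the names are made up and come from no package):

```julia
# The nonlinear-solver branch of the flowchart above, as a sketch.
function pick_nonlinear_method(; small::Bool, jacobian_available::Bool,
                                 natural_fixed_point::Bool, storage_tight::Bool)
    if small && jacobian_available
        return :newton          # with a globalization (line search / trust region)
    elseif natural_fixed_point
        return :anderson
    elseif storage_tight
        return :newton_krylov
    else
        return :broyden
    end
end

@show pick_nonlinear_method(small = true, jacobian_available = true,
                            natural_fixed_point = false, storage_tight = false)
```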


I am not aware of one, the choices are quite complex with a lot of trade-offs. IMO it is worth investing into understanding various methods at least at an intuitive level, and trying out different ones for a particular problem.


You assume that people are actually interested in the various numerical methods, how they work, how to get optimal performance out of them, etc. When I use a compiler to compile my code, I am not particularly interested in what a tokenizer is; I just want the compiler to do its job. Ideally everybody should know everything, but that's not quite how it works in practice. The choices are not that complex, since these are pretty mature fields by now: there are a few groups of methods, and inside these groups variants that differ from each other in relatively small ways. The question is which group of methods to choose from, and that can be decided (approximately) with pretty simple questions.


That’s one application where you know the Jacobian is SPD. In those cases BFGS makes sense for equations. If your Jacobian has some structure you can exploit (symmetry, SPD, sparsity pattern) there are quasi-Newton methods out there that will maintain that structure. The ones that preserve sparsity sound good, but have not, to my knowledge, worked that well in practice.

The way many modern codes use quasi-Newton methods is to update an approximation to the inverse of the Jacobian/Hessian. Methods like Broyden and BFGS can do that with ease. The sparsity-preserving methods cannot.


No, I don’t. I am afraid you misunderstood something.

If you use numerical methods, they are a tool. You don’t have to be interested in them per se, but it is useful to know a bit about the intuition, the same way it is useful to be able to fix punctures if you bike, without becoming a bicycle mechanic.

In particular, I am suggesting reading through a few chapters of Nocedal & Wright, or something similar. E.g. it is good to know the differences between quasi-Newton and trust-region methods, without getting bogged down in line-search algorithms.


Thanks for the suggestion. I indeed didn’t realize I can pass linsolver to it.

As @PetrKryslUCSD suggested probably it’s useful to explain where I come from:

I'm actually solving a PDE (from a 2+ dimensional HJB equation) using the implicit Euler method, so the Jacobian is sparse and structured. The algorithm is very close to @matthieu's package EconPDEs.jl, where he uses a full Newton method to solve each time step.

In some of my applications, it can be relatively expensive to evaluate the Jacobian at each iteration (even after taking advantage of sparsity), which motivates me to look for a quasi-Newton method like Broyden, but one optimized for a sparse Jacobian.

After @ctkelley’s post, I realized the major bottleneck is indeed the linear solver part, at least for my current application, so Newton-GMRES is also useful for me.

Actually I have a follow-up question now: quasi-Newton methods like Broyden and Newton-Krylov seem to target two different bottlenecks, computing the Jacobian and solving the linear system. As @ctkelley said, Broyden has fallen out of favor, but how do modern solvers manage the cost of computing the Jacobian?

Apologies if it's another newbie question. You guys are really amazing. Thanks a lot!


Newton-Krylov methods do not need a Jacobian, only a Jacobian-vector product. You can get a perfectly serviceable Jacobian-vector product, for example, from a simple forward difference approximation. Here's another shameless plug: see chapters 3 and 6 of
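That forward-difference directional derivative fits in a few lines; a minimal toy sketch (my own, not the version in SIAMFANLEquations.jl, which scales the difference increment more carefully):

```julia
using LinearAlgebra

# Jacobian-vector product without ever forming J:
#   J(x) * v ≈ (F(x + h*v) - F(x)) / h
# F(x) could be reused from the Newton residual already in hand;
# this toy version simply recomputes it.
function jvp(F, x, v; h = sqrt(eps(eltype(x))))
    return (F(x .+ h .* v) .- F(x)) ./ h
end

# Check against the analytic Jacobian of a small test function.
F(x) = [x[1]^2 + x[2], sin(x[2])]
x = [1.0, 0.5]
v = [0.3, -0.2]
J = [2x[1] 1.0; 0.0 cos(x[2])]
@show isapprox(jvp(F, x, v), J * v; rtol = 1e-6)
```

GMRES only ever asks for products like this, which is why the sparsity (or even the existence) of an assembled Jacobian never enters the picture.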

author="C. T. Kelley",
title="{Iterative Methods for Linear and Nonlinear Equations}",
series="Frontiers in Applied Mathematics",

I see. The thing I didn't realize before is that it only takes two evaluations of the function f to obtain a Jacobian-vector product. I naively thought it still took maximum(colors) + 1 evaluations. It all makes sense to me now.

And thanks for the great reference!