Comparing non-linear least squares solvers

Lovely. JuliaPackageComparisons is nowhere close to what I want it to be, but quality content like this is exactly what is needed to get it there!

Just a note to say I’ve updated the table and graphs in the first post of this thread. They now include only those solvers that I was able to add to the benchmark code.

If you’d like to see a specific solver added to the table, please first add it to the benchmark.


What is a good, simple-to-use solver package for the case when a single cost function evaluation takes about 100 minutes? I know this depends on the function itself, but I do not know the function well yet… should I start with BlackBoxOptim.jl?

Depends a lot on your function. How many parameters does it have? Is it differentiable? (If so, it’s ideal if you can compute derivatives, but even if not, there are derivative-free algorithms that internally exploit the existence of derivatives.) What kind of constraints?


The Rosenbrock function has a global minimum value of 0. You may want to include another variant of the Rosenbrock function, say something like

$$
f(x) = \begin{pmatrix} 10\,(x_2 - x_1^2) \\ 1 - x_1 \\ \lambda \end{pmatrix},
$$

that I got from “Methods for Non-Linear Least Squares Problems” by K. Madsen, H.B. Nielsen, O. Tingleff (link: http://www2.imm.dtu.dk/pubdb/edoc/imm3215.pdf ). Why? Because LM and some other NLS-specific algorithms can have poor convergence properties when the residual at the optimum is significantly different from 0. You can control the final residual by varying λ.
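
A quick sanity check (just a few lines of plain Julia, nothing package-specific) shows that λ changes the residual at the optimum without moving the minimizer:

```julia
using LinearAlgebra

# Modified Rosenbrock residuals: the constant third component λ shifts the
# cost at the optimum away from zero without moving the minimizer (1, 1).
rosen_mod(x, λ) = [10 * (x[2] - x[1]^2), 1 - x[1], λ]

x_star = [1.0, 1.0]
norm(rosen_mod(x_star, 0.0))   # 0.0   - the classic zero-residual case
norm(rosen_mod(x_star, 1e2))   # 100.0 - same minimizer, large final residual
```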

If you’re using this to solve nonlinear equations, you’re fine, because the residual will be 0 at the solution (convergence rates emerge from local properties, so it’s the residual at the solution that matters). Otherwise, you should maybe consider some curve-fitting (to data) examples as well. However, using nonlinear least squares to solve systems of equations has its own set of issues, because local solutions to the nonlinear least squares problem do not necessarily have zero residuals, and as such may not solve the underlying nonlinear equations you’re trying to solve.
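
To make that last point concrete, here’s a tiny (made-up) one-dimensional illustration: the equation has a real root, but the least-squares cost also has a local minimum with a non-zero residual, and a descent-based solver started on the wrong side will typically settle there:

```julia
# r(x) = x^3 - 2x + 2 has a real root near x ≈ -1.77, so the equation is solvable.
# But the least-squares cost F(x) = r(x)^2 / 2 has another stationary point
# wherever r'(x) = 0, since ∇F = r(x) * r'(x).
r(x)  = x^3 - 2x + 2
dr(x) = 3x^2 - 2

x_root  = -1.7693       # r ≈ 0: solves the equation
x_local = sqrt(2/3)     # r' = 0: also a critical point of F

r(x_root)    # ≈ 0.0
r(x_local)   # ≈ 0.91: a local minimum of F, but not a solution of r(x) = 0
```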

For several reasons, including the non-zero residual issue above, I personally have had good experience not using LM for NLS. There are so many other algorithms out there for nonlinear optimization that can be more robust and often converge quite quickly. LM was an excellent invention that brought a trust-region-like approach to NLS, but I’m actually not quite sure that LM is seen as a leading algorithm in 2023. Looking at Ceres, it seems they have reached the same conclusion: they include nonlinear CG, BFGS, etc. as well.

I never got around to reimplementing LM for NLSolvers.jl, but my point is just that “any optimizer” is an “NLLS/NLS solver”. Can you exploit structure? Yes. Is it necessarily the most important part when picking a solver? Not sure. I’m not an expert on this topic, but the JuliaSmoothOptimizers people have their own solvers and have actually published in the field, so maybe they have more to say. The scale of the problem also plays a role.


Thanks for this. Very interesting. I am sceptical about the claim, and have studied the reference you gave (which I’m familiar with). I really don’t understand how they can claim this. The quadratic problem at each step will be identical (given the same x), so the step taken can only be affected by the value of the damping factor. The damping factor update will only be impacted by the offset insofar as the offset reduces the accuracy of the computed change in cost, due to floating-point round-off. I would expect this to have minimal impact.

I’ve written a gist here to test the claim, and it does indeed show that the non-zero residual makes almost no difference. Most of the steps taken are identical; then, very close to the minimum, the damping factor of the augmented problem drops faster, so the optimizer actually converges faster. Here’s the output (I’ve adjusted the cost of the augmented problem in the graph, so the two costs are directly comparable):

The code produces an interactive plot, allowing you to select different starting points in the state space.
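
For anyone who wants to poke at this without running the gist, something along these lines reproduces the conclusion on a small scale (this is not the gist’s code; it’s just a bare-bones LM loop using ForwardDiff for the Jacobian, with simplistic damping logic):

```julia
using LinearAlgebra, ForwardDiff

# Plain Rosenbrock residuals and the λ-augmented variant with a constant offset
rosen(x)        = [10 * (x[2] - x[1]^2), 1 - x[1]]
rosen_aug(x, λ) = vcat(rosen(x), λ)

# Minimal Levenberg-Marquardt loop with a simple accept/reject damping update
function lm(r, x; μ = 1e-3, maxiter = 200, gtol = 1e-12)
    for _ in 1:maxiter
        res = r(x)
        J   = ForwardDiff.jacobian(r, x)
        g   = J' * res
        norm(g) < gtol && break
        δ = -(J' * J + μ * I) \ g              # damped normal equations
        if sum(abs2, r(x + δ)) < sum(abs2, res)
            x = x + δ; μ /= 3                  # accept step, relax damping
        else
            μ *= 2                             # reject step, increase damping
        end
    end
    return x
end

x0 = [-1.2, 1.0]
lm(x -> rosen(x), x0)            # ≈ [1, 1], zero residual at the solution
lm(x -> rosen_aug(x, 10.0), x0)  # ≈ [1, 1] too: the constant offset barely matters
```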

I think the presence of these solvers in Ceres is more so people can compare, rather than because they’re better. For the types of problems that people tend to solve with Ceres (visual geometry problems), which do end up with non-zero residuals, LM is usually the best. See Bundle Adjustment - A Modern Synthesis.

Yes. But they don’t exploit Gauss’ approximation to the Hessian (spelled out below). NLLS-specific solvers do, such that:

  1. The approximate Hessian can be computed using only the first derivative of the residuals (as opposed to twice differentiating the squared cost function). It’s simpler and more efficient.
  2. The approximate Hessian is positive (semi-)definite, so the Gauss-Newton step is always a minimum of the quadratic approximation.
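
For completeness, this is just the standard Gauss-Newton argument: for $F(x) = \tfrac{1}{2}\lVert r(x)\rVert^2$ with residual Jacobian $J(x)$,

$$
\nabla F(x) = J(x)^\top r(x), \qquad
\nabla^2 F(x) = J(x)^\top J(x) + \sum_i r_i(x)\,\nabla^2 r_i(x).
$$

Gauss-Newton/LM keep only the $J^\top J$ term, which needs first derivatives only and is always positive semi-definite; the dropped term vanishes as the residuals go to zero, which is also why small-residual problems are the sweet spot for this approximation.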

This is why (I believe) general solvers tend to have worse performance. But I’m very happy to be proved wrong. I’d love to find a better solver than LM.

Your point about the specific example seems quite valid. I agree.

Perhaps it’s different if the non-zero residuals actually depend on the variables. Not sure. But I find in practice, on real problems with non-zero residuals, that LM works very well.

LM can work while being suboptimal :slight_smile: But yes, I took the example because it was easy; in the examples I had in mind, the residual would be nonzero in a slightly more complicated manner. But I’m happy to be proven wrong.

@TheLateKronos There’s a PR waiting to be merged. :smile:


Regarding non-Euclidean variables (especially those from non-Hadamard manifolds like the sphere or SO(3)), I think it is important to note that there are two standard ways of handling the issue of not being able to cover the entire manifold with one chart. The most common one is using retractions, vector transports and bases of tangent spaces. This is what Manopt.jl does. The other one is chart switching, which is easier to integrate with generic Euclidean solvers, and is the way Manifolds.jl handles integrating geodesics. I don’t see NLLSsolver.jl doing either of these?


Unfortunately I’m not familiar with these terms.

In NLLSsolver, each variable type has an update function implemented that essentially takes a Euclidean vector in the tangent space of the current variable state (i.e. passing in a vector of zeros for the update vector returns the current variable state), and updates the variable state accordingly. This generally involves a projection onto the manifold. Perhaps that’s what you mean by retraction? At each iteration, the Euclidean space that the linear solver works in is updated to the current tangent space of each variable - perhaps this is what you mean by chart switching?

To summarize, the update function takes in a variable type and a Euclidean vector in the current tangent space of the variable, and outputs a suitably updated variable. NLLSsolver is agnostic to what the update method does under the hood, or indeed anything else about the variable.

My implementation for SO(3) matrices can be seen here. The rodrigues method computes the exponential map from $\mathfrak{so}(3)$ to SO(3).
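
For anyone unfamiliar with the construction, a minimal sketch of that kind of update looks like this (just the idea, not the package’s actual code; hat, rodrigues and update are illustrative names):

```julia
using LinearAlgebra

# Skew-symmetric ("hat") matrix of a 3-vector, i.e. an element of so(3)
hat(w) = [ 0.0   -w[3]   w[2];
           w[3]   0.0   -w[1];
          -w[2]   w[1]   0.0 ]

# Rodrigues' formula: the exponential map from so(3) to SO(3)
function rodrigues(w)
    θ = norm(w)
    θ < eps() && return Matrix{Float64}(I, 3, 3)   # near zero: identity rotation
    W = hat(w)
    return I + (sin(θ) / θ) * W + ((1 - cos(θ)) / θ^2) * (W * W)
end

# Update of the kind described above: apply a tangent-space step w to R;
# w = zeros(3) returns R unchanged, and the result stays on the manifold.
update(R, w) = R * rodrigues(w)
```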

I think it would be easy to use any Manifolds.jl type, simply by implementing this update function, plus an nvars function that returns the dimensionality of the tangent space. Is there an abstract Manifolds type for which it can be defined?
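
To illustrate, here’s a rough (untested) sketch of what that glue might look like, assuming Manifolds.jl’s exp / get_vector / manifold_dimension API; update and nvars are placeholders for whatever the real interface functions need to be:

```julia
using Manifolds

# Hypothetical glue code: wrap any Manifolds.jl manifold point as a variable.
struct ManifoldVar{M<:AbstractManifold,P}
    manifold::M
    p::P            # current point on the manifold
end

# Dimensionality of the tangent space = length of the Euclidean update vector
nvars(v::ManifoldVar) = manifold_dimension(v.manifold)

# Apply a Euclidean update δ given in an orthonormal basis of the tangent space
# at p: convert the coordinates to a tangent vector, then map back to the manifold.
function update(v::ManifoldVar, δ::AbstractVector)
    X = get_vector(v.manifold, v.p, δ, DefaultOrthonormalBasis())
    ManifoldVar(v.manifold, exp(v.manifold, v.p, X))  # or a cheaper retraction
end

# Example: a point on the unit sphere S^2
v = ManifoldVar(Sphere(2), [0.0, 0.0, 1.0])
nvars(v)               # 2
update(v, zeros(2)).p  # unchanged: still [0.0, 0.0, 1.0]
```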

Yes, update is similar to retraction.

No, chart switching is different. It sounds more similar to the retraction-based approach. SO(3) is not an ideal example here because you can just use the Lie algebra, which simplifies things, but in general you can’t assume that a tangent vector at one point can just be used at another one. Vector transport is what adapts tangent vectors between points. Very often just doing a projection is enough (that’s what Optim.jl does), but Manopt.jl is more generic.

Unless I’ve misunderstood what you’re saying, my solver doesn’t transport tangent vectors (Euclidean vectors in the tangent space?) between different points/tangent spaces. But I’m very happy to engage in a private chat to get to the bottom of what you’re saying.


I see, NLLSsolver.jl’s algorithms just don’t require vector transport, thanks for the clarification.

Correct. NLLSsolver never uses information computed in one tangent space to compute an update step in another tangent space (indeed, I’m aware of the problem associated with doing this). Hence vector transport is never required.

The comparison I published in the original post is now available on juliapackagecomparisons.github.io.

Thanks, @TheLateKronos, for creating that resource. I hope it takes off.


The comparison has moved to here.
