Comparing non-linear least squares solvers

I’m creating this thread as a resource for people looking to solve non-linear least squares problems in Julia, so they know what tools are available, what features they have and how well they perform.

UPDATE: This comparison has been added to juliapackagecomparisons.github.io

I’ll start with solvers and features. This list simply reflects the solvers I found and was able to test, and my understanding of their features.

| Feature | Ipopt | JSO | NLLSsolver.jl | LeastSquaresOptim.jl |
| --- | --- | --- | --- | --- |
| Registered package(s) | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Uses JuMP model definition | :white_check_mark: | :white_check_mark: | :x: | :x: |
| Bound constraints | :white_check_mark: | :white_check_mark: | :x: | :white_check_mark: |
| Equality constraints | :white_check_mark: | :x: | :x: | :x: |
| Non-linear constraints | :white_check_mark: | :x: | :x: | :x: |
| Robustified cost functions | :x: | :x: | :white_check_mark: | :x: |
| Non-Euclidean variables | :x: | :x: | :white_check_mark: | :x: |
| Dense auto-differentiation | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Supports sparsity | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Sparse auto-differentiation | :white_check_mark: | :white_check_mark: | :white_check_mark: | :x: |

The following performance comparisons are evaluated on an Apple M1 Pro CPU, using this script.

Optimization times on small, dense problems of varying sizes:

All the solvers used auto-differentiation to compute the derivatives. Ipopt got stuck in a local minimum for the final (largest) problem, but all other solvers found the global optimum.

Optimization times on medium-sized, sparse problems of varying sizes:

All the solvers except LeastSquaresOptim used auto-differentiation to compute the derivatives. The latter used hand-computed derivatives, since auto-differentiation of sparse systems is not supported. All solvers reached the global optimum.

If you believe other solvers should be in this comparison, please first add them to the benchmark, and send me the updated code (link to a gist is best).

17 Likes

In terms of performance it would be nice to benchmark all solvers over a range of problem types and sizes.

For now I’ve benchmarked a few solvers on a multi-dimensional Rosenbrock problem, which leads to a sparse, bi-diagonal linear system. I’ve also added a similar, dense problem.
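
For reference, the usual residual form of the multi-dimensional Rosenbrock problem (it matches the FA/FB expressions that appear further down this thread) is

$$r_{2k-1}(x) = 10\,(x_{k+1} - x_k^2), \qquad r_{2k}(x) = 1 - x_k, \qquad k = 1, \dots, n-1,$$

so each residual depends on at most two adjacent variables, which is what gives the sparse, bi-diagonal structure.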

I’ve now added the graphs to the first post in this thread. I’ll aim to keep these graphs updated as more solvers are added.

2 Likes

I guess the other option to add to the table is JuMP directly with Ipopt. Something like:

```julia
using JuMP, Ipopt

function main(f::Function, data::Vector, y::Vector)
    model = Model(Ipopt.Optimizer)
    @variable(model, x)
    @variable(model, residuals[1:length(y)])
    @constraint(model, residuals .== f(data, x) .- y)
    @objective(model, Min, sum(residuals.^2))
    optimize!(model)
    return value.(x)
end
```
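
For example, a call might look like this (the model function and data below are made up purely for illustration, and the nonlinear `exp` expression needs a reasonably recent JuMP version):

```julia
# Hypothetical one-parameter exponential decay model and synthetic data
f(data, x) = exp.(-x .* data)

data = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 0.61, 0.37, 0.22, 0.14]

x_fit = main(f, data, y)   # returns the fitted value of x
```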
1 Like

I’m definitely interested to try Ipopt. Why do you add residuals as variables with a constraint, instead of just doing this?

```julia
@objective(model, Min, sum((f(data, x) .- y) .^ 2))
```

1 Like

See also the thread: Suggestions needed for bound-constrained least squares solver

The StructuredOptimization.jl package is worth considering here too.

2 Likes

So that the Hessian is diagonal. It’s usually more efficient.

2 Likes

Manopt.jl also has a non-linear least square solver: Levenberg–Marquardt · Manopt.jl (though it’s not the main focus of the package). It supports manifold constraints only and can be used with AD and sparse Jacobians.

5 Likes

I don’t understand this. The Hessian block for x will (in general) not be diagonal.

In any case I haven’t been able to successfully create a sum of squares cost function for Rosenbrock. Whatever I do generates a compilation error. Here’s one such implementation, along the lines of your approach:

```julia
model = Model(Ipopt.Optimizer)
x0 = [(n ≤ 16 && i % 2 == 0) ? 1.0 : -1.2 for i = 1:n]
@variable(model, x[i = 1:n], start = x0[i])
@NLexpression(model, FA[k = 1:(n - 1)], 10 * (x[k + 1] - x[k]^2))
@expression(model, FB[k = 1:(n - 1)], 1 - x[k])
@variable(model, residuals[1:2*(n-1)])
@constraint(model, residuals .== [FA; FB])
@objective(model, Min, sum(residuals .^ 2))
```

The Hessian of the objective function with respect to all problem variables ([x; residuals]) will be diagonal with either 1 or 0s.

Changing the `@constraint` line to

```julia
@NLconstraint(model, [k = 1:2*(n-1)], residuals[k] == [FA; FB][k])
```

should do it.
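
Putting the two together, your snippet with only the constraint line swapped (assuming `n` is defined and JuMP/Ipopt are loaded as above):

```julia
model = Model(Ipopt.Optimizer)
x0 = [(n ≤ 16 && i % 2 == 0) ? 1.0 : -1.2 for i = 1:n]
@variable(model, x[i = 1:n], start = x0[i])
@NLexpression(model, FA[k = 1:(n - 1)], 10 * (x[k + 1] - x[k]^2))
@expression(model, FB[k = 1:(n - 1)], 1 - x[k])
@variable(model, residuals[1:2*(n-1)])
# Non-linear equality constraints linking the residual variables to the expressions
@NLconstraint(model, [k = 1:2*(n-1)], residuals[k] == [FA; FB][k])
@objective(model, Min, sum(residuals .^ 2))
optimize!(model)
```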

I’m very confused. A simple by-hand derivation shows that the Hessian of the cost function I define above is neither diagonal nor made up of 1s and 0s. Note that the variables x are not integers; they’re real values.

Thanks. This worked (ran). However, it’s incredibly slow/fails to converge. Is there a way to set the maximum optimization time in JuMP?

I should have said 2 or 0, given the power of 2, but this is the context: Hessian matrix - Wikipedia.
If $f(x_1, r_1, r_2) = r_1^2 + r_2^2$ then $H(x_1, r_1, r_2) = \left(\begin{array}{ccc}0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2\end{array}\right)$.

What is n in your tests?

`JuMP.set_time_limit_sec(model::Model, limit::Real)`
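
For example:

```julia
using JuMP, Ipopt

model = Model(Ipopt.Optimizer)
set_time_limit_sec(model, 30.0)  # ask the solver to stop after 30 seconds of wall-clock time
```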

Ah yes. Thanks. But then all the non-linearity is in the equality constraints. Do these not appear in some Hessian of the dual? My memory of Lagrange multipliers is hazy, but I thought they too had a Hessian? If that information isn’t present in the system then it’s going to converge very slowly. No wonder it performs badly (see the graph below).
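
Writing out what I half-remember: for the reformulation $\min_{x,r} \sum_i r_i^2$ subject to $r_i = F_i(x)$, the Lagrangian is

$$\mathcal{L}(x, r, \lambda) = \sum_i r_i^2 - \sum_i \lambda_i \big(r_i - F_i(x)\big),$$

so the curvature of the residuals only enters via $\nabla^2_{xx}\mathcal{L} = \sum_i \lambda_i \nabla^2 F_i(x)$, weighted by the multiplier estimates, rather than appearing in the objective Hessian directly.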

Is this sensible, @odow? What would it look like without the equality constraints; i.e. a standard, unconstrained least squares formulation? I can’t get anything written like that to compile.

The largest is 10000.

Thanks. Here’s a plot with the time limited to 30s:

Note: it looks like you’re simply terminating at 2000 iterations, based on line 23 of your comparison code:

```julia
function NLLStest(sizes)
    # First run the optimization to compile everything
    options = NLLSOptions(maxiters = 2000)
```

Yes, NLLSsolver does indeed terminate after 2000 iterations currently. This happens only for the largest test case (n = 10000). The other solvers terminate after far fewer iterations (and at a higher cost), having hit the 30-second runtime limit. NLLSsolver can’t currently terminate based on runtime, but I will implement that. I’m also planning to add visualizations of the final cost.

In the meantime here’s a graph of tests for which all solvers reached the optimal solution:

That is an interesting comparison. As @MateuszBaran mentioned, Manopt.jl can do least squares on arbitrary manifolds (those from Manifolds.jl), so I would be interested in how (and which) non-Euclidean variables are realised in NLLSsolver.jl. Maybe we can join forces a bit on that non-Euclidean matter.

I’ve implemented 3D rotation matrices (see the implementation here) and bounded variables (as we discussed before), but it’s easy to add your own. It should be very straightforward to use variables from Manifolds.jl in NLLSsolver.jl. You just need to overload the NLLSsolver.update method for the particular variable type.
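
As an illustration of the kind of operation such an update performs for a manifold-valued variable, here is a sketch written with Manifolds.jl/ManifoldsBase.jl (this is just the idea, not NLLSsolver’s actual API):

```julia
using Manifolds, ManifoldsBase

M = Sphere(2)                       # unit sphere embedded in R^3
p = [0.0, 0.0, 1.0]                 # current estimate of the variable (a point on M)
step = [0.1, -0.05]                 # solver's update step, one entry per manifold dimension

# Express the step as a tangent vector at p, then retract back onto the manifold
X = get_vector(M, p, step, DefaultOrthonormalBasis())
p_new = retract(M, p, X)
```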

I’m happy to discuss it further. Please message me direct.

I’m slowly adding more solvers. I may try to add that one. But it’s a Euclidean problem.

Yes, I remember. We could check whether your update could be based on ManifoldsBase.jl (as well); then it might work on manifolds more generally (but I would have to check how you do the update).

This is a great comparison! Would you consider making a PR to add it to JuliaPackageComparisons?

2 Likes

Just one note on terminology: my understanding is that Ipopt is the solver, while JuMP is the software that lets you write your optimisation problem in nice mathematical, solver-independent terms and then builds the actual problem to pass to the solver…

1 Like

Thanks. I’ll aim for that. I’d like to broaden the tests and generate a few more statistics first.