Nonlinear System of Equations with Bounds/Constraints on Unknowns

Yup, pretty much. If you find a solver which directly provides an interface for the nlls it may be easier, otherwise take the sum of squares like that. The only this to consider that makes nlls special is that the hessian can be approximated using the Jacobian of the residuals, if you need it. I don’t remember the exact formula.

Yes, and any other constraints. Linear ones especially are useful.

Tough to know. Usually the constrained nonlinear least squares seems the best place to start. It is tough to predict which optimizer, method, and formulation work best for nasty nonconvex systems.

JuMP is a frontend interface to various optimizers providing its own specification . If your problem can be formulated with in it, then use it. For the optimizers themselves, knitro is commercial but works very well. It has a multistart options which help with global solutions

If you want some variables to be positive, say \theta, change variables \theta:=u^2 and solve in u. That seems obvious but…

1 Like

100%. @bielim if you are just worried about some parameters being positive, then this sort of thing is sufficient. As are standard tricks like converting into log space. You can also find all sorts of other transformations, and things like and can help you do the variable transformations easier. If your problem is otherwise well behaved then I suspect nlsolve would be perfectly fine.

If you have a large and/or nasty and non-convex problem - then the “throw it at a serious optimizer” approach and use multistart tends be very robust. This is true even if you didn’t need these sorts of variable transformations… but is the sort of hammer you use as a last resort.

1 Like

Yes, true - I did realize that this would do the trick as long as positivity is my only type of constraint, and it actually gave okay results. I was still curious to learn about general ways of dealing with constrained nonlinear systems of equations because at some point I will probably have to deal with variations of this problem that involve different kinds of constraints (e.g., that the \theta_i are bounded by some interval [a, b]).
But I think that for the time being this is indeed the solution with the best cost/benefit ratio!

Thank you very much for your help, @Tamas_Papp, @rveltz and @jlperla! Your suggestions are great guidance for methods/packages to explore for more flexible approaches!

If you want some variables to be positive, say θ, change variables θ := u^2 and solve in u

Bad idea in general:

1 Like

I apologize for the long answer. Below are 3 experiments.

Here’s a solution based on solving a feasibility problem (minimizing a constant objective function subject to your system as constraints). This allows you to include bounds on your variables. However, it seems to me that your problem is over-constrained: 6 variables and 7 equality constraints if you include w_1 + w_2 = 1. I left this last constraint out and it appears to be (approximately) satisfied at the solution found by IPOPT:

(@v1.4) pkg> activate test-feas

(test-feas) pkg> add NLPModels NLPModelsIpopt

julia> using NLPModels, NLPModelsIpopt

julia> function c(x)
           M = [2.250, 9.675, 57.263, 427.219, 3836.109, 40234.852]
           F = [ M[1] - x[5]*x[3]*x[1] - x[6]*x[4]*x[2] ;
                 M[2] - x[5]*x[3]^2*(x[1]+1)*x[1] - x[6]*x[4]^2*(x[2]+1)*x[2] ;
                 M[3] - x[5]*x[3]^3*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^3*(x[2]+2)*(x[2]+1)*x[2] ;
                 M[4] - x[5]*x[3]^4*(x[1]+3)*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^4*(x[2]+3)*(x[2]+2)*(x[2]+1)*x[2] ;
                 M[5] - x[5]*x[3]^5*(x[1]+4)*(x[1]+3)*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^5*(x[2]+4)*(x[2]+3)*(x[2]+2)*(x[2]+1)*x[2] ;
                 M[6] - x[5]*x[3]^6*(x[1]+5)*(x[1]+4)*(x[1]+3)*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^6*(x[2]+5)*(x[2]+4)*(x[2]+3)*(x[2]+2)*(x[2]+1)*x[2] ]
                 #x[5] + x[6] - 1]  # left this out
           return F

julia> x0 = [0.5; 0.5; 0.5; 0.5; 0.5; 0.5];

julia> # define a model with derivatives computed by ForwardDiff
julia> # bounds on the variables are included
julia> model = ADNLPModel(x -> 0.0, x0; c=c, lcon=zeros(6), ucon=zeros(6), lvar=zeros(6))

julia> stats = ipopt(model)  # solution found after 481 iterations

julia> x = stats.solution
6-element Array{Float64,1}:

julia> x[5] + x[6]

Choosing a different starting point might give you the solution you’re looking for.

As an experiment, I tried formulating your problem as an over-determined nonlinear least-squares problem with bounds and include the term w_1 + w_2 - 1 in the residual. IPOPT doesn’t manage to solve it but KNITRO does (KNITRO is a commercial solver but you can download a demo version from the Artelys website):

(test-feas) pkg> add NLPModelsKnitro

julia> using NLPModelsKnitro

julia> function c(x)
           M = [2.250, 9.675, 57.263, 427.219, 3836.109, 40234.852]
           F = [ M[1] - x[5]*x[3]*x[1] - x[6]*x[4]*x[2] ;
                 M[2] - x[5]*x[3]^2*(x[1]+1)*x[1] - x[6]*x[4]^2*(x[2]+1)*x[2] ;
                 M[3] - x[5]*x[3]^3*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^3*(x[2]+2)*(x[2]+1)*x[2] ;
                 M[4] - x[5]*x[3]^4*(x[1]+3)*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^4*(x[2]+3)*(x[2]+2)*(x[2]+1)*x[2] ;
                 M[5] - x[5]*x[3]^5*(x[1]+4)*(x[1]+3)*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^5*(x[2]+4)*(x[2]+3)*(x[2]+2)*(x[2]+1)*x[2] ;
                 M[6] - x[5]*x[3]^6*(x[1]+5)*(x[1]+4)*(x[1]+3)*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^6*(x[2]+5)*(x[2]+4)*(x[2]+3)*(x[2]+2)*(x[2]+1)*x[2] ;
                 x[5] + x[6] - 1]
           return F

julia> model = ADNLSModel(c, x0, 7, lvar=zeros(6))  # NSL = nonlinear least squares

julia> stats = knitro(model)  # solved in 154 iterations

julia> x = stats.solution
6-element Array{Float64,1}:

Finally, you can try one of our pure Julia solvers that handles bound constraints:

(test-feas) pkg> add JSOSolvers
julia> using JSOSolvers

julia> stats = tron(model)  # solved in 1006 iterations

julia> stats.solution
6-element Array{Float64,1}:

I don’t think that general methods exist. Usually investing a bit in understanding the structure of the problem and coming up with either

  1. some useful transformation,
  2. a good initial guess,
  3. a nicely behaved simplified problem useful in homotopy continuation

is helpful. The more of the above, the better.

If there are just bound constraints then interval methods provide a method that can find all (isolated) roots - in prínciple.

However, the current implementation in IntervalRootFinding.jl is currently unable to solve the OP’s problem in a reasonable time.

Since this is a system of polynomial equations, you can actually compute all (isolated) solution of this system using HomtopyContinuation.jl. In your case the system has 18 (complex) solutions and usually 4 real solutions.


using HomotopyContinuation

@polyvar M[1:6] x[1:6]

system = [
  M[1] - x[5] * x[3] * x[1] - x[6] * x[4] * x[2],
  M[2] - x[5] * x[3]^2 * (x[1] + 1) * x[1] - x[6] * x[4]^2 * (x[2] + 1) * x[2],
  M[3] - x[5] * x[3]^3 * (x[1] + 2) * (x[1] + 1) * x[1] -
  x[6] * x[4]^3 * (x[2] + 2) * (x[2] + 1) * x[2],
  M[4] - x[5] * x[3]^4 * (x[1] + 3) * (x[1] + 2) * (x[1] + 1) * x[1] -
  x[6] * x[4]^4 * (x[2] + 3) * (x[2] + 2) * (x[2] + 1) * x[2],
  M[5] - x[5] * x[3]^5 * (x[1] + 4) * (x[1] + 3) * (x[1] + 2) * (x[1] + 1) * x[1] -
  x[6] * x[4]^5 * (x[2] + 4) * (x[2] + 3) * (x[2] + 2) * (x[2] + 1) * x[2],
  M[6] -
  x[5] * x[3]^6 * (x[1] + 5) * (x[1] + 4) * (x[1] + 3) * (x[1] + 2) * (x[1] + 1) * x[1] -
  x[6] * x[4]^6 * (x[2] + 5) * (x[2] + 4) * (x[2] + 3) * (x[2] + 2) * (x[2] + 1) * x[2],

# this computes a set of start solutions which we can use to compute
# all real solutions to any M we want
start_res = monodromy_solve(system; parameters = M)

target_res = solve(
  parameters = M,
  start_parameters = parameters(start_res),
  target_parameters = [2.250, 9.675, 57.263, 427.219, 3836.109, 40234.852],

4-element Array{Array{Float64,1},1}:
 [2.05385029610331, 0.4047625777040844, 1.491193381943373, 1.805587741149837, 0.6421990255934416, 0.3874267420604687]
 [2.0064087361571805, 0.4601905412775597, 1.4990976942165646, 1.0902146315738315, 0.6949587215536865, 0.318316159418627]
 [0.4047625777014698, 2.0538502961020852, 1.8055877411449965, 1.4911933819437089, 0.38742674206072847, 0.6421990255944628]
 [0.4601905412704846, 2.0064087361588445, 1.09021463159536, 1.4990976942163359, 0.31831615942240693, 0.6949587215523535]

On our homepage we also have an example of the method of moments applied to a mixture of gaussians.


Thank you, @dpo, this is tremendously helpful!
I installed the KNITRO demo version and performed some experiments following your examples (I kept the equation w_1+w_2=1 in the system and removed the equation for M[6] instead, so I was solving a system of 6 equations for 6 unknowns, where all unknowns had a positivity constraint). The table below summarizes the results:

Solver Model Time to solution (in s) Solution MSE
Ipopt ADNLP 11.77 [0.394, 2.045, 1.724, 1.494, 0.384, 0.651] 0.091
Ipopt ADNLS 5.30 [1.875, 1.875, 1.528, 1.528, 0.384, 0.384] 0.382
knitro ADNLP 5.45 [2.043, 0.393, 1.494, 1.704, 0.653, 0.383] 0.086
knitro ADNLS 1.74 [1.994, 0.504, 1.501, 0.962, 0.703, 0.297] 0.000
tron ADNLP 7.70 [0.500, 0.500, 0.500, 0.500, 0.500, 0.500] 0.597
tron ADNLS 4.65 [1.874, 1.874, 1.528, 1.528, 0.384, 0.384] 0.382

KNITRO is the winner, especially when applied to the formulation as a nonlinear least-squares problem. I will do some more experiments with KNITRO to get some data on how reliably it finds a close approximation to the correct solution, but so far I’m pretty impressed by it!

Given the large spread in the results, I was wondering if someone could share some insight as to what the main differences between the solvers are - do they use fundamentally different algorithms? Or is KNITRO just “smarter” in terms of choosing (maybe even on-the-fly changing) algorithms in the process of solving the problem?
When running KNITRO, I also saw the output “No start point provided – Knitro computing one”, even though I had provided a starting value. Does that mean that KNITRO is able to identify bad starting points and will proceed to find better ones if needed?

TLDR: KNITRO is the clear winner! Why is it better than the other solvers?

@bielim Glad to help! KNITRO is a mature solver that contains many people’s many years of research and efforts. It also contains several algorithms (some of the interior-point variety, including one that is quite similar to IPOPT, and some of the active-set variety). If you don’t specify an algorithm, KNITRO chooses one for you automatically at the beginning (you should see this if you inspect the output). You’ll probably want to try each algorithm and identify the one that’s best for your problem.

Using our interface to KNITRO, you can specify a starting point with

stats = knitro(model, x0=[0.5; 0.5; 0.5; 0.5; 0.5; 0.5])


stats = knitro(model, x0=model.meta.x0)  # use starting point specified in the model

By default, we decided to let KNITRO compute its own. The interior-point algorithms will reject your starting point if it doesn’t strictly satisfy the bound constraints.

I’m quite glad to see that TRON isn’t doing too badly either (at least with the NLS model). It’s a projected-direction method. I would have to investigate a little to determine why it has trouble solving the other model.

Feel free to open issues on our repos if you have difficulties. Also feel free to ask questions on our Slack channel:

1 Like

Try running the tuner to see if it helps

m = Model(with_optimizer(KNITRO.Optimizer, tuner = 1))

It might find an even better algorithm

This is beautiful, @saschatimme! I’ve been trying the approach you posted with different values of the parameter M – for a single parameter, homotopy continuation is slower than the other methods I tried, but the call to monodromy_solve is a one-off cost that I’m happy to pay given that I’ll have to repeat the calculation for many different values of M (and the subsequent iterations are pretty fast!).

As an aside, I’d like to express my appreciation for the amazing Nextjournal documentation – really neat and user-friendly!

When trying different values for M, I always got four real solutions. Two of them are actually the same, because the solution is symmetric in the sense that the two components of the mixture distribution the solution describes can be relabeled while leaving the mixture pdf invariant. I’m only interested in the mixture pdf, so the “double presence” of solutions is not really a problem - picking either one of the solutions is fine. (But I saw that HomotopyContinuation’s GroupActions would probably be able to filter the solutions down to the ones that are truly different, and I will give this a try.)

However, I also tried to incorporate the equality constraints on the weights (w_1+w_2= 1) into the system of equations by replacing all instances of w_2 with 1-w_1, and when I do that (now dealing with a system of 6 equations for 5 unknowns), the solver doesn’t find a real solution anymore.
I guess I could just pick the solution that satisfies the constraint on the weights most accurately, but I was wondering why directly incorporating it into the system doesn’t seem to work, and if there is a way I can include the constraint in the problem setup rather than after the solution has been computed.

Happy to hear that you find this approach helpful and that the documentation is helpful as well :slight_smile:

When you have a more equations than unknowns, then you need a very special set of equations such that you still have solutions. Geometrically, your 6 equations in 6 unkowns already only result in 18 points, and now you try to intersect these with the hyperplane 1=w_1 + w_2. So these 18 points have to be quite special such that at least one lies on this plane. Obviously, you know for theoretical reasons that this is the case here. However, I assume your measure are not exact, right?
This perturbation in the measure then would yield solutions which are a little bit away from the hyperplane. When you look at the 4 real solutions for your example above, then you notice that w_1 and w_2 don’t sum up to 1 exactly

julia> 0.31831615942330055 + 0.6949587215517059

julia> 0.6421990255933143 + 0.38742674206054184

If this constraint is important to be satisfied, maybe you could use the obtained solutions coming from HomotopyContinuation.jl as a starting point for a least square optimization routine.


many people’s many years of research and efforts

So no magic bullet, just the magic of hard work :slight_smile:

I see - I had assumed that the starting value x0 given as input to the Model (e.g., model = ADNLSModel(c, x0, 7, lvar=zeros(6))) would automatically be used as the starting point for knitro as well.

I’ll keep experimenting and will post any issues I encounter!

m = Model(with_optimizer(KNITRO.Optimizer, tuner = 1))

Apparently with_optimizer has been deprecated and replaced by Model(optimizer_with_attributes(KNITRO.Optimizer, "tuner" => 1))

I tried this – the Knitro Tuner decided that the fastest solution would be found by a solver that involves “algorithm 3”, which based on the user manual is an Active-Set algorithm. But the tuning did not lead to an improvement in the time to solution.

Yeah, you never know. Your problem is sufficiently small that there may not be much difference.

The other two very useful options for knitro to keep in mind in the future are: (1) ms_enable = 1 which turns on multi-start, and is huge time save if you have ugly, non-convex equations; and (2) honorbnds = 1 which was necessary for me when my box-bounds defined tight constraints on when my equations could be evaluated.

That makes sense, thanks for the explanation! Yes, as you point out, for theoretical reasons this overdetermined system should have a unique solution (up to flipping the labels of the two mixture components), but I understand that numerical inaccuracies can stand in the way of finding that solution.

I’ll test if using the homotopy continuation solution as an extremely educated inital guess for a subsequent optimization algorithm is too computationally expensive. As a simpler alternative, picking the solution that best satisfies the constraint, followed maybe by “re-normalizing” the weights (w_{1, \textrm{normalized}} = \frac{w_1}{w_1+w_2}, w_{2, \textrm{normalized}} = \frac{w_2}{w_1+w_2}) such that the constraint is exactly satisfied may already be good enough an approximation for my purpose.

On a different note: Is it possible to accept more than one reply as solutions? I tried to mark both @dpo’s and @saschatimme’s replies, but when marking the second one the first one gets unmarked.

Unfortunately, it isn’t possible. Just mark the one you prefer.

1 Like