Nonlinear System of Equations with Bounds/Constraints on Unknowns

I am trying to find a numerical solution to the following nonlinear system of equations:

M_1 - w_1 θ_1 α_1 - w_2 θ_2 α_2 = 0
M_2 - w_1 θ_1^2 (α_1+1) α_1 - w_2 θ_2^2 (α_2+1) α_2 = 0
M_3 - w_1 θ_1^3 (α_1+2) (α_1+1) α_1 - w_2 θ_2^3 (α_2+2) (α_2+1) α_2 = 0
M_4 - w_1 θ_1^4 (α_1+3) (α_1+2) (α_1+1) α_1 - w_2 θ_2^4 (α_2+3) (α_2+2) (α_2+1) α_2 = 0
M_5 - w_1 θ_1^5 (α_1+4) (α_1+3) (α_1+2) (α_1+1) α_1 - w_2 θ_2^5 (α_2+4) (α_2+3) (α_2+2) (α_2+1) α_2 = 0
M_6 - w_1 θ_1^6 (α_1+5) (α_1+4) (α_1+3) (α_1+2) (α_1+1) α_1 - w_2 θ_2^6 (α_2+5) (α_2+4) (α_2+3) (α_2+2) (α_2+1) α_2 = 0

The unknowns are α_1, α_2, θ_1, θ_2, w_1, and w_2; the M_i (i = 1, ..., 6) are known.
In addition, it is known that all unknowns are greater than zero, and that w_1 + w_2 = 1.
(For context: The M_i are moments of a 2-component mixture distribution, (α_1, θ_1) and (α_2, θ_2) are the parameters of the two mixture components, and w_1 and w_2 are their respective weights. Each component's contribution to the k-th equation has the form of a k-th raw moment, w θ^k α(α+1)⋯(α+k−1).)

I tried the following approach using NLsolve:

using NLsolve

function f!(F, x, M)
    # x: [α1, α2, θ1, θ2, w1, w2]
    F[1] = M[1] - x[5]*x[3]*x[1] - x[6]*x[4]*x[2]
    F[2] = M[2] - x[5]*x[3]^2*(x[1]+1)*x[1] - x[6]*x[4]^2*(x[2]+1)*x[2]
    F[3] = M[3] - x[5]*x[3]^3*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^3*(x[2]+2)*(x[2]+1)*x[2]
    F[4] = M[4] - x[5]*x[3]^4*(x[1]+3)*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^4*(x[2]+3)*(x[2]+2)*(x[2]+1)*x[2]
    F[5] = M[5] - x[5]*x[3]^5*(x[1]+4)*(x[1]+3)*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^5*(x[2]+4)*(x[2]+3)*(x[2]+2)*(x[2]+1)*x[2]
    F[6] = M[6] - x[5]*x[3]^6*(x[1]+5)*(x[1]+4)*(x[1]+3)*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^6*(x[2]+5)*(x[2]+4)*(x[2]+3)*(x[2]+2)*(x[2]+1)*x[2]
end

As a test, I ran the following example, whose correct solution is (approximately):

α_1 = 0.5, α_2 = 2.0, θ_1 = 1.0, θ_2 = 1.5, w_1 = 0.3, w_2 = 0.7

M = [2.250, 9.675, 57.263, 427.219, 3836.109, 40234.852] # moments
initial_guess = [0.5; 0.5; 0.5; 0.5; 0.5; 0.5]
nlsolve((F, x) -> f!(F, x, M), initial_guess, autodiff=:forward)

Output:

Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
 * Zero: [1282.030044474002, 1.3349431507591707, 0.013431556054807586, 1.5969473131487903, 0.00013193288212788804, 1.4550855337869975]
 * Inf-norm of residuals: 10.878979
 * Iterations: 1000
 * Convergence: false
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: false
 * Function Calls (f): 1001
 * Jacobian Calls (df/dx): 573

Unfortunately this is not close to the correct solution, which leaves me with two questions:

  • Is there a way to incorporate the bounds/constraints on the unknowns (i.e., that they are all positive and that the two weights sum to 1)? This would at least rule out “impossible” solutions.

  • In general, are there alternatives to NLsolve that would be better suited to this problem and would give more accurate and reliable approximations to the solutions, at a similar or higher speed?

Thanks for any help!

If w_1 + w_2 = 1, you could use that to eliminate one of the weights and reduce the system by one variable.
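A minimal sketch of that elimination, reusing the f! and M from the original post (note that 6 equations in 5 unknowns is overdetermined, so the sixth equation is simply dropped here to keep the system square):

function f_reduced!(F5, y, M)
    # y: [α1, α2, θ1, θ2, w1]; w2 is recovered as 1 - w1
    F6 = similar(y, 6)
    f!(F6, vcat(y, 1 - y[5]), M)
    F5 .= @view F6[1:5]   # drop the M_6 equation to keep the system square
end

nlsolve((F, y) -> f_reduced!(F, y, M), fill(0.5, 5), autodiff=:forward)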

Generally, global solutions to nonlinear systems can be difficult.

If you are solving this repeatedly for various M, I would suggest devoting some effort to understanding the structure, obtaining a simplified system, and trying homotopy continuation (e.g., in M).

If you just need one solution, you could minimize the sum of squared residuals with a global method. This is not very robust, so it is not the best approach for doing this repeatedly, but it is cheap to try for a single case; a rough sketch follows.
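For example, something along these lines (an untested sketch using Optim.jl's ParticleSwarm; the search box and the penalty on the weight constraint are arbitrary assumptions, and f! is the function from the original post):

using Optim

function ssq(x, M)
    # sum of squared residuals, with the weight constraint added as a penalty
    F = similar(x, 6)
    f!(F, x, M)
    return sum(abs2, F) + (x[5] + x[6] - 1)^2
end

lower = fill(1e-6, 6); upper = fill(10.0, 6)   # assumed search box
res = optimize(x -> ssq(x, M), fill(0.5, 6),
               ParticleSwarm(lower=lower, upper=upper, n_particles=50),
               Optim.Options(iterations=10_000))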


Yes, using w_1 + w_2 = 1 to eliminate one of the two weights would take care of incorporating that constraint. I guess I didn’t write the system that way because I think of the requirement that the weights form a convex combination as a constraint on the solution (though of course directly incorporating it into the equations is equivalent). I was also hoping for functionality like Mathematica’s Assuming[{alpha1>0, theta1>0, alpha2>0, theta2>0, w1>0, w2>0, w1+w2==1}, Solve ...], which would let me add the positivity constraints on the distribution parameters as well; those can’t be incorporated into the system of equations themselves.

I will look into homotopy continuation methods, thanks for the pointer! (I will indeed have to solve the system repeatedly for various M)


It looks like a polynomial system. I am happy to help you if you want to use my package PseudoArcLengthContinuation.jl, but note that HomotopyContinuation.jl may be more appropriate in your case.


Thank you very much, @rveltz! I’ll start with HomotopyContinuation.jl then 🙂

Before I start looking into homotopy continuation methods, as suggested by @Tamas_Papp and @rveltz: Is there really no way to add bounds on the unknowns to NLsolve?

If you need to add inequalities satisfied by your solutions, JuMP might be a good start.

It doesn’t always work, but a common trick is to solve your system as a constrained nonlinear least-squares problem: your system of equations becomes the “residual”, and you can give the solver whatever bounds you want. Then you can use multistart, etc., trying to solve the problem from many initial conditions. This tends to work well if you have a good solver.

Of course, if the solution actually is on one of those boundaries (and theoretically could be) then you have a nonlinear complementarity problem rather than a nonlinear system.

Another trick is to use commercial optimizers, set the objective to just 1 (or any constant), and let the solver try to solve the feasibility problem.

Whether any of this is better than NLsolve is hard to say.

Constraints like w_1 + w_2 = 1 are the sort of thing that works well when thrown at optimizers: express them as linear constraints and the solvers handle them well.
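To make the multistart idea concrete, here is a rough sketch (the inner solver here is Optim.jl's box-constrained Fminbox, but any bounded NLLS solver would do; ssq is a sum-of-squared-residuals objective like the one sketched earlier in the thread):

using Optim

function multistart(obj, lower, upper; trials=50)
    best = nothing
    for _ in 1:trials
        x0 = lower .+ rand(length(lower)) .* (upper .- lower)  # random start in the box
        res = optimize(obj, lower, upper, x0, Fminbox(LBFGS()))
        if best === nothing || Optim.minimum(res) < Optim.minimum(best)
            best = res   # keep the best local solution found so far
        end
    end
    return best
end

best = multistart(x -> ssq(x, M), fill(1e-6, 6), fill(10.0, 6))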

Hi @jlperla, thanks for your reply!

Let me see if I understand what you are saying (please correct me if I got it wrong):

  • I can reformulate my system of equations as an optimization problem by defining an objective function J as a sum of squares, where the quantities being squared (the “residuals”) are the left-hand sides of the equations themselves:

    J = [M_1 - w_1 θ_1 α_1 - (1-w_1) θ_2 α_2]^2
      + [M_2 - w_1 θ_1^2 (α_1+1) α_1 - (1-w_1) θ_2^2 (α_2+1) α_2]^2
      + ...

    I could then minimize J under the constraints that the α_i and θ_i have to be positive (see the sketch after this list).
    Is this something that JuMP, the package suggested by @rveltz, could solve, or is there a specific nonlinear least-squares package you can recommend?

  • An alternative way of recasting my system of equations as an optimization problem would be to use the equations as equality constraints and add the positivity requirements as further constraints, while setting the objective function to a constant (i.e., all that the optimization will do is to try to satisfy the constraints, as there is nothing to optimize in the objective function).
    Here again, is JuMP the most promising package to try this?
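For concreteness, here is roughly what I have in mind for the first (least-squares) formulation in JuMP, with Ipopt as the backend (an untested sketch; the 1e-6 lower bounds are my stand-in for strict positivity, and w_2 is eliminated via w_2 = 1 - w_1):

using JuMP, Ipopt

M = [2.250, 9.675, 57.263, 427.219, 3836.109, 40234.852]
model = Model(Ipopt.Optimizer)
@variable(model, α[1:2] >= 1e-6, start = 0.5)
@variable(model, θ[1:2] >= 1e-6, start = 0.5)
@variable(model, 1e-6 <= w1 <= 1, start = 0.5)
# k-th residual: M_k minus the mixture's k-th moment, with w2 = 1 - w1
@NLexpression(model, r[k = 1:6],
    M[k] - w1 * θ[1]^k * prod(α[1] + j for j = 0:(k-1)) -
    (1 - w1) * θ[2]^k * prod(α[2] + j for j = 0:(k-1)))
@NLobjective(model, Min, sum(r[k]^2 for k = 1:6))
optimize!(model)
value.(α), value.(θ), value(w1)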

Finally, it seems to me that these two alternatives are mathematically equivalent problems. Is one generally faster than the other, and is it likely that I will nevertheless see different solutions because of the algorithmic differences in how they are solved?

Yup, pretty much. If you find a solver that directly provides an interface for the NLLS it may be easier; otherwise take the sum of squares like that. The only thing to consider that makes NLLS special is that the Hessian can be approximated using the Jacobian of the residuals, if you need it. I don’t remember the exact formula.
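(For reference, the formula alluded to here is the Gauss-Newton approximation: with residual vector r(x), objective J(x) = (1/2) ‖r(x)‖^2, and residual Jacobian J_r, the Hessian is ∇²J = J_rᵀ J_r + Σ_i r_i ∇²r_i ≈ J_rᵀ J_r, which is accurate when the residuals are small near the solution.)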

Yes, and any other constraints. Linear ones especially are useful.

Tough to know. Usually constrained nonlinear least squares is the best place to start; it is hard to predict which optimizer, method, and formulation will work best for nasty nonconvex systems.

JuMP is a front-end interface to various optimizers, providing its own modeling language. If your problem can be formulated in it, then use it. As for the optimizers themselves, KNITRO is commercial but works very well; it has a multistart option that helps with finding global solutions.

If you want some variables to be positive, say θ, change variables θ := u^2 and solve in u. That seems obvious, but…
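In the notation of the original post, that would be something like:

g!(F, u, M) = f!(F, u .^ 2, M)   # x = u.^2 is nonnegative by construction
nlsolve((F, u) -> g!(F, u, M), sqrt.(initial_guess), autodiff=:forward)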


100%. @bielim if you are just worried about some parameters being positive, then this sort of thing is sufficient, as are standard tricks like converting to log space. You can also find all sorts of other transformations, and packages like TransformVariables.jl (https://github.com/tpapp/TransformVariables.jl) and Bijectors.jl (https://github.com/TuringLang/Bijectors.jl) can help you do the variable transformations more easily. If your problem is otherwise well behaved, then I suspect NLsolve would be perfectly fine.
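For instance, with TransformVariables.jl the change of variables could look roughly like this (an untested sketch; asℝ₊ maps ℝ to (0, ∞) and as𝕀 maps ℝ to (0, 1)):

using TransformVariables, NLsolve

t = as((asℝ₊, asℝ₊, asℝ₊, asℝ₊, as𝕀, as𝕀))   # α1, α2, θ1, θ2 > 0; w1, w2 ∈ (0, 1)
h!(F, y, M) = f!(F, collect(transform(t, y)), M)   # f! from the original post
nlsolve((F, y) -> h!(F, y, M), zeros(dimension(t)), autodiff=:forward)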

If you have a large and/or nasty non-convex problem, then the “throw it at a serious optimizer” approach with multistart tends to be very robust. This is true even if you don’t need these sorts of variable transformations… but it is the sort of hammer you use as a last resort.


Yes, true - I did realize that this would do the trick as long as positivity is my only type of constraint, and it actually gave okay results. I was still curious to learn about general ways of dealing with constrained nonlinear systems of equations because at some point I will probably have to deal with variations of this problem that involve different kinds of constraints (e.g., that the θ_i are bounded by some interval [a, b]).
But I think that for the time being this is indeed the solution with the best cost/benefit ratio!

Thank you very much for your help, @Tamas_Papp, @rveltz and @jlperla! Your suggestions are great guidance for methods/packages to explore for more flexible approaches!

If you want some variables to be positive, say θ, change variables θ := u^2 and solve in u

Bad idea in general: https://journals.squ.edu.om/index.php/squjs/article/download/384/395


I apologize for the long answer. Below are 3 experiments.

Here’s a solution based on solving a feasibility problem (minimizing a constant objective function subject to your system as constraints). This allows you to include bounds on your variables. However, it seems to me that your problem is over-constrained: 6 variables and 7 equality constraints if you include w_1 + w_2 = 1. I left this last constraint out and it appears to be (approximately) satisfied at the solution found by IPOPT:

(@v1.4) pkg> activate test-feas

(test-feas) pkg> add NLPModels NLPModelsIpopt

julia> using NLPModels, NLPModelsIpopt

julia> function c(x)
           M = [2.250, 9.675, 57.263, 427.219, 3836.109, 40234.852]
           F = [ M[1] - x[5]*x[3]*x[1] - x[6]*x[4]*x[2] ;
                 M[2] - x[5]*x[3]^2*(x[1]+1)*x[1] - x[6]*x[4]^2*(x[2]+1)*x[2] ;
                 M[3] - x[5]*x[3]^3*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^3*(x[2]+2)*(x[2]+1)*x[2] ;
                 M[4] - x[5]*x[3]^4*(x[1]+3)*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^4*(x[2]+3)*(x[2]+2)*(x[2]+1)*x[2] ;
                 M[5] - x[5]*x[3]^5*(x[1]+4)*(x[1]+3)*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^5*(x[2]+4)*(x[2]+3)*(x[2]+2)*(x[2]+1)*x[2] ;
                 M[6] - x[5]*x[3]^6*(x[1]+5)*(x[1]+4)*(x[1]+3)*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^6*(x[2]+5)*(x[2]+4)*(x[2]+3)*(x[2]+2)*(x[2]+1)*x[2] ]
                 #x[5] + x[6] - 1]  # left this out
           return F
       end

julia> x0 = [0.5; 0.5; 0.5; 0.5; 0.5; 0.5];

julia> # define a model with derivatives computed by ForwardDiff
julia> # bounds on the variables are included
julia> model = ADNLPModel(x -> 0.0, x0; c=c, lcon=zeros(6), ucon=zeros(6), lvar=zeros(6))

julia> stats = ipopt(model)  # solution found after 481 iterations

julia> x = stats.solution
6-element Array{Float64,1}:
 2.0064087361563248
 0.46019054128577414
 1.4990976942166683
 1.0902146315580712
 0.6949587215544324
 0.3183161594144738

julia> x[5] + x[6]
1.0132748809689063

Choosing a different starting point might give you the solution you’re looking for.

As an experiment, I tried formulating your problem as an over-determined nonlinear least-squares problem with bounds and included the term w_1 + w_2 - 1 in the residual. IPOPT doesn’t manage to solve it, but KNITRO does (KNITRO is a commercial solver, but you can download a demo version from the Artelys website):

(test-feas) pkg> add NLPModelsKnitro

julia> using NLPModelsKnitro

julia> function c(x)
           M = [2.250, 9.675, 57.263, 427.219, 3836.109, 40234.852]
           F = [ M[1] - x[5]*x[3]*x[1] - x[6]*x[4]*x[2] ;
                 M[2] - x[5]*x[3]^2*(x[1]+1)*x[1] - x[6]*x[4]^2*(x[2]+1)*x[2] ;
                 M[3] - x[5]*x[3]^3*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^3*(x[2]+2)*(x[2]+1)*x[2] ;
                 M[4] - x[5]*x[3]^4*(x[1]+3)*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^4*(x[2]+3)*(x[2]+2)*(x[2]+1)*x[2] ;
                 M[5] - x[5]*x[3]^5*(x[1]+4)*(x[1]+3)*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^5*(x[2]+4)*(x[2]+3)*(x[2]+2)*(x[2]+1)*x[2] ;
                 M[6] - x[5]*x[3]^6*(x[1]+5)*(x[1]+4)*(x[1]+3)*(x[1]+2)*(x[1]+1)*x[1] - x[6]*x[4]^6*(x[2]+5)*(x[2]+4)*(x[2]+3)*(x[2]+2)*(x[2]+1)*x[2] ;
                 x[5] + x[6] - 1]
           return F
       end

julia> model = ADNLSModel(c, x0, 7, lvar=zeros(6))  # NLS = nonlinear least squares

julia> stats = knitro(model)  # solved in 154 iterations

julia> x = stats.solution
6-element Array{Float64,1}:
 2.0014732680262566
 0.4965607369186837
 1.4997796318745336
 1.0165903327466042
 0.6988962139789188
 0.30110695603014176

Finally, you can try one of our pure Julia solvers that handles bound constraints:

(test-feas) pkg> add JSOSolvers
julia> using JSOSolvers

julia> stats = tron(model)  # solved in 1006 iterations

julia> stats.solution
6-element Array{Float64,1}:
 1.9441520958770582
 1.9441520958770582
 1.5104044760203716
 1.5104044760203716
 0.36776275007834164
 0.36776275007834164

I don’t think that general methods exist. Usually investing a bit in understanding the structure of the problem and coming up with either

  1. some useful transformation,
  2. a good initial guess,
  3. a nicely behaved simplified problem useful in homotopy continuation

is helpful. The more of the above, the better.

If there are just bound constraints, then interval methods can, in principle, find all (isolated) roots.

However, the current implementation in IntervalRootFinding.jl is unable to solve the OP’s problem in a reasonable time.
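For reference, the call would look roughly like this (an untested sketch with an assumed search box of [0, 10] per variable; as noted above, it does not finish in reasonable time on this system):

using IntervalArithmetic, IntervalRootFinding, StaticArrays

M = [2.250, 9.675, 57.263, 427.219, 3836.109, 40234.852]
poch(a, k) = prod(a + j for j in 0:k-1)   # rising factorial a(a+1)⋯(a+k−1)
g(x) = SVector(ntuple(k -> M[k] - x[5]*x[3]^k*poch(x[1], k) - x[6]*x[4]^k*poch(x[2], k), 6))
rts = roots(g, IntervalBox(0..10, 6))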

Since this is a system of polynomial equations, you can actually compute all (isolated) solutions of this system using HomotopyContinuation.jl. In your case the system has 18 (complex) solutions and usually 4 real solutions.

Here is the code:

using HomotopyContinuation

@polyvar M[1:6] x[1:6]

system = [
  M[1] - x[5] * x[3] * x[1] - x[6] * x[4] * x[2],
  M[2] - x[5] * x[3]^2 * (x[1] + 1) * x[1] - x[6] * x[4]^2 * (x[2] + 1) * x[2],
  M[3] - x[5] * x[3]^3 * (x[1] + 2) * (x[1] + 1) * x[1] -
  x[6] * x[4]^3 * (x[2] + 2) * (x[2] + 1) * x[2],
  M[4] - x[5] * x[3]^4 * (x[1] + 3) * (x[1] + 2) * (x[1] + 1) * x[1] -
  x[6] * x[4]^4 * (x[2] + 3) * (x[2] + 2) * (x[2] + 1) * x[2],
  M[5] - x[5] * x[3]^5 * (x[1] + 4) * (x[1] + 3) * (x[1] + 2) * (x[1] + 1) * x[1] -
  x[6] * x[4]^5 * (x[2] + 4) * (x[2] + 3) * (x[2] + 2) * (x[2] + 1) * x[2],
  M[6] -
  x[5] * x[3]^6 * (x[1] + 5) * (x[1] + 4) * (x[1] + 3) * (x[1] + 2) * (x[1] + 1) * x[1] -
  x[6] * x[4]^6 * (x[2] + 5) * (x[2] + 4) * (x[2] + 3) * (x[2] + 2) * (x[2] + 1) * x[2],
]

# this computes a set of start solutions which we can use to compute
# all real solutions to any M we want
start_res = monodromy_solve(system; parameters = M)

target_res = solve(
  system,
  solutions(start_res),
  parameters = M,
  start_parameters = parameters(start_res),
  target_parameters = [2.250, 9.675, 57.263, 427.219, 3836.109, 40234.852],
)

real_solutions(target_res)
4-element Array{Array{Float64,1},1}:
 [2.05385029610331, 0.4047625777040844, 1.491193381943373, 1.805587741149837, 0.6421990255934416, 0.3874267420604687]
 [2.0064087361571805, 0.4601905412775597, 1.4990976942165646, 1.0902146315738315, 0.6949587215536865, 0.318316159418627]
 [0.4047625777014698, 2.0538502961020852, 1.8055877411449965, 1.4911933819437089, 0.38742674206072847, 0.6421990255944628]
 [0.4601905412704846, 2.0064087361588445, 1.09021463159536, 1.4990976942163359, 0.31831615942240693, 0.6949587215523535]

On our homepage we also have an example of the method of moments applied to a mixture of Gaussians.


Thank you, @dpo, this is tremendously helpful!
I installed the KNITRO demo version and performed some experiments following your examples (I kept the equation w_1+w_2=1 in the system and removed the equation for M[6] instead, so I was solving a system of 6 equations for 6 unknowns, where all unknowns had a positivity constraint). The table below summarizes the results:

Solver   Model   Time to solution (s)   Solution                                     MSE
Ipopt    ADNLP   11.77                  [0.394, 2.045, 1.724, 1.494, 0.384, 0.651]   0.091
Ipopt    ADNLS    5.30                  [1.875, 1.875, 1.528, 1.528, 0.384, 0.384]   0.382
knitro   ADNLP    5.45                  [2.043, 0.393, 1.494, 1.704, 0.653, 0.383]   0.086
knitro   ADNLS    1.74                  [1.994, 0.504, 1.501, 0.962, 0.703, 0.297]   0.000
tron     ADNLP    7.70                  [0.500, 0.500, 0.500, 0.500, 0.500, 0.500]   0.597
tron     ADNLS    4.65                  [1.874, 1.874, 1.528, 1.528, 0.384, 0.384]   0.382

KNITRO is the winner, especially when applied to the nonlinear least-squares formulation. I will do some more experiments with KNITRO to get some data on how reliably it finds a close approximation to the correct solution, but so far I’m pretty impressed by it!

Given the large spread in the results, I was wondering if someone could share some insight as to what the main differences between the solvers are - do they use fundamentally different algorithms? Or is KNITRO just “smarter” in terms of choosing (maybe even on-the-fly changing) algorithms in the process of solving the problem?
When running KNITRO, I also saw the output “No start point provided – Knitro computing one”, even though I had provided a starting value. Does that mean that KNITRO is able to identify bad starting points and will proceed to find better ones if needed?

TLDR: KNITRO is the clear winner! Why is it better than the other solvers?