Differences between NLsolve and Optim in solving system of equations

I am solving a system of equations in two ways: “directly” with NLsolve, and by minimizing the norm of the residuals of the system with Optim. Here’s an example:

using NLsolve
using Optim
using LinearAlgebra
using Random

Random.seed!(9876)

const szE = 2
const szG = 2
const szA = 9
const β = 0.92
const δ = 0.10

Π = 1000 .* rand(9,2,9,2)
Sm = 1000 .* rand(9,2)
Sf = 1000 .* rand(9,2)

function maxT(am, af; Z=9)
    Z - max(am, af) + 1
end

function solvnl!(res, μ0; Sm=Sm, Sf=Sf, Π=Π)
    res2 = reshape(res, (szA, szE, szG))

    μ0′ = reshape(μ0, (szA, szE, szG))

    prodsummand = similar(μ0, (szA, szE, szA, szE))
    for idx in CartesianIndices(prodsummand)
        am, em, af, ef = Tuple(idx)
        prodsummand[idx] = Π[am,em,af,ef] * sqrt(Sm[am,em]*Sf[af,ef]) *
            prod( abs((μ0′[am+k,em,1]*μ0′[af+k,ef,2]) / (Sm[am+k,em]*Sf[af+k,ef]))^(0.5*(β*(1-δ))^k)
                  for k in 0:(maxT(am,af; Z=9)-1) )
    end

    for idx in CartesianIndices(res2)
        am, em = Tuple(idx)
        res2[am, em, 1] = Sm[am, em] - μ0′[am,em,1] - sum( prodsummand[am,em,af,ef] for af in 1:szA, ef in 1:szE )
    end
    for idx in CartesianIndices(res2)
        af, ef = Tuple(idx)
        res2[af, ef, 2] = Sf[af, ef] - μ0′[af,ef,2] - sum( prodsummand[am,em,af,ef] for am in 1:szA, em in 1:szE )
    end
end

function solvopt!(μ0; Sm=Sm, Sf=Sf, Π=Π)
    res2 = similar(μ0, (szA, szE, szG))

    μ0′ = reshape(μ0, (szA, szE, szG))

    prodsummand = similar(μ0, (szA, szE, szA, szE))
    for idx in CartesianIndices(prodsummand)
        am, em, af, ef = Tuple(idx)
        prodsummand[idx] = Π[am,em,af,ef] * sqrt(Sm[am,em]*Sf[af,ef]) *
            prod( abs((μ0′[am+k,em,1]*μ0′[af+k,ef,2]) / (Sm[am+k,em]*Sf[af+k,ef]))^(0.5*(β*(1-δ))^k)
                  for k in 0:(maxT(am,af; Z=9)-1) )
    end

    for idx in CartesianIndices(res2)
        am, em = Tuple(idx)
        res2[am, em, 1] = Sm[am, em] - μ0′[am,em,1] - sum( prodsummand[am,em,af,ef] for af in 1:szA, ef in 1:szE )
    end
    for idx in CartesianIndices(res2)
        af, ef = Tuple(idx)
        res2[af, ef, 2] = Sf[af, ef] - μ0′[af,ef,2] - sum( prodsummand[am,em,af,ef] for am in 1:szA, em in 1:szE )
    end

    return norm(res2)
end

init = rand(9,2,2)

nlsol = nlsolve(solvnl!, init; iterations=10_000, autodiff=:forward, method=:newton)

optsol = optimize(solvopt!, init, BFGS(), Optim.Options(iterations=10_000); autodiff=:forward)

The problem is that, while nlsolve finds a zero, optimize does not. What is strange is that nlsol.zero is (essentially) a zero of solvopt!, yet Optim can’t find it. Even starting very close to that zero, the optimization wanders off somewhere else:

julia> optsol = optimize(solvopt!, nlsol.zero .+ (10 .* rand((szA, szE, szG))), ConjugateGradient(), Optim.Options(iterations=10_000, f_tol=1e-10, x_tol=1e-10, g_tol=1e-10); autodiff=:forward)
 * Status: success

 * Candidate solution
    Final objective value:     3.816617e+06

 * Found with
    Algorithm:     Conjugate Gradient

 * Convergence measures
    |x - x'|               = 0.00e+00 ≤ 1.0e-10
    |x - x'|/|x'|          = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 1.0e-10
    |g(x)|                 = 2.30e+09 ≰ 1.0e-10

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    15
    f(x) calls:    67
    ∇f(x) calls:   59

What could be happening?

What happens if you start Optim from the solution of NLsolve?

It says the line search failed:

julia> optsol = optimize(solvopt!, nlsol.zero, BFGS(), Optim.Options(iterations=10_000); autodiff=:forward)
 * Status: failure (line search failed)

 * Candidate solution
    Final objective value:     1.925837e-04

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 0.00e+00 ≤ 0.0e+00
    |x - x'|/|x'|          = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|         = 1.93e-04 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 1.00e+00 ≰ 0.0e+00
    |g(x)|                 = 1.63e+08 ≰ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    1
    f(x) calls:    87
    ∇f(x) calls:   87

Weird, it reports that the objective value is low (1.925837e-04), but the gradient is large (norm = 1.63e+08). Usually, the line search fails when the solution of the subproblem is not a descent direction.
All in all, this suggests that the gradients are wrong :face_with_raised_eyebrow:
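
One quick way to test that hypothesis, assuming solvopt!, nlsol, and the size constants from the code above are in scope (the helper name fd_gradient is just illustrative): compare the forward-mode gradient against a simple central finite difference at a point away from the root, where the objective is smooth.

```julia
using ForwardDiff, LinearAlgebra

# Gradient check at a generic (smooth) point; a large relative mismatch would point
# to wrong derivatives rather than to the optimizer.
x0 = nlsol.zero .+ 10 .* rand(szA, szE, szG)
g_ad = ForwardDiff.gradient(solvopt!, x0)

function fd_gradient(f, x; h=1e-6)
    g = similar(x)
    for i in eachindex(x)
        d = h * (1 + abs(x[i]))          # step scaled to the magnitude of x[i]
        xp = copy(x); xm = copy(x)
        xp[i] += d; xm[i] -= d
        g[i] = (f(xp) - f(xm)) / (2d)
    end
    return g
end

g_fd = fd_gradient(solvopt!, x0)
norm(g_ad - g_fd) / norm(g_fd)   # should be ≪ 1 if the AD gradient is right
```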

Are you sure the perturbation you add to the initial guess is actually small in the optimization case? Small relative to the second derivatives of f?

Generally it’s not a good idea to solve f(x)=0 by minimizing the norm of the residual \|f(x)\|, neither for univariate f nor for f : R^n \rightarrow R^n, n>1. You’re trading finding the intersection of a curve with a line for finding the minimum of a quadratic, locally.
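
A one-dimensional sketch of that trade-off (toy function, purely illustrative): near a simple root the residual crosses zero with a full-strength slope, but the squared residual is locally a flat-bottomed quadratic whose gradient vanishes at the root, which is roughly why minimization typically resolves the root less accurately than a root finder does.

```julia
# Toy 1-D comparison: the root finder still sees a usable residual and slope,
# while the optimizer's gradient of f(x)^2, namely 2*f(x)*f'(x), fades to zero.
f(x) = x - 1.0
x = 1.0 + 1e-8
f(x)            # 1.0e-8: a Newton step f(x)/f'(x) removes the remaining error
2 * f(x) * 1.0  # 2.0e-8: the gradient of the squared residual is already tiny
```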

I also tried with numerical derivatives, so I don’t see how the gradient could be wrong:

julia> optsol = optimize(solvopt!, nlsol.zero .+ (10 .* rand((szA, szE, szG))), ConjugateGradient(), Optim.Options(iterations=10_000, f_tol=1e-10, x_tol=1e-10, g_tol=1e-10); autodiff=:finite)
 * Status: success

 * Candidate solution
    Final objective value:     3.266168e+06

 * Found with
    Algorithm:     Conjugate Gradient

 * Convergence measures
    |x - x'|               = 0.00e+00 ≤ 1.0e-10
    |x - x'|/|x'|          = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 1.0e-10
    |g(x)|                 = 9.22e+03 ≰ 1.0e-10

 * Work counters
    Seconds run:   1  (vs limit Inf)
    Iterations:    29
    f(x) calls:    294
    ∇f(x) calls:   275

Generally, don’t do that. Besides the problem mentioned by @John_Gibson, you also square the condition number. For solving a nonlinear system, use a nonlinear rootfinder.
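
To see the condition-number point concretely (a minimal sketch with a toy linear residual): for f(x) = Ax - b, a root finder works with A itself, while minimizing \|Ax - b\|^2 works with the Hessian A^T A, whose condition number is the square of A’s.

```julia
using LinearAlgebra

A = [1.0  0.0;
     0.0  1e-4]          # mildly ill-conditioned toy Jacobian
cond(A)                  # 1.0e4 -> what a root finder has to cope with
cond(A' * A)             # 1.0e8 -> what the least-squares objective hands the optimizer
```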

Apart from NLsolve, any other package suggestions?

(WIP, but I use it for my own work)

Fantastic, thank you! And Judd for the theory again?

One more thing: root solvers for an n-dimensional system f(x) = 0 utilize the n \times n derivative \partial f_i/\partial x_j, but an optimizer applied to \|f(x)\|^2 is guided by the n-dimensional gradient \nabla \| f(x) \|^2 = 2 (\nabla f)^T f, or componentwise, 2\sum_j f_j \, \partial f_j/\partial x_i. You lose information about the location of the solution when you contract from the n^2-dimensional derivative (root finding) to the n-dimensional gradient (optimization), which makes the optimization harder.
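
For a concrete check of that identity on a toy residual (sketch; the function here is illustrative, not the model from the question):

```julia
using ForwardDiff, LinearAlgebra

f(x) = [x[1]^2 + x[2] - 3, sin(x[1]) * x[2]]      # toy f : R^2 -> R^2
x = [0.7, 1.3]

J = ForwardDiff.jacobian(f, x)                    # the n×n derivative a root solver uses
g = ForwardDiff.gradient(y -> sum(abs2, f(y)), x) # the n-vector an optimizer of ‖f‖² uses

g ≈ 2 * J' * f(x)                                 # true: the optimizer only ever sees Jᵀf
```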

I see that they mention those problems in my references. But I thought I had seen that approach before, which is why I was trying it.

I would recommend Numerical Optimization (2nd ed.) by Nocedal & Wright instead as an intro. I find it intuitive, accessible, and practical.

I use NLSolvers for this. I have this function defined, as NLSolvers provides only the bare minimum:

using NLSolvers, ForwardDiff, LinearMaps

function nlsolve(f!, x0, method = TrustRegion(Newton(), Dogleg()), options = NEqOptions())
    len = length(x0)
    Fcache = zeros(eltype(x0), len)
    JCache = zeros(eltype(x0), len, len)
    jconfig = ForwardDiff.JacobianConfig(f!, x0, x0)
    # Jacobian only
    function j!(J, x)
        ForwardDiff.jacobian!(J, f!, Fcache, x, jconfig)
    end
    # residual and Jacobian together
    function fj!(F, J, x)
        ForwardDiff.jacobian!(J, f!, F, x, jconfig)
        F, J
    end
    # Jacobian-vector product J(x) * v (naive: builds the full Jacobian into a cache)
    function jv!(x)
        function JacV(Fv, v)
            ForwardDiff.jacobian!(JCache, f!, Fcache, x, jconfig)
            Fv .= JCache * v
        end
        return LinearMap(JacV, len)
    end
    vectorobj = NLSolvers.VectorObjective(f!, j!, fj!, jv!)
    vectorprob = NEqProblem(vectorobj)
    return solve(vectorprob, x0, method, options)
end

@longemen3000 I’m looking at the NLSolvers package. What is that jv! function? It is not described in the documentation.

It’s a Jacobian-vector product (J*v). The idea is that this operation can be written without allocating a full Jacobian (my version is naive, as it just calculates the Jacobian into an internal cache).
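
For what it’s worth, here is roughly what a genuinely matrix-free version could look like with ForwardDiff, seeding the direction v into the dual parts so a single residual evaluation yields J(x)*v. The name jvp! and the toy residual are just for illustration, not part of NLSolvers:

```julia
using ForwardDiff

# Matrix-free Jacobian-vector product for an in-place residual f!(F, x):
# J(x)*v = d/dt f(x + t*v) at t = 0, computed with one dual-number pass.
function jvp!(Fv, f!, x, v)
    xdual = ForwardDiff.Dual.(x, v)          # value = x, single dual part seeded with v
    Fdual = similar(xdual)
    f!(Fdual, xdual)                         # one evaluation of the residual
    Fv .= ForwardDiff.partials.(Fdual, 1)    # extract the directional derivative
    return Fv
end

# Toy check: for f(x) = x.^2 .- 1 the Jacobian is Diagonal(2x), so J*v is [2, 0, 0] here.
toyf!(F, x) = (F .= x .^ 2 .- 1; F)
jvp!(zeros(3), toyf!, [1.0, 2.0, 3.0], [1.0, 0.0, 0.0])
```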

I see, I will have to read more about that. You mentioned that it’s naive because you actually compute the Jacobian, which the Jacobian-vector product is supposed to avoid. But is that still quicker than computing finite differences?

I would say yes, as there are only about length(x) * k function evaluations, where:

  • if n <= 12, k = 1 (ForwardDiff evaluates all partials at once)
  • if n > 12, k = div(n, 12) + 1 (ForwardDiff evaluates the partials in chunks of twelve)

Finite differences, if I’m correct, requires around n^2 evaluations. See the sketch below for a rough way to count the calls.
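
Here is a rough way to count them with a toy in-place residual (illustrative only, not the model from the thread):

```julia
using ForwardDiff

const ncalls = Ref(0)                # counts residual evaluations
function toy!(F, x)
    ncalls[] += 1
    F .= x .^ 2 .- 1
    return F
end

n = 30
x, F, J = rand(n), zeros(n), zeros(n, n)

ncalls[] = 0
ForwardDiff.jacobian!(J, toy!, F, x)
ncalls[]                             # ≈ cld(n, chunk size): one dual pass per chunk
```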

Ok, one final question: that Jacobian-vector product arises in forward-mode AD, right? As I understand it, it’s not part of the system-solving algorithm itself, nor does it arise in reverse-mode AD. Is that correct?

It does not come from the needs of any optimization algorithm, but from the need to reduce the amount of computation required to obtain derivative information. I don’t know whether reverse-mode AD is different here.