Minimize Watson test function for n=12

More or less by chance I encountered the Watson function, that is, test function #20,
page 25, in the report “Testing Unconstrained Optimization Software” (1981) by
Moré, Garbow, and Hillstrom. See
https://typeset.io/papers/testing-unconstrained-optimization-software-50e27if9r9

I was able to verify the indicated minima for n=6 and n=9 with Fortran and R software.
The case n=12 turned out to be more difficult: I got the correct result only with the
specialized least-squares function lsqnonlin() from the R package pracma.

In Julia I tried the LsqFit package. It requires a ‘model’, not just the name of a function.
I know from other experiments that it is not as accurate and powerful as, e.g., the ‘nlsr’
package in R. ‘Optim’ with different methods did not really come close, and may not be
appropriate for least-squares minimization problems anyway.

It is necessary to determine the minimum very accurately, because the location of the
minimizer depends strongly on how accurately the minimum value is found.

Can you suggest a solver in Julia that will return the same (or a better) minimum of the
Watson function, say in [0,1]^n for n=12? And, more generally, which Julia solver is
recommended for nonlinear least-squares problems (with a Levenberg–Marquardt
extension) when ‘LsqFit’ is not powerful enough?

PS. Implementation of Watson in Julia

    # Watson test function (Moré–Garbow–Hillstrom problem #20), returned as a
    # sum-of-squares objective built from 31 residuals in n = length(x) parameters.
    function watson(x)
        m = 29; n = length(x)
        t = zeros(Float64, m)
        for i = 1:m
            t[i] = float(i)/m
        end
        F = zeros(Float64, m+2)
        for i = 1:m
            f11 = 0.0
            for j = 2:n
                f11 = f11 + (j-1) * x[j] * t[i]^(j-2)
            end
            f12 = 0.0
            for j = 1:n
                f12 = f12 + (x[j] * t[i]^(j-1))
            end
            F[i] = (f11 - f12^2 - 1.0)
        end
        F[m+1] = x[1]
        F[m+2] = x[2] - x[1]^2 - 1.0
        return sum(F.^2)
    end

The model in LsqFit.jl is the name of a function. It’s the function $\vec{f}(\vec{x},\vec{p})$ in the least-squares minimization problem

$$\min_{\vec{p}} \Vert \vec{f}(\vec{x},\vec{p}) - \vec{y}\Vert^2$$

which is what is required by a Levenberg–Marquardt type algorithm.

If you just supply the sum-of-squares objective function, that’s not enough to use the Levenberg–Marquardt algorithm: LM needs the residual vector f itself, because it builds its Gauss–Newton approximation of the Hessian from the Jacobian of f. If all you have is the sum of the squares, you have to call some generic nonlinear-optimization algorithm instead.

I don’t know how you called lsqnonlin from R’s pracma with this function, since my understanding is that it emulates the lsqnonlin function from Matlab, which also requires you to specify f and not the sum of the squares.

Are you sure you’re doing an apples-to-apples comparison?

PS. A more Julian implementation of your watson function, which doesn’t allocate arrays and computes in the numeric type of the input x rather than always in double precision (important if you want to use an AD tool like ForwardDiff.jl to get the gradient), is:

function watson(x)
    m = 29; n = length(x)
    F = float(zero(eltype(x)))
    for i = 1:m
        f11 = zero(F)
        t = oftype(F, i) / m
        for j = 2:n
            f11 = f11 + (j-1) * x[j] * t^(j-2)
        end
        f12 = zero(F)
        for j = 1:n
            f12 = f12 + (x[j] * t^(j-1))
        end
        F += (f11 - f12^2 - 1)^2
    end
    F += x[1]^2
    F += (x[2] - x[1]^2 - 1)^2
    return F
end

(This is still quite suboptimal, since you could e.g. compute the powers t^j iteratively in the loop rather than recomputing them from scratch on each iteration.)
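
For illustration, here is a rough sketch of that running-powers idea. The name `watson_fast` and the merged inner loop are my own, not taken from the post above, so treat it as an untested variant rather than a drop-in replacement:

    # Sketch: accumulate t^(j-2) multiplicatively instead of calling ^ on each iteration.
    function watson_fast(x)
        m = 29; n = length(x)
        F = float(zero(eltype(x)))
        for i = 1:m
            t = oftype(F, i) / m
            f11 = zero(F)
            f12 = x[1]                  # j = 1 term of the second sum
            tpow = one(F)               # holds t^(j-2) for the current j
            for j = 2:n
                f11 += (j - 1) * x[j] * tpow
                f12 += x[j] * (tpow * t)    # x[j] * t^(j-1)
                tpow *= t
            end
            F += (f11 - f12^2 - 1)^2
        end
        F += x[1]^2
        F += (x[2] - x[1]^2 - 1)^2
        return F
    end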

You can pass the same sort of function to LsqFit in this example with, e.g., a function that returns the vector of residuals f_i(x):

function watsonmodel(indices::UnitRange{<:Integer}, x)
    # `indices` stands in for the xdata argument that curve_fit expects; the Watson
    # residuals don't depend on it, so it is only sanity-checked here.
    indices == 1:31 || throw(ArgumentError("invalid indices $indices"))
    n = length(x)
    F = Vector{float(eltype(x))}(undef, 31)
    for i = 1:29
        f11 = zero(eltype(F))
        t = oftype(f11, i) / 29
        for j = 2:n
            f11 = f11 + (j-1) * x[j] * t^(j-2)
        end
        f12 = zero(eltype(F))
        for j = 1:n
            f12 = f12 + (x[j] * t^(j-1))
        end
        F[i] = f11 - f12^2 - 1
    end
    F[30] = x[1]
    F[31] = x[2] - x[1]^2 - 1
    return F
end

In which case LsqFit.jl exactly reproduces the results from the paper:

julia> fit = LsqFit.curve_fit(watsonmodel, 1:31, zeros(31), zeros(6));

julia> sum(abs2, fit.resid)
0.00228767005355246

julia> fit = LsqFit.curve_fit(watsonmodel, 1:31, zeros(31), zeros(9));

julia> sum(abs2, fit.resid)
1.3997601380982367e-6

julia> fit = LsqFit.curve_fit(watsonmodel, 1:31, zeros(31), zeros(12));

julia> sum(abs2, fit.resid)
4.722381103360363e-10
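
If you also need the minimizing point itself (the original question stresses that the location of the minimum matters), the fitted parameter vector is stored in the result object; as far as I know it is available as `fit.param` (or via `coef(fit)`):

    xmin = fit.param        # minimizer from the last fit; exact values depend on tolerances
    # equivalently: xmin = coef(fit)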

For example, using your watson sum-of-squares function, you can use BFGS from Optim.jl. If you use my version of watson, it works with automatic differentiation to get the gradient, so you can turn on autodiff:

julia> optimize(watson, zeros(6), BFGS(), autodiff=:forward)
 * Status: success

 * Candidate solution
    Final objective value:     2.287670e-03

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 3.14e-08 ≰ 0.0e+00
    |x - x'|/|x'|          = 2.07e-08 ≰ 0.0e+00
    |f(x) - f(x')|         = 1.01e-16 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 4.42e-14 ≰ 0.0e+00
    |g(x)|                 = 1.10e-10 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    29
    f(x) calls:    81
    ∇f(x) calls:   81


julia> optimize(watson, zeros(9), BFGS(), autodiff=:forward)
 * Status: success

 * Candidate solution
    Final objective value:     1.399760e-06

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 3.02e-06 ≰ 0.0e+00
    |x - x'|/|x'|          = 7.36e-07 ≰ 0.0e+00
    |f(x) - f(x')|         = 3.71e-15 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 2.65e-09 ≰ 0.0e+00
    |g(x)|                 = 5.92e-10 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    34
    f(x) calls:    114
    ∇f(x) calls:   114


julia> optimize(watson, zeros(12), BFGS(), autodiff=:forward)
 * Status: success

 * Candidate solution
    Final objective value:     2.664248e-09

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 1.06e-04 ≰ 0.0e+00
    |x - x'|/|x'|          = 7.13e-05 ≰ 0.0e+00
    |f(x) - f(x')|         = 4.73e-14 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 1.78e-05 ≰ 0.0e+00
    |g(x)|                 = 1.94e-10 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    37
    f(x) calls:    136
    ∇f(x) calls:   136


julia> optimize(watson, zeros(12), BFGS(), Optim.Options(g_abstol=1e-12), autodiff=:forward)
 * Status: success

 * Candidate solution
    Final objective value:     4.722381e-10

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 2.69e-07 ≰ 0.0e+00
    |x - x'|/|x'|          = 2.62e-08 ≰ 0.0e+00
    |f(x) - f(x')|         = 1.57e-19 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 3.32e-10 ≰ 0.0e+00
    |g(x)|                 = 7.13e-13 ≤ 1.0e-12

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    42
    f(x) calls:    164
    ∇f(x) calls:   164

Note that to reproduce the results from the paper for n=12, I needed to reduce the gradient convergence tolerance, but it still requires less than 200 function calls. (See JuliaNLSolvers/Optim.jl issue #1102 on GitHub, “warn people that g_tol often needs to be set”.)

In general, you should expect the appropriate choice of tolerances in any optimization or fitting problem to be specific to the problem at hand. Blindly using the default tolerances is a common mistake (in any language, with any library).
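
For completeness, LsqFit’s curve_fit accepts analogous stopping-criteria keywords; the names below (x_tol, g_tol, maxIter) are what I recall from recent LsqFit versions, so check them against the documentation of the version you have installed:

    # Sketch: tighter stopping criteria for the Levenberg–Marquardt fit (keyword
    # names assumed from LsqFit's documented defaults, not verified here).
    fit = LsqFit.curve_fit(watsonmodel, 1:31, zeros(31), zeros(12);
                           x_tol = 1e-12, g_tol = 1e-14, maxIter = 1000)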


Thank you for your extensive and very helpful answer. You are right, MATLAB and ‘pracma’ expect the objective function to return a vector, not a scalar. That is the reason I implemented the Watson function using a pre-allocated array: it lets me easily switch between return F and return sum(F.^2).
Sorry, I did not read the documentation of ‘LsqFit’ carefully enough. I am much impressed by how you applied Optim to this problem. Sometimes I wish the documentation of Julia packages would go into more detail. Thanks again.