# How to use specific convergence measures in Optim.jl?

I’m using this code:

```julia
results = Optim.optimize(
    obj, Optim.TwiceDifferentiableConstraints(lo, hi), θ0,
    Optim.IPNewton()
)
```

And getting this output:

```text
 * Status: success

 * Candidate solution
    Final objective value:     -3.375589e+01

 * Found with
    Algorithm:     Interior Point Newton

 * Convergence measures
    |x - x'|               = 0.00e+00 ≤ 0.0e+00
    |x - x'|/|x'|          = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|         = 1.02e+00 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 3.01e-02 ≰ 0.0e+00
    |g(x)|                 = 7.72e+04 ≰ 1.0e-08

 * Work counters
    Seconds run:   7  (vs limit Inf)
    Iterations:    5
    f(x) calls:    94
    ∇f(x) calls:   94
```

As you can see, only the “algorithm looks stuck in the `x` domain” measures are satisfied, while `|f(x) - f(x')|` and `|f(x) - f(x')|/|f(x')|` are far from their tolerances. (I think a large gradient norm (`|g(x)| = 7.72e+04`) is OK in constrained problems?)

Is it possible to use only a subset of these convergence measures? Say, I want to make sure that `|f(x) - f(x')|/|f(x')| ≤ 0.0e+00` is always satisfied. How do I do that?

It looks like you want to set `x_abstol` and `x_reltol` to a larger non-zero value and `f_reltol` much smaller.

Also verify that `g_converged` is `true`: https://julianlsolvers.github.io/Optim.jl/stable/#user/minimization/#notes-on-convergence-flags-and-checks
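For instance, a quick check on the `results` object from your snippet, using the accessor functions Optim provides:

```julia
# Inspect which convergence checks were satisfied:
Optim.converged(results)    # overall convergence flag
Optim.x_converged(results)  # |x - x'| checks
Optim.f_converged(results)  # |f(x) - f(x')| checks
Optim.g_converged(results)  # |g(x)| check
```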

In general, you have these options available (cf. the `Optim.Options` docstring), but note the comment that `values 0 and NaN indicate unlimited`:

```text
help?> Optim.Options
  Configurable options with defaults (values 0 and NaN indicate unlimited):

  x_abstol::Real = 0.0,
  x_reltol::Real = 0.0,
  f_abstol::Real = 0.0,
  f_reltol::Real = 0.0,
  g_abstol::Real = 1e-8,
  g_reltol::Real = 1e-8,
  ...
```
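For example, a minimal sketch reusing `obj`, `lo`, `hi`, and `θ0` from your snippet (the tolerance values here are only illustrative):

```julia
opts = Optim.Options(
    x_abstol = 1e-6,   # loosen the x-domain checks
    x_reltol = 1e-6,
    f_reltol = 1e-12,  # tighten the relative objective check
)
results = Optim.optimize(
    obj, Optim.TwiceDifferentiableConstraints(lo, hi), θ0,
    Optim.IPNewton(), opts,
)
```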

Lastly, it is hard to help without a working example. Inferring from the code, you have a single objective function without explicit gradient information and box constraints, so the better choice might be to simply use

```julia
inner_optimizer = Optim.LBFGS()
results = Optim.optimize(
    obj, lo, hi, θ0, Optim.Fminbox(inner_optimizer)
)
```

Hmm, so I set `x_abstol` and `x_reltol` to a larger non-zero value and `f_reltol` much smaller. Now Optim reports success even when none of the convergence measures reported are satisfied.

(BTW, I have a different objective now, and inequality constraints, so box minimization won’t cut it, I think)

## Code

```julia
res = let
    objective = θ -> -log_likelihood(θ, df.price, df.info)

    # Bounds
    lo = [[0, -Inf, -Inf, 1e-10, 1e-10]; zeros(6)]
    hi = [[1, Inf, Inf, Inf, Inf]; fill(Inf, 6)]

    function stationarity!(c, θ)
        p, mu0, mu1, c0, c1, s0, s1, v0, v1, n0, n1 = θ

        # Write into the constraint vector in place
        # (a plain `c = ...` would only rebind the local name)
        c[1] = 1 - s0 - v0
        c[2] = 1 - s1 - v1
        c
    end

    function stationarity_jac!(jac, θ)
        # Diff constr 1
        jac[1, 1:5] .= 0     # w.r.t. p, mu0, mu1, c0, c1
        jac[1, 6] = -1       # w.r.t. s0
        jac[1, 7] = 0        # w.r.t. s1
        jac[1, 8] = -1       # w.r.t. v0
        jac[1, 9] = 0        # w.r.t. v1
        jac[1, 10:end] .= 0  # w.r.t. n0, n1

        # Diff constr 2
        jac[2, 1:5] .= 0     # w.r.t. p, mu0, mu1, c0, c1
        jac[2, 6] = 0        # w.r.t. s0
        jac[2, 7] = -1       # w.r.t. s1
        jac[2, 8] = 0        # w.r.t. v0
        jac[2, 9] = -1       # w.r.t. v1
        jac[2, 10:end] .= 0  # w.r.t. n0, n1

        jac
    end

    function stationarity_hess_prod!(h, x, λ)
        h .= 0
        h
    end

    # These are equality constraints
    constr_lo = constr_hi = [0., 0.]

    constr = Optim.TwiceDifferentiableConstraints(
        stationarity!, stationarity_jac!, stationarity_hess_prod!,
        lo, hi, constr_lo, constr_hi
    )

    Optim.optimize(
        objective, constr, x0, Optim.IPNewton(),
        Optim.Options(
            iterations=5000,
            x_tol=1e-8, f_tol=1e-6,
            f_reltol=1e-8, f_abstol=1e-6,
            x_reltol=1e-8, x_abstol=1e-6,
            show_trace=true, show_every=20
        );
        autodiff=:forward
    )
end
```

## Output

```text
Iter     Lagrangian value Function value   Gradient norm    |==constr.|      μ
     0   1.755735e+04     -1.150427e+02    9.028653e+03     1.767238e+04     1.09e-04
 * time: 1.635028405396512e9
    20   1.754711e+04     -1.152350e+02    9.033546e+03     1.766235e+04     6.54e-05
 * time: 1.635028405473314e9
    40   1.754707e+04     -1.152537e+02    9.033518e+03     1.766232e+04     6.57e-05
 * time: 1.635028405484916e9
    60   1.754703e+04     -1.152714e+02    9.033493e+03     1.766230e+04     6.59e-05
 * time: 1.635028405497786e9
    80   1.754702e+04     -1.152913e+02    9.036839e+03     1.766228e+04     6.01e-04
 * time: 1.635028405512073e9
 * Status: success

 * Candidate solution
    Final objective value:     -1.153089e+02

 * Found with
    Algorithm:     Interior Point Newton

 * Convergence measures
    |x - x'|               = 6.01e-05 ≰ 1.0e-08
    |x - x'|/|x'|          = 6.95e-05 ≰ 1.0e-08
    |f(x) - f(x')|         = 1.48e-03 ≰ 1.0e-06
    |f(x) - f(x')|/|f(x')| = 1.29e-05 ≰ 1.0e-06
    |g(x)|                 = 2.71e+03 ≰ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    91
    f(x) calls:    618
    ∇f(x) calls:   618
```

Also, I didn’t specify any gradient-based criteria (neither `g_abstol` nor `g_reltol`), yet the upper bound for `|g(x)|` somehow became `1e-8`?

## ???

`Optim.x_converged(res), Optim.f_converged(res), Optim.g_converged(res)` returns `(false, true, false)`, so `Optim.f_converged(res) == true`, even though none of the reported convergence measures are even remotely close to the desired values.

EDIT: I changed the initial point and got a situation where one of the reported convergence measures is satisfied, but the algorithm reports failure anyway:

```text
 * Status: failure (reached maximum number of iterations)

 * Candidate solution
    Final objective value:     -1.289666e+02

 * Found with
    Algorithm:     Interior Point Newton

 * Convergence measures
    |x - x'|               = 3.12e-05 ≰ 1.0e-08
    |x - x'|/|x'|          = 3.43e-05 ≰ 1.0e-08
    |f(x) - f(x')|         = 1.04e-04 ≰ 1.0e-06
    |f(x) - f(x')|/|f(x')| = 8.07e-07 ≤ 1.0e-06
    |g(x)|                 = 1.42e+03 ≰ 1.0e-08

 * Work counters
    Seconds run:   3  (vs limit Inf)
    Iterations:    5000
    f(x) calls:    63273
    ∇f(x) calls:   63273
```

What’s going on here? What’s the point of these convergence measures if the algorithm doesn’t seem to take them into consideration?

Sorry, I don’t think I can post an MCVE: I’m doing maximum-likelihood estimation, so creating one would mean posting all the data, and it’s kinda confidential…

The extra information and testing are useful but not conclusive.
(Keep in mind that I am not well-versed in the full Optim.jl design, but…)

Note that `x_tol` and `x_abstol` are apparently equivalent settings; it is preferable to set only one of them, such as `x_abstol`, since `x_tol` will overwrite it (as seen in your example). Similarly, `f_tol` and `f_reltol` (note the `rel`) are equivalent, with the latter best set directly; likewise `g_tol` and `g_abstol`.
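As a sketch, prefer the explicit names and set each tolerance exactly once (values here are illustrative):

```julia
# Avoid mixing aliases: `x_tol` would silently overwrite `x_abstol`.
Optim.Options(
    x_abstol = 1e-6,   # rather than x_tol
    f_reltol = 1e-8,   # rather than f_tol (which is *relative*)
    g_abstol = 1e-8,   # rather than g_tol
)
```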

It seems that the `IPNewton()` method just ignores absolute function differences, i.e. `f_abstol` is irrelevant, and similarly `x_reltol` is silently ignored in favour of `x_abstol`.

Also, the convergence criterion for this method is satisfaction of `x_abstol`, or of `g_abstol` (aka `g_tol`), or of `f_reltol` via successive function-value convergence.

It may be that you are getting “convergence” from successively close function values, as in the last lines of your trace.

The note specific to `IPNewton()` says:

> Note: For constrained optimization problems, we recommend always enabling `allow_f_increases` and `successive_f_tol` in the options passed to `optimize`. The default is set to `Optim.Options(allow_f_increases = true, successive_f_tol = 2)`.

Hence you can try setting those in your `Options`, but also try raising `successive_f_tol` (to 5 or 10), and set `show_every=1` to really get a sense of what is changing at the end. For example:
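```julia
# Illustrative values only: the IPNewton-recommended settings, plus a
# stricter successive-f check and a per-iteration trace.
Optim.Options(
    allow_f_increases = true,
    successive_f_tol = 10,  # require more nearly-identical f values in a row
    f_reltol = 1e-8,
    show_trace = true,
    show_every = 1,
)
```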

Lastly, since you want to do constrained optimization, I can highly recommend considering JuMP with the Ipopt solver, following this MLE example:
https://jump.dev/JuMP.jl/stable/tutorials/Nonlinear%20programs/mle/
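A minimal sketch in the spirit of that tutorial: Gaussian MLE with σ constrained positive. The data here are made up for illustration, and your likelihood and stationarity constraints would take the objective's place.

```julia
using JuMP, Ipopt

data = 2.0 .* randn(1000) .+ 1.0   # made-up sample, true μ = 1, σ = 2
n = length(data)

model = Model(Ipopt.Optimizer)
@variable(model, μ, start = 0.0)
@variable(model, σ >= 1e-8, start = 1.0)  # keep σ strictly positive
# Gaussian log-likelihood, maximized directly
@NLobjective(model, Max,
    n / 2 * log(1 / (2 * π * σ^2)) -
    sum((data[i] - μ)^2 for i in 1:n) / (2 * σ^2))
optimize!(model)
value(μ), value(σ)
```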
