Best way to avoid local minima in nonlinear optimization?

I’m trying to reduce the number of times my NLP solver returns a local minimum instead of the global minimum (I know there is no general guarantee of this for a sufficiently difficult nonconvex objective). Currently I’m just using a for-loop that reruns the model with a different random initialization of the variables each time (below). Is there a better way to do this?

runs = []

for i in 1:restarts
    model = Model(Ipopt.Optimizer)

    @variable(model, 0 <= x[1:4] <= 1)
    x0 = rand(4)
    set_start_value.(x, x0)

    @constraint(model, ...)  # problem constraints elided

    # my_objective(x...) = ...

    register(model, :mobj, length(x), my_objective; autodiff = true)
    @NLobjective(model, Min, mobj(x...))

    optimize!(model)
    push!(runs, model)
end

select_best_run(runs)

Why not just save the solutions? No need to rebuild the model every time.

runs = Any[]
for i in 1:restarts
    x0 = rand(4)
    set_start_value.(x, x0)   # `model` and `x` are built once, outside the loop
    optimize!(model)
    push!(runs, value.(x))
end

How does scoping work for the variable x? I tried putting this for-loop into a helper function, but x wasn’t available. So I tried passing x in as an argument, but it seems to have been converted to a Float64?

using JuMP, Ipopt
runs = Any[]
model = Model(Ipopt.Optimizer)
@variable(model, 0 <= x[1:4] <= 1)
my_objective(x...) = sum(sin((x[i] - i) / 10) for i in 1:4)
register(model, :mobj, length(x), my_objective; autodiff = true)
@NLobjective(model, Min, mobj(x...))
for i in 1:5
    x0 = rand(4)
    set_start_value.(x, x0)
    optimize!(model)
    push!(
        runs, 
        (
            obj = objective_value(model), 
            x = value.(x),
            x0 = x0,
        )
    )
end
obj, index = findmin(r -> r.obj, runs)
runs[index]

When using gradient-based methods (such as Ipopt) on a nonconvex problem, the best you can do is multistart, as my colleagues suggested.

Alternatively, since your problem seems small, you can use interval methods: https://github.com/JuliaIntervals/IntervalOptimisation.jl
These are global methods that explore the whole search space and produce the global minimum. They’re significantly more expensive than local methods, but on a small problem, you can afford it.
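For a feel of the API, here is a minimal sketch (assuming the `minimise` function exported by IntervalOptimisation.jl and the `IntervalBox` type from IntervalArithmetic; the objective is purely illustrative, over the box [0, 1]^4 from your model):

using IntervalArithmetic, IntervalOptimisation

f(x) = sum(sin((x[i] - i) / 10) for i in 1:4)  # illustrative objective

X = IntervalBox(0..1, 4)                 # the entire search box [0, 1]^4
globalmin, minimisers = minimise(f, X)   # interval enclosing the global minimum,
                                         # plus boxes containing the minimizers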


Thank you for the suggestion! We’re thinking about scaling up the problem, so it may not be feasible there, but it could be useful in a future setting. What’s the asymptotic runtime of interval methods? At what point do they become less efficient than N restarts of a gradient-based method?

Thank you! This example is very useful for me, and I’m sure for others who find this post. For completeness: given how variable scoping works in JuMP, is there really no way to put the for-loop into a helper function and still access x?
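(For reference, a JuMP variable vector like x is an ordinary Julia value and can be passed into a function like any other argument; a minimal sketch of that pattern, with a hypothetical helper name:)

# hypothetical helper: takes the model and its variable vector as arguments
function multistart!(model, x, restarts)
    runs = Any[]
    for i in 1:restarts
        set_start_value.(x, rand(length(x)))
        optimize!(model)
        push!(runs, (obj = objective_value(model), x = value.(x)))
    end
    return runs
end

runs = multistart!(model, x, 5)   # with model and x built as above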

The timing of this question is almost too perfect. I have been working on a paper on exactly this with my colleague @yijiangh; the draft is almost ready and will be made public shortly. I’ll post it here then, so check back in a few days.

In the worst case, global methods have exponential complexity, because they (usually) do branch and bound. However, you get the optimal solution with a certificate of global optimality (robust to roundoff errors in the case of interval methods). If your problem ends up large, you can also try classical global solvers such as BARON and Couenne.

Doing multistart raises the probability of obtaining the global minimum, but it gives you no guarantee at all in this regard. All that (good) local methods can guarantee is local optimality (or local optimality of the constraint violation if the problem is locally/globally infeasible).

But there are potentially better ways to multistart than just sampling starting points uniformly.

My favorite algorithm for this is MLSL, which has a method to “filter” the starting points that often lets you avoid re-optimizing the same local optimum over and over (with some provable asymptotic guarantees, so it’s not completely heuristic). In low dimensions (here @Kevin_Shen1 is in 4) I’ve had very good results with it.

There are other methods as well, e.g. DIRECT-like branch-and-bound methods that systematically subdivide the parameter space and then do a local optimization in different subregions.

One thing to keep a close eye on is the tolerance of the local optimizer. You often want a fairly coarse tolerance for convergence when doing multistart. Then, at the very end, take your best local optimum and run local optimization again with a fine tolerance to “polish” the optimum.
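In terms of the JuMP/Ipopt loop from earlier in the thread, that workflow might look like the following sketch (the Ipopt "tol" values are illustrative, and model/x are the ones built above):

set_optimizer_attribute(model, "tol", 1e-4)    # coarse tolerance during the multistart phase
runs = Any[]
for i in 1:restarts
    set_start_value.(x, rand(4))
    optimize!(model)
    push!(runs, (obj = objective_value(model), x = value.(x)))
end
obj, index = findmin(r -> r.obj, runs)

set_optimizer_attribute(model, "tol", 1e-10)   # fine tolerance to polish the best point
set_start_value.(x, runs[index].x)
optimize!(model)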


Thanks, this is what I was hoping for! What local optimizer and tolerances do you use for the “global optimization” and the final local-optimization parts? Are you using NLopt? I see MLSL is implemented in that package.

If you happen to have a code snippet, that’d be very helpful as well!

Yes, we’ve always used the NLopt implementation of MLSL (e.g. here); but it’s not a very complicated algorithm if someone wants to implement a pure-Julia version. (The hardest part is the nearest-neighbor search in d dimensions, but there are already packages for this in Julia. The original MLSL algorithm is for box-constrained optimization only, but there are nonlinearly constrained variants, e.g. here and here.)

YMMV on local solvers, but the usual advice holds — if you can possibly get an analytical gradient (either by AD or manually), you should use a local-search algorithm (e.g. LBFGS) that exploits it.
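To make that concrete, here is a minimal sketch using NLopt.jl’s native API (not through JuMP), with the toy objective from earlier in the thread; the evaluation budget and tolerances are illustrative:

using NLopt

# toy objective with an analytical gradient: d/dx_i sin((x_i - i)/10) = cos((x_i - i)/10)/10
function myobj(x::Vector, grad::Vector)
    if length(grad) > 0
        for i in 1:4
            grad[i] = cos((x[i] - i) / 10) / 10
        end
    end
    return sum(sin((x[i] - i) / 10) for i in 1:4)
end

opt = Opt(:G_MLSL_LDS, 4)        # MLSL with low-discrepancy starting points
opt.lower_bounds = zeros(4)      # MLSL requires finite bound constraints
opt.upper_bounds = ones(4)
opt.min_objective = myobj
opt.maxeval = 2000               # the global phase needs an explicit stopping budget

local_opt = Opt(:LD_LBFGS, 4)    # gradient-based local searches, per the advice above
local_opt.xtol_rel = 1e-3        # coarse tolerance; polish the winner afterwards
opt.local_optimizer = local_opt

minf, minx, ret = optimize(opt, fill(0.5, 4))

The final “polish” step is then just one more local run started from minx with a fine xtol_rel, as described above.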

Here you go. I hope you find it useful. Any feedback is appreciated 🙂


This looks like a nice approach to get diverse local minima. Is it also applicable to nonlinear inequality constraints without introducing slack variables? (It would be nice if there were an automated way to choose the dimensionful hyperparameters r and y.)

But if you iterate it over and over like a multistart algorithm, it looks like you aren’t guaranteed to find a global optimum eventually, since the deflation penalty may “push the optimizer away from other nearby locally optimal solutions”? So it’s not strictly comparable to a multistart algorithm like MLSL (which, in theory, eventually finds the global optimum, although it doesn’t provide any certificate of optimality).


Yes.

Correct. But the target users of this approach typically don’t care about global optimality as much as they care about exploring multiple diverse local optima.


Very newbie question, but why is NLopt complaining with this error?

ERROR: LoadError: ArgumentError: invalid NLopt arguments: invalid algorithm for constraints

using JuMP, NLopt

model = Model(NLopt.Optimizer)
set_optimizer_attribute(model, "algorithm", :G_MLSL_LDS)
set_optimizer_attribute(model, "local_optimizer", :LD_LBFGS)

A = [
    1 1 9 5
    3 5 0 8
    2 0 6 13
]

b = [7; 3; 5]

c = [1; 3; 5; 2]

@variable(model, 0 <= x[1:4] <= 1000)
set_start_value.(x, [1.0, 1, 1, 1])

@constraint(model, A * x .== b)

# collect the splatted arguments into a vector; JuMP needs to know the
# number of arguments ahead of time, via `length(x)` in `register` below
myobj2 = function (x...)
    return c' * collect(x)
end

register(model, :mobj, length(x), myobj2; autodiff = true)
@NLobjective(model, Min, mobj(x...))

JuMP.optimize!(model)

L-BFGS is a method for solving unconstrained problems. For bound-constrained problems, there exists L-BFGS-B (no idea if it’s available here, though).
For general constraints, you should use something else (e.g. Ipopt, SNOPT, filterSQP, Knitro, etc.).

I get the same error with SLSQP: set_optimizer_attribute(model, "local_optimizer", :LD_SLSQP). However, I can optimize with SLSQP alone (set_optimizer_attribute(model, "algorithm", :LD_SLSQP)) without MLSL. Is this expected?

:LD_LBFGS in NLopt supports bound constraints. But it doesn’t support equality constraints.

The real problem is probably that MLSL only supports bound constraints (not equality or nonlinear constraints).

(Maybe, since the A matrix is so small, you can explicitly eliminate this constraint through a change of variables.)
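A sketch of that elimination, using the A, b, c from the example above (nullspace is in the LinearAlgebra stdlib):

using LinearAlgebra

A = [1 1 9 5; 3 5 0 8; 2 0 6 13]
b = [7.0, 3.0, 5.0]
c = [1.0, 3.0, 5.0, 2.0]

xp = A \ b          # a particular solution of A x = b (A has full row rank)
N = nullspace(A)    # 4×1 basis: three independent equations leave one free direction

# every solution of A x = b is x(z) = xp + N*z, so optimize over the scalar z instead;
# the bounds 0 .<= x .<= 1000 translate into an interval of allowed z values
x_of_z(z) = xp + N * z
obj(z) = dot(c, x_of_z(z))

In this case z is one-dimensional, so the transformed problem is purely bound-constrained, which is exactly what MLSL supports.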

For constraints that one can easily project onto, ProximalAlgorithms implements a couple of algorithmic schemes that rely on L-BFGS to solve constrained problems; see for example: https://juliafirstorder.github.io/ProximalAlgorithms.jl/dev/implemented_algorithms/#ProximalAlgorithms.PANOCIteration

(Note that the documentation only refers to the master branch, and I’m currently working on a couple of breaking changes there to improve things.)