Survey of Non-Linear Optimization Modeling Layers in Julia

I am doing a survey of the NLP modeling layers in Julia to see which might be applicable to the kinds of problems I regularly solve. I have the following basic requirements for the modeling layer (a toy example of the kind of problem I mean follows the list):

  • Support for non-convex functions for the objective
  • Support for a system of non-convex equality and inequality constraint functions (the equality constraints usually cannot be expressed explicitly as a manifold)
  • Support for polynomial and transcendental functions (e.g. x^2*y^3, sin(x))
  • Some kind of automatic differentiation system (so I don’t have to write derivative oracles by hand)
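To make the requirements concrete, here is a toy (made-up) instance of the kind of problem I mean, written as plain Julia functions rather than in any particular package’s modeling syntax:

# Toy problem of the class described above: non-convex objective,
# non-convex equality and inequality constraints, polynomial and
# transcendental terms. Names and numbers are illustrative only.
objective(x) = x[1]^2 * x[2]^3 + sin(x[1])

# equality constraint g(x) = 0 (cannot be solved explicitly for either variable)
g(x) = x[1]^2 + sin(x[2]) - 1.0

# inequality constraint h(x) <= 0
h(x) = x[1] * x[2] + cos(x[1]) - 2.0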

Here is what I found so far (in alphabetical order),

If I have made an error in the characterization of any of the listed packages, corrections are greatly appreciated.

So far, it seems that ADNLPModels, GalacticOptim, JuMP, Nonconvex and Optim are the packages that currently support all of my requirements. However, if you know of some other Julia package that might be able to solve such problems, I would very much like to hear about it.

Note: the original post has been revised based on info from this discussion.


You’re missing GalacticOptim.jl, which probably has the most coverage:

https://galacticoptim.sciml.ai/dev/


And wide AD support:

Optim has IPNewton, which supports nonlinear constraints: Optim.jl
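Roughly, following the constrained-optimization example in the Optim.jl docs (a sketch only; the constraint derivatives are written by hand here, and whether they can be generated by AD comes up further down the thread):

using Optim

# Rosenbrock objective with one nonlinear inequality constraint
# x[1]^2 + x[2]^2 <= 0.5^2, adapted from the Optim.jl IPNewton docs.
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

# Constraint value, Jacobian, and lambda-weighted Hessian contribution (hand-coded).
con_c!(c, x) = (c[1] = x[1]^2 + x[2]^2; c)
con_jac!(J, x) = (J[1, 1] = 2 * x[1]; J[1, 2] = 2 * x[2]; J)
con_hess!(h, x, λ) = (h[1, 1] += 2 * λ[1]; h[2, 2] += 2 * λ[1]; h)

x0 = [0.25, 0.25]                     # feasible starting point
lx, ux = fill(-Inf, 2), fill(Inf, 2)  # no bounds on the variables
lc, uc = [-Inf], [0.5^2]              # bounds on the constraint value

df = TwiceDifferentiable(f, x0; autodiff = :forward)  # objective derivatives via ForwardDiff
dfc = TwiceDifferentiableConstraints(con_c!, con_jac!, con_hess!, lx, ux, lc, uc)

res = optimize(df, dfc, x0, IPNewton())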


What does the "global constrained" column mean? And why does MOI not support that?


We haven’t wrapped it yet.

I guess I don’t understand the distinction between global and local. It’s up to the solver to figure that out, not MathOptInterface.

Which algorithms does Optim rely on for global constrained and unconstrained optimization?

Yes, MOI has both algorithms, which is why it has both boxes checked. Flux does not, so it only has one of them checked.

https://galacticoptim.sciml.ai/dev/optimization_packages/optim/#Global-Optimizer

I think he means the very last column, which is not checked for MOI.

Thanks! Now I see the difference in terminology. In MOI/JuMP, the difference between global and local is whether a solver can certify (numerical issues aside) that the solution is a global optimum rather than only a local one. For GalacticOptim, it seems to be more about how the feasible space is explored.

In GalacticOptim, MOI is a package that is wrapped, and that wrapper does not support the constraints right now, so it’s not checked.

@mohamed82008, thanks for the tip about Optim. In that example I did not see how to combine what is shown with an AD approach for the Jacobian and Hessian. Do you know of an example that does this?

@ChrisRackauckas, I will give GalacticOptim a try going to Ipopt through the MOI backend, unless you suggest a different one. What AD system do you recommend for sparse, large-scale problems? AutoModelingToolkit sounds like the best choice from the docs you posted, correct?


You would probably need to do your own AD when defining gradient/jacobian/hessian functions. @pkofod can correct me if I am wrong. (sorry for the ping)
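For instance, you could fill in the derivative functions that TwiceDifferentiableConstraints expects by calling ForwardDiff yourself. A rough, untested sketch (the constraint function c below is just an example):

using ForwardDiff

# Example constraint written as a plain function returning a vector of values.
c(x) = [x[1]^2 + x[2]^2]

# Wrappers in the in-place form that TwiceDifferentiableConstraints expects,
# with the derivatives generated by ForwardDiff.
con_c!(vals, x) = (vals .= c(x); vals)
con_jac!(J, x) = (ForwardDiff.jacobian!(J, c, x); J)
function con_hess!(h, x, λ)
    # add the λ-weighted Hessian of each constraint component
    for i in eachindex(λ)
        h .+= λ[i] .* ForwardDiff.hessian(z -> c(z)[i], x)
    end
    return h
end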


OK, an AD system is a hard requirement for me. I will wait to hear from @pkofod to confirm the status, but I am updating the original post to reflect the new info.

@ChrisRackauckas, do you have an example of how to use GalacticOptim with constraint functions? I reviewed these docs:

https://galacticoptim.sciml.ai/stable/tutorials/intro/
https://galacticoptim.sciml.ai/stable/API/optimization_problem/
https://galacticoptim.sciml.ai/stable/API/optimization_function/

There seems to be a hint of how to specify the constraints through the cons argument to OptimizationFunction, but I could not find a specification of what this argument should be.

Also while reviewing these docs,

https://galacticoptim.sciml.ai/stable/API/optimization_function/#Defining-Optimization-Functions-Via-AD

I noticed that AutoForwardDiff is the only AD system that says it supports constraints, so it seems like I should use that one instead of AutoModelingToolkit?

Not necessarily. Each has its own advantages. MTK will scalarize the equations but will generate really fast code. It won’t scale in compile time the best, but for scalar-heavy code that is big and sparse it’s really good, if it compiles in time. Otherwise ReverseDiff with tape compilation is good with similar properties, but it can segfault if the tape gets too long. If the code is heavy in linear algebra, Zygote is a good bet. Tracker is kind of an in-between Zygote-ish thing that can work in some cases where Zygote doesn’t. Forward-mode doesn’t scale as well.
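To make the backend choice concrete, it is just the second argument to OptimizationFunction (a sketch using backend names from the docs page linked above; it only shows the selection, not a full constrained setup):

using GalacticOptim

rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2

# Same objective, different AD backends: only the second argument changes.
f_fd  = OptimizationFunction(rosenbrock, GalacticOptim.AutoForwardDiff())
f_zy  = OptimizationFunction(rosenbrock, GalacticOptim.AutoZygote())
f_mtk = OptimizationFunction(rosenbrock, GalacticOptim.AutoModelingToolkit())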

Yes, we should probably add a cons diff overload to MTK. It’s only like 10 lines.

@ChrisRackauckas, I found an example in the tests here, https://github.com/SciML/GalacticOptim.jl/blob/master/test/rosenbrock.jl#L30
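In case it helps anyone else, here is roughly what that test appears to do. This is an untested sketch from memory: I am not certain of the exact cons signature (it may take only x rather than (x, p)) or of the lcons/ucons keywords in the current release, so check the linked test file before relying on it.

using GalacticOptim, Optim

rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
# Assumed form for the constraint: a function returning a vector of constraint values.
cons = (x, p) -> [x[1]^2 + x[2]^2]

x0 = zeros(2)
p = [1.0, 100.0]

optf = OptimizationFunction(rosenbrock, GalacticOptim.AutoForwardDiff(); cons = cons)
# lcons/ucons are assumed to be the lower/upper bounds on the constraint values.
prob = OptimizationProblem(optf, x0, p; lcons = [-Inf], ucons = [0.8])
sol = solve(prob, IPNewton())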

Not sure if I understand the question, but here is a simple example of AD using the Optim ecosystem:

using NLsolve
import NLsolve.NLSolversBase: OnceDifferentiable, TwiceDifferentiable

# Beale function:
B(x) = (1.5 - x[1] + x[1].*x[2]).^2 + (2.25 - x[1] + x[1].*x[2].^2).^2 + (2.625 - x[1] + x[1].*x[2].^3).^2
## Himmelblau function:
HM(x) = (x[1].^2 + x[2] - 11).^2 + (x[1]+ x[2].^2 - 7).^2
# Rastrigin function:
RS(x) = 10*2 + x[1].^2 + x[2].^2 - 10*cos(2*pi*x[1]) - 10*cos(2*pi*x[2]);
# Rosenbrock function
R(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

## Define the test function:
testfun = RS

## Auto-differentiation:
x0 = [10.0; 10.0] # initial point
dfn = TwiceDifferentiable(testfun, x0; autodiff = :forward); # use ForwardDiff (the default is finite differencing)

# Find the zeros of the gradient and the iterate states:
base_solver_results = nlsolve(dfn.df, x0,
        method=:newton,
        show_trace=true, store_trace=true, extended_trace=true)

# We can also wrap the gradient of the test function as a vector residual function:
x_init = zero(x0)
F_init = similar(x0)
testDiffFun = OnceDifferentiable(dfn.df, x_init, F_init);
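The same TwiceDifferentiable object can also be handed directly to Optim if you want to minimize rather than find a zero of the gradient (assuming Optim is loaded):

using Optim
optim_results = optimize(dfn, x0, Newton())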