Is there a Julia equivalent of scipy.optimize.minimize(method='TNC')?

which is a truncated Newton (TNC) algorithm, see here for details: https://docs.scipy.org/doc/scipy/reference/optimize.minimize-tnc.html#optimize-minimize-tnc.

I have a box constrained optimization problem which requires more evaluations with Optim.jl + BFGS (or any other algorithms) than this Python TNC algorithm. Does anyone know if there is a Julia implementation of this algorithm or a binding (similar to the Python one which is just a binding of a C implementation)?

What does truncated Newton do? You can modify the line search algorithm to Newton in Optim to bound the step size etc. You can also use Newton with a trust region.

1 Like

Also worth looking into: is Optim.jl trying to hit a higher accuracy threshold that SciPy?

1 Like

Do you mind testing tron from JSOSolvers.jl?

using NLPModels, JSOSolvers
nlp = ADNLPModel(
    x -> (x[1] - 1)^2 + 100 * (x[2] - x[1]^2)^2, # objective
    [0.1; 0.1],  # x0
    zeros(2),    # lvar
    ones(2)      # uvar
)
output = tron(nlp)
3 Likes

To this respect, I am very confusing about how to set the accuracy. There are x_tol, f_tol, g_tol, but also *_abstol and *_reltol, also outer_x_tol, outer_f_tol, outer_g_tol and there abs and rel counter parts. Which one controls the final accuracy? The doc seems of little help.

Thanks. I will try it. BTW, atol is for function value or the variables? If atol is for function value, can I control the accuracy of the variables instead?

I have tried it but realized soon that ADNLPModel using ForwardDiff to compute grad but my objective function is an external C++ function. The C++ function also computes the accompanying grad with the objective function value. So, is there an interface to input user-provided grad?

atol and rtol are fot the first-order condition – projected gradient norm. Our other stopping criteria are max_eval and max_time, and it could stop if the line search step is too small.

Regarding user-provided functions, you have to a create a manual type, as shown below. Notice that tron requires obj, grad! and hprod!, so all three have to be implemented.

struct MyProblem <: AbstractNLPModel
  meta :: NLPModelMeta
  counters :: Counters
end

function MyProblem()
  meta = NLPModelMeta(
    2, # nvar
    x0 = [0.1; 0.1],
    lvar=zeros(2),
    uvar=ones(2)
  )
  MyProblem(meta, Counters())
end

function NLPModels.obj(:: MyProblem, x :: AbstractVector)
  (x[1] - 1)^2 + 100 * (x[2] - x[1]^2)^2
end

function NLPModels.grad!(:: MyProblem, x :: AbstractVector, g :: AbstractVector)
  g[1] = 2 * (x[1] - 1) - 400 * x[1] * (x[2] - x[1]^2)
  g[2] = 200 * (x[2] - x[1]^2)
  g
end

function NLPModels.hprod!(:: MyProblem, x :: AbstractVector, v :: AbstractVector, Hv :: AbstractVector; obj_weight=1.0)
  Hv[1] = obj_weight * (2 - 400 * (x[2] - x[1]^2) + 800 * x[1]^2) * v[1] - 400obj_weight * x[1] * v[2]
  Hv[2] = 200obj_weight * v[2] - 400obj_weight * x[1] * v[1]
  Hv
end

nlp = MyProblem()

output = tron(nlp)
2 Likes

Thanks for the detailed reply. However, it is infeasible to compute the hessian. I am afraid I cannot try your approach.

We have an alternative to use LBFGS instead of the Hessian, shown below, but notice that the Truncated Newton method is a second-order method, so the following is not a Newton method anymore. The main difference is we remove hprod and add nlp = LBFGSModel(MyProblemNoHess()).

using NLPModels, JSOSolvers

struct MyProblemNoHess <: AbstractNLPModel
  meta :: NLPModelMeta
  counters :: Counters
end

function MyProblemNoHess()
  meta = NLPModelMeta(
    2, # nvar
    x0 = [0.1; 0.1],
    lvar=zeros(2),
    uvar=ones(2)
  )
  MyProblemNoHess(meta, Counters())
end

function NLPModels.obj(:: MyProblemNoHess, x :: AbstractVector)
  (x[1] - 1)^2 + 100 * (x[2] - x[1]^2)^2
end

function NLPModels.grad!(:: MyProblemNoHess, x :: AbstractVector, g :: AbstractVector)
  g[1] = 2 * (x[1] - 1) - 400 * x[1] * (x[2] - x[1]^2)
  g[2] = 200 * (x[2] - x[1]^2)
  g
end

nlp = LBFGSModel(MyProblemNoHess())

output = tron(nlp)
3 Likes

Yeah, it works! And it requires less evals than Optim.jl. It would be great if you can address following two minor points:

(1) My external C++ funciton is a very expensive function, and it returns the objective value and gradients in a single call:

f, g[1], g[2] = CppFunc(x1, x2, c, model, config)

Can tron take advantage of this special structure of my problem to reduce the function call? Currently, I wirte

function NLPModels.obj(:: MyProblemNoHess, x :: AbstractVector)
  f, _, _ = CppFunc(x[1], x[2], c, model, config)
  return f
end

function NLPModels.grad!(:: MyProblemNoHess, x :: AbstractVector, g :: AbstractVector)
  _, g[1], g[2] = CppFunc(x[1], x[2], c, model, config)
  return g
end

However, if the input x is the same, the funciton shall be evaluated only once to save computation time.

(2) As you can see in the above demo code, I need some extra parameters c, model, config to call the function. I don’t like these variables lives in the global. But when I put the NLPModels.obj and NLPModels.grad! definitions into a function, Julia will complain: ERROR: LoadError: syntax: Global method definition around /path/to/test.jl:273 needs to be placed at the top level, or use “eval”. Is there an elegant way to define these function so they can take local parameters?

I’m happy to know it’s working, thanks for using it!

(1) Good point. I’ve created a pull request enabling that https://github.com/JuliaSmoothOptimizers/JSOSolvers.jl/pull/41
To enable call tron with keyword use_only_objgrad=true. It will be released on 0.4.3.

(2) This extra information should probably go inside MyProblemNoHess, we do similar things in some of our models.
Untested example:

struct MyProblemNoHess <: AbstractNLPModel
  meta :: NLPModelMeta
  counters :: Counters
  c
  model
  config
end

function MyProblemNoHess()
  meta = NLPModelMeta(
    2, # nvar
    x0 = [0.1; 0.1],
    lvar=zeros(2),
    uvar=ones(2)
  )
  # Get/define/create c, model and config
  MyProblemNoHess(meta, Counters(), c, model, config)
end

function NLPModels.obj(nlp :: MyProblemNoHess, x :: AbstractVector)
  f, _, _ = CppFunc(x[1], x[2], nlp.c, nlp.model, nlp.config)
  return f
end

function NLPModels.grad!(nlp :: MyProblemNoHess, x :: AbstractVector, g :: AbstractVector)
  _, g[1], g[2] = CppFunc(x[1], x[2], nlp.c, nlp.model, nlp.config)
  return g
end

Other examples include AugLagModel in Percival.jl, where we keep the Lagrangian multipliers inside the model as well.

2 Likes

Looking forward to the new version. The approach for the (2) point works great. Thanks for your help!

Sad to hear that it’s taking longer than TNC (for which I’m assuming you’re then using finite difference hessians?). I supposed it’s not possible to have a look at the actual function? Or even just the output of the different algorithms.

The box constrained optimizer in Optim is… not optimal. It uses a log-barrier term to enforce the constraints and I agree with the observation that it tends to require more evaluations than “necessary”. I intend to add some things here in the future, but I’m happy you found a solution to your problem with JSO :slight_smile:

2 Likes

Optim.jl is also a great package. I use it with Brent() for univariate optimization in the same project.

I also want to further test Optim.jl for multivariate optimization. Can you point to me where can I find the details for setting tolerance correctly, especially for box constrained problem using Fminbox, since there are two sets of tolerance tol and outer_tol. If I wish to obtain a solution (input variable x) within a given accuracy (say 1e-3), how can I set the tolerances? Thanks!

The new version is out, let me know how it goes.

I have tested Optim.jl + BFGS again with g_tol = 1e-4 and outer_g_tol = 1e-4 and JSO with atol = 1e-4 . And it seems they both take the same number of evaluations. My previous test of Optim.jl setting x_tol = 1e-4 and outer_x_tol = 1e-4 which requires more evalutions.

I have tried the new version. However, it seems no improvements by using objgrad!. I suspect that there are too many cases during optimization which evaluate objgrad! for obj or grad value only. Following are the traces. I’ve put a println inside the core function. Each objgrad! has to call F twice to construct both obj and grad values.

result = tron(nlp, use_only_objgrad=true, atol=1e-4)

------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
[ Info:   iter      f(x)         π         Δ     ratio         cgstatus  
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
[ Info:      0   5.2e+00   1.2e-01   1.0e+00   8.7e-01  stationary point found
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
[ Info:      1   5.2e+00   1.4e-02   1.2e-01   1.6e+00  stationary point found
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
[ Info:      2   5.2e+00   8.3e-03   1.2e-01   1.2e+00  stationary point found
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
[ Info:      3   5.2e+00   1.6e-03   1.2e-01   1.2e+00  stationary point found
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
[ Info:      4   5.2e+00   2.7e-04   1.2e-01   1.1e+00  stationary point found
[ Info:      5   5.2e+00   2.3e-05   1.2e-01
"Execution stats: first-order stationary"

result = tron(nlp, use_only_objgrad=false, atol=1e-4)

------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
[ Info:   iter      f(x)         π         Δ     ratio         cgstatus  
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
[ Info:      0   5.2e+00   1.2e-01   1.0e+00   8.7e-01  stationary point found
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
[ Info:      1   5.2e+00   1.4e-02   1.2e-01   1.6e+00  stationary point found
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
[ Info:      2   5.2e+00   8.3e-03   1.2e-01   1.2e+00  stationary point found
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
[ Info:      3   5.2e+00   1.6e-03   1.2e-01   1.2e+00  stationary point found
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
[ Info:      4   5.2e+00   2.7e-04   1.2e-01   1.1e+00  stationary point found
------------ F IS EVALUATED ONCE ------------
------------ F IS EVALUATED ONCE ------------
[ Info:      5   5.2e+00   2.3e-05   1.2e-01
"Execution stats: first-order stationary"

I forgot to say you should update your code to provide objgrad! instead of obj and grad!:

function NLPModels.objgrad!(:: MyProblemNoHess, x :: AbstractVector, g :: AbstractVector)
  f, g[1], g[2] = CppFunc(x[1], x[2], c, model, config)
  return f, g
end
1 Like

Otherwise objgrad defaults to calling obj and grad.

1 Like