Best Nonlinear Optimizer for a Continuous Convex Problem

Hi,

I have a simple problem in which I want to

\max_{x \in \mathbb{R}^{N}} f(x) \text{s.t}\hspace{5pt}x \geq 0

where f(x) is continuous and globally concave. Which is the best solver for nonlinear, globally concave maximization problems? I do not have constraints, besides non-negativity constraints. The only problems are that

(1) N is very large (N should be around 20,000)
(2) The solution might have a large number of zeros.

I have analytical gradients and Hessians!

The reason speed is of first order importance for me is that I have to optimize the function many many many times for different configurations of parameters.

Also, maybe I do not know, what the definition of sparsity is in this context. In the optimum there will be a large amount of 0s.

What is the recommendation?

I am adding a MWE below.

As you can see, I am maximizing the following function

\max_{x_{ij}} \sum_{j=1}^{J} (\sum_{i=1}^{I}x_{ij})^{\frac{p-1}{p}} - \sum_{i=1}^{I}(\sum_{j=1}^{J}t_{ij}x_{ij})^{\frac{1}{k}}

with p>1 and k<1. Hence, 0<(p-1)/p<1 and 1/k>0. The derivative of the typical element x_{ij} is

\frac{\partial f}{\partial x_{ij}} = \frac{(p-1)}{p}(\sum_{i=1}^{I}x_{ij})^{-\frac{1}{p}}-\frac{t_{ij}}{k}(\sum_{j=1}^{J}t_{ij}x_{ij})^{\frac{1-k}{k}}

Moreover, notice I define two other objects:

X_{j} = \sum_{i=1}^{I} x_{ij}
Y_{i} = \sum_{j=1}^{J} t_{ij}x_{ij}

Clearly, it is easy to compute the Hessian, which will be 0 for many entries. Although, it is going to be a pretty large matrix.

\frac{\partial^{2}f}{\partial x_{ij}^{2}} = -\frac{(p-1)}{p^{2}} (\sum_{i=1}^{I}x_{ij})^{-\frac{1-p}{p}} -\frac{(1-k)}{k^{2}}t_{ij}^{2}(\sum_{j=1}^{J}t_{ij}x_{ij})^{\frac{1-2k}{k}}

\frac{\partial^{2}f}{\partial x_{ij}x_{i^{\prime}j^{\prime}}} = 0

\frac{\partial^{2}f}{\partial x_{ij} x_{ij^{\prime}} } = -\frac{(1-k)}{k^{2}}t_{ij}t_{ij^{\prime}}(\sum_{j=1}^{J}t_{ij}x_{ij})^{\frac{1-2k}{k}}

\frac{\partial^{2}f}{\partial x_{ij}x_{i^{\prime}j}} = -\frac{(p-1)}{p^{2}} (\sum_{i=1}^{I}x_{ij})^{-\frac{1-p}{p}}

using JuMP, Ipopt, LinearAlgebra, Random, Gurobi, BenchmarkTools, MosekTools, NLopt

const Ι = 250  #This is Greek Letter Iota
const J = 50
const p = 5
const k = 0.75

function distmat(J,Ι)
    Distances = zeros(J,Ι)
    Random.seed!(7777)
    coordinates_x = hcat([x for x = 1:J],fill(0.0,J))
    coordinates_y = hcat([x for x = 1:Ι].*J/Ι,fill(0.0,Ι))
    for j = 1:J, l=1:Ι
        Distances[j,l] = sqrt((coordinates_x[j,1]-coordinates_y[l,1])^2+(coordinates_x[j,2]-coordinates_y[l,2])^2)
    end
    return 1 .+ .1*Distances
end

const t = distmat(J,Ι)


function solution_numerical_Ipopt(p,k,t)

    (J,Ι) = size(t)

    primal = Model(Ipopt.Optimizer)
    set_silent(primal)
    set_optimizer_attribute(primal, "nlp_scaling_method", "none")
    @variable(primal, x[1:J,1:Ι] >= 0)
    @NLobjective(primal, Max, 
    sum( sum(x[j,i]  for i = 1:Ι)
    ^((p-1)/p)  for j = 1:J) - sum(  sum(  t[j,i]*x[j,i]    for j =1:J )^(1/k)     for i = 1:Ι) )
    JuMP.optimize!(primal)
    
    println(objective_value(primal))
    
    sol = value.(x)

    return sol

end    


sol_Ipopt = @time solution_numerical_Ipopt(p,k,t) ;



sum(sol_Ipopt,dims=2)


function func_grad(x,p,k,t)

    (J,Ι) = size(t)
    X = sum(sol_Ipopt,dims=2)
    Y = sum(t.*sol_Ipopt,dims=1)'
    res = sum(X.^((p-1)/p))   - sum(Y.^(1/k))
    grad = zeros(J,Ι)

    for i= 1:Ι
        for j = 1:J
            grad[j,i] = (p-1)/p * X[j]^(-1/p) - t[j,i]/k * Y[i]^((1-k)/k)
        end
    end   
    
    return res, grad

end    

(f,g) = func_grad(sol_Ipopt,p,k,t)
1 Like

solve f(abs(x)) to enforce x>=0 and use BGFS/LGBFS?

1 Like

You can for instance use the algorithm tron from JSOSolvers.jl, see doc JSOSolvers.jl designed for bound-constrained problems.
Additionally, the algorithm is matrix-free so this should be great for large problems.

1 Like

No, don’t make the derivative discontinuous!

Just use a gradient-based optimizer that supports bound constraints. For example, the L-BFGS solver in NLopt supports this.

(Your N can’t be very large if you have the Hessian… N=1000 is not considered large these days.)

5 Likes

It depends.

If you can express f using Convex.jl’s domain specific language, then doing that and using a convex program solver is likely to be fastest.

Oscar’s suggestion to transform to an unconstrained problem is somewhat reasonable, but unlikely to be the best solution. Better would be to use Optim’s Fminbox to make any other Optim.jl algorithm compatible with box constraints. Since you have the Hessian, you may want to try Newton’s method in addition to BGFS. For large enough N, switching to a Newton-Krylov method would make sense (there’s one in Optim.jl, maybe only in the dev branch, but it’s largely undocumented).

1 Like

Since you have the hessian, you can try Bertsekas (1982) projected Newton in NLSolvers.jl It’s called ActiveBox. It requires New version: NLSolvers v0.2.0 by JuliaRegistrator · Pull Request #56457 · JuliaRegistries/General · GitHub to merge before you can pull it, otherwise clone the repo.

The package has code that’s supposed to replace Optim.jl’s code, but it’s not quite there. Not quite stable or finished, but you can give it a shot. Should work something like

function f(x)
    fx = ...
    return fx
end
function g!(∇f, x)
    ...update ∇f...
    return ∇f
end
function h!(∇²f, x)
    ... update ∇²f...
    return ∇²f
end
function fg!(∇f, x)
    ∇f = g!(∇f, x)
    fx = f!(x)
    return fx, ∇f
end
function fgh!(∇f, ∇²f, x)
    ∇²f = h!(∇²f, x)
    fx, ∇f = fg!(∇f, x)
    return fx, ∇f, ∇²f
end

f = ScalarObjective(f, g!, fg!, fgh!, h!, nothing, nothing, nothing)
x0 = ...
lower = ...
upper = ...
prob_bounds = OptimizationProblem(obj=f, bounds=(lower,upper))

res_con = solve(prob_bounds, x0, ActiveBox(), OptimizationOptions())
1 Like

Always have to chime in and suggest considering Ipopt.jl. It has a lot of features you won’t need to take advantage of and has comparatively high boilerplate to some of the other suggestions here. But it has totally un-paralleled performance in my field (maximum likelihood for Gaussian processes). I use it so religiously that I would identify as an Ipopt cultist. It is the only tool I’m willing to use.

I have some convenience functions you could riff from here, and you can see the boilerplate necessary to actually use it after defining those convenience methods here. If you end up having constraints, I can also share some examples for that as well.

If you’re interested in what those args for CreateIpoptProblem mean, here is a commented function call that should give some context.

4 Likes

Right, but if he has the hessian and wants to use optim it’s better to use IPNewton.

1 Like

Yep, Ipopt sounds like a reasonable candidate. Very fast, robust, and can solve instances with tens of thousands of variables on a normal PC when the augmented system is reasonably sparse.
Note: you only have bound constraints, but unfortunately they count as inequality constraints, which is the hardest kind of constraints.

You said the solution may have a lot of zero components, so I would also give an active-set method (for example filterSQP) a shot, starting from (0, \ldots, 0).

jmcastro2109: how large is “very large”?

3 Likes

GitHub - m3g/SPGBox.jl: Spectral Projected Gradient Method for Box-Constrained Minimization is a pure Julia, light alternative to handle very large problems like this. But if you are considering using the true Hessian the problem is not large from this point of view. SPG should behave similarly to LBFGS-B most of the times.

3 Likes

Use GalacticOptim.jl

https://galacticoptim.sciml.ai/dev/

Which method is best? Well,

GalacticOptim just puts one syntax over them all.

using GalacticOptim
rosenbrock(x,p) =  (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
x0 = zeros(2)
p  = [1.0,100.0]

prob = OptimizationProblem(rosenbrock, x0, p, lb = [-1.0,-1.0], ub = [1.0,1.0])

# Solve with Optim
using Optim
sol = solve(prob,NelderMead())

# Solve with BlackBoxOptim
using BlackBoxOptim
sol = solve(prob,BBO_adaptive_de_rand_1_bin_radiuslimited())

So just try a bunch and see. From my tests on such equations, NLopt.LBFGS / Optim.LBFGS with the right choice of autodiff usually fits the bill for this kind of thing, but Optim.KrylovTrustRegion and IPOPT tend to do well too.

If you have analytical Hessians, are you using the sparsity effectively?

4 Likes

ProximalAlgorithms has a few Newton-type methods that support simple constraints (and also other types of non-differentiable penalties) on top of a differentiable objective.

Check out for example this simple constrained quadratic problem, and use PANOC instead of ForwardBackward as algorithm: by default it will use L-BFGS to compute line search directions, so that scales well with the problem dimension.

You can construct nonnegativity constraints using ProximalOperators.IndNonnegative. If the objective is not smooth (ie its gradient is not Lipschitz) you can try replacing PANOC with PANOCplus as algorithm.

Edit: it comes without saying, if you are maximizing f, you should provide -f for minimization :wink:

1 Like

In summary, you have a large (and yet sparse) optimization problem if I understand that correctly. Some one suggested Ipopt.jl.

If you use JuMP.jl you can have access to lots of solvers that are built to solve large sparse optimization problems. KNITRO.jl is my favorite, but you can indeed use Ipopt.jl via JuMP. Either JuMP or the solver (don’t recall which but I think JuMP) will automatically differentiate the problem for you so you don’t have to derive/obtain the Jacobian and Hessian yourself.

2 Likes

Hi!

N is around 25,000. Solving it once with Ipopt works of course, but it takes around 60 seconds, although this can be improved.

The reason I need this to be fast is that I need to repeat this computation many many times under different parameter configurations so performance is key.

Using Convex.jl is an interesting possibility.

The problem is that the function f is of the form f(x) = \sum_{j}(\sum_{i} x_{ij})^{p} - \sum_{i}(\sum_{j}x_{ij})^{k} with p<1 and k>1 and I do not see how to write it in the convex representation. If you see these functions they are like norms, but they are missing the outermost power.

Note: I edited the original post with the function and a MWE

Do you mind sharing an actual minimal running example? (or at least the a complete objective function, gradient, initial point set?)

Posted above

1 Like

For the advocates of Ipopt, @cvanaret @cgeoga I provided a MWE using Ipopt. Its performance, in my current code, is not dismal. It takes about one minute to solve the problem. However, the optimization speed for me is key, and sadly, one minute is too much.

Since I asked for the MWE, I will comment briefly: at least for me, to understand what exactly what is going on, I would require a more modular code, in which I clearly identified 1) the function that computes the objective function; 2) the function that computes the gradient; 3) the function that generates the initial point.

I don’t readily understand why func_grad has a solution of the problem in it (which seems to be a global variable by the way). So, at least for me, it would be easier to help if the structure of the code was something like:

function f(x) 
    ...
end
function gradient!(x,g) 
    ...
end
x0 = initial_point(...)

# now call the optimizer
1 Like