Comparison between Nonconvex.jl, JuMP and CasADi for large sparse nonlinear optimization

I have an example which implements time-optimal control for Dubins vehicle. I have found that Nonconvex.jl is slower than JuMP. I think it is because Nonconvex.jl does not provide sparsity information to Ipopt. The nonzero entries in Jacobian and Hessian are much higher. Is there any way to get around this? I have added the code samples below for reference.

using Nonconvex
using Plots
Nonconvex.@load Ipopt

N = 40  # Number of knot points
Ns = 3  # Number of state variables
Nu = 1  # Number of Inputs

Nx = 1:N # state 1 indices
Ny = N+1:2N # state 2 indices
Nθ = 2N+1:3N # state 3 indices
Nu = 3N+1:4N # input indices
Nt = 4N+1 # time index

f(x) = x[Nt]
function g(x, i)
    xs = x[Nx]
    ys = x[Ny]
    θs = x[Nθ]
    u = x[Nu]
    dt = x[Nt]
    g1 = xs[i+1] - xs[i] - dt*cos(θs[i])
    g2 = ys[i+1] - ys[i] - dt*sin(θs[i])
    g3 = θs[i+1] - θs[i] - dt*u[i]
    return [g1, g2, g3]

m = Model(f)
lb = -10*ones(Nt)
ub = 10*ones(Nt)
lb[Nu] .= -1.0 
ub[Nu] .= 1.0 
lb[Nt] = 2*π/N
ub[Nt] = 100/N
addvar!(m, lb, ub)
for i=1:N-1
    add_eq_constraint!(m, x -> g(x, i))

function end_point(x)
    xf = x[Nx]
    yf = x[Ny]
    return [xf[N]-0.5, yf[N]-0.5]
add_eq_constraint!(m, x -> end_point(x))

function initial_point(x)
    x0 = x[Nx]
    y0 = x[Ny]
    θ0 = x[Nθ]
    return [x0[1], y0[1]-0, θ0[1]-0]

add_eq_constraint!(m, x -> initial_point(x) )

alg = IpoptAlg()
options = IpoptOptions(first_order=false)
x0 = 0.1*ones(Nt)
result = optimize(m, alg, x0, options = options)

plot(result.minimizer[Nx], result.minimizer[Ny], aspect_ratio = 1)

The corresponding JuMP version:

using JuMP
import Ipopt
using Plots
# using LaTeXStrings

# Create JuMP model, using Ipopt as the solver
user_options = (
    # "mu_strategy" => "monotone",
    # "linear_solver" => "ma27",
dubins = Model(optimizer_with_attributes(Ipopt.Optimizer,user_options...))

# Constants
vm = 1.0
um = 1.0        # Maximum thrust

x_f = 0.5
y_f = 0.5

n = 40    # Time steps
tmin = (2*pi*1)/vm
tmax = 100
# Decision variables

@variables(dubins, begin
    (tmin)/n <= Δt <= (tmax)/n  # Time step
    ## State variables
    x[1:n]            # Velocity
    y[1:n]            # Height
    th[1:n]           # Height
    ## Control variables
    -um ≤ u[1:n] ≤ um    # Thrust

# Objective
@objective(dubins, Min, Δt)

## Initial conditions
fix(x[1], 0; force = true)
fix(y[1], 0; force = true)
fix(th[1], 0; force = true)

## Final conditions
fix(x[n], x_f; force = true)
fix(y[n], y_f; force = true)
# fix(th[n], pi/2; force = true)

## Dynamics

for j in 2:n
    @NLconstraint(dubins, x[j] == x[j - 1] + Δt *0.5* vm*(cos(th[j - 1])+cos(th[j])))
    ## Trapezoidal integration
    @NLconstraint(dubins, y[j] == y[j - 1] + Δt *0.5* vm*(sin(th[j - 1])+sin(th[j])))
    ## Trapezoidal integration
    @NLconstraint(dubins, th[j] == th[j - 1] + 0.5*Δt * (u[j]+u[j-1]))
    ## Trapezoidal integration

# Solve for the control and state

function plot_circle(xc, yc, rc, fig)
	th = collect(0:0.1:2π)
	plot!(fig, xc .+ rc*cos.(th), yc .+ rc*sin.(th), line =:dash, color =:red, label =nothing)
## Display results
println("Min time: ", objective_value(dubins)*n)
fig1 = plot(value.(x), value.(y), 
		label = "Vehicle trajectory",
		framestyle =:box, lw = 2)

Adding corresponding CasADi code for reference. Note that, in CasADi I can use explicit RK4 (or any other method) easily. I could not do that in JuMP. While it can be done in Nonconvex.jl there is the issue with sparsity.

using CasADi
using Plots

N = 40 # number of control intervals
x0 = [0; 0; 0] # Initial state
xf = [0.5; 0.5; pi / 2]

# Constant parameters
v = 1.0

opti = casadi.Opti()

#  Decision variables [height;velocity;mass]
x = opti._variable(3, N)
xs = x[1,:]; ys =x[2,:]; ths = x[3,:]
u = opti._variable(1, N) # Control vector
dt = opti._variable(1, 1)

# Dynamic constraints
dubins_ode(x, u) = casadi.vertcat(cos(x[3]), sin(x[3]), u)

for k = 1:N-1
    # Euler integration
    # opti._subject_to(x[:,k+1] ==[:,k], dt*dubins_ode(x[:,k], u[1,k])) )

    # RK-4 Integration
    k1 = dubins_ode(x[:,k] ,        u[:,k])
    k2 = dubins_ode([:,k] , dt*k1/2), u[:,k])
    k3 = dubins_ode([:,k] , dt*k2/2), u[:,k])
    k4 = dubins_ode([:,k] , dt*k3),   u[:,k])
    opti._subject_to(casadi.minus(x[:,k+1],[:,k], (dt/6)*(,k2),,k4))))) == 0)

#  Boundary conditions
opti._subject_to(casadi.minus(x[:,1], x0) == 0)
opti._subject_to(casadi.minus(x[1:2,end], xf[1:2]) == 0)

# Input constraints
opti._subject_to(u >= -1.0) # Bounded control
opti._subject_to(u <= 1.0) #

# Time constraints
mintime = 0.01/N
maxtime = 10/N
opti._subject_to(dt >= mintime)
opti._subject_to(dt <= maxtime)

#  Objective
function obj(x, u, dt)
    return dt
opti.minimize(obj(x, u, dt)) # minimize fuel consumption

# Initial Guess of solution
opti.set_initial(x, 10.0)
opti.set_initial(u, -1.0)
opti.set_initial(dt, 1.0/N)
# Solve

sol = opti.solve()
x = sol.value(x)
dt = sol.value(dt)

#  Post-processing
t = LinRange(0, (N + 1) * sol.value(dt), N + 1)

plt1 = plot(x[1,:], x[2,:], aspect_ratio = 1)

My experience in using these two libraries is consistent with yours. I think the short answer is, if you want to take advantage of sparsity in Julia NLP frameworks other than JuMP, you would need code your own oracles for the sparse Jacobean and Hessian and provide these functions to the modeling layer. There are some special cases where this is not required, for example have a look at this list, OptimizationFunction · GalacticOptim.jl

You might be interested in following the discussions that are on going in the following threads where I have been exploring some similar questions,

1 Like

The issue with JuMP is that hessian information is disabled for user defined functions with multiple arguments. This significantly impacts flexibility of code that can be written. For instance if I wanted to implement RK4 descritization for the above problem it will be difficult to do it in JuMP.

For large non-linear problems the hessian and sparsity information are very important. I think in the survey of non-linear modelling layers, maybe you can add two columns for these.

Also, you can add Casadi.jl to the list of modelling layers. It is not a registered package and does not support complete Python API. I have raised an issue to register it and add the missing functionality. It will be great to have it working till JuMP reaches maturity for nonlinear optimization.

@mohamed82008 maybe this one is for you

I hear you! It is a known issue and I hope it can be addressed over the next 1-2 years as part of this effort, NumFOCUS signs agreement with LANL to improve nonlinear support in JuMP | JuMP

In my first pass, I wanted to be as inclusive as possible and did not make sparsity a key requirement. But I agree with you it is essential in many applications that I face.

That’s a good tip. From a quick glance it looks a bit too much of a work-in-progress for me but I’ll keep an eye on it. I have heard good things about CasADi in general.

CC @odow


Handling sparsity properly in jacobians and Hessians is not implemented in Nonconvex yet. At best for now you can only use ForwardDiff using GitHub - JuliaNonconvex/NonconvexUtils.jl: Some convenient hacks when using Nonconvex.jl.. It’s not impossible though with the right combination of AbstractDifferentiation (AD) and ModellingToolkit (MTK) to generically detect and exploit sparsity in your functions. It will take some time though because the group of people familiar with autodiff, MTK, AD and optimisation is a small group in the Julia community who are usually busy with other things. I am glad the JuMP team and LANL are starting this new project though and I am sure they will help advance the Julia ecosystem for nonlinear optimisation.

You can do it today.

You just do jacobian_sparsity(f!,du,u) for f!(du,u). There’s an example of doing this all in the DiffEq documentation:

And in the SparseDiffTools example:

With GalacticOptim.jl, it’s this:

using ModelingToolkit, GalacticOptim

@variables x y
@parameters a b
loss = (a - x)^2 + b * (y - x^2)^2
sys = OptimizationSystem(loss,[x,y],[a,b])

u0 = [
p = [
    a => 6.0
    b => 7.0

prob = OptimizationProblem(sys,u0,p,grad=true,hess=true,sparse=true)

I have added corresponding Casadi code. In Casadi, RK4 (or other methods) can be implemented easily.

Thanks for the pointer. I now have everything I need to implement a new AbstractDifferentiation backend that “specialises” on functions. This will unlock sparsity exploitation beyond GalacticOptim and hopefully I can demonstrate an example with Nonconvex.