JuMP.jl vs Optimization.jl

Hello everyone,

Not sure whether anyone else has the same problem, but I find it a bit hard to understand what the actual difference between Optimization.jl and JuMP.jl is.

To my untrained eye, they both seem to offer the same selling point: one package to optimize pretty much anything (please correct me if I’m wrong).

But why would I choose one over the other for a specific problem?
Is there a clear way to decide which one would be better for a given application, or is it just a matter of preference?

As an example, the following function:

using LinearAlgebra  # for the uniform scaling I

function minfun(c0::AbstractVector, data::AbstractVector, K::AbstractMatrix, α::Real)
    G = K
    G[:, (K' * c0 .<= 0)] .= 0
    M = G * K'
    M2 = α * I + M
    M3 = M2 * c0

    # Function to be minimized
    f = 0.5 * c0' * M3 - c0' * data

    # Gradient vector
    grad = M3 - data

    # Hessian matrix
    H = M2

    return f, grad, H
end

can be minimized using MATLAB’s fminunc().
Which package would you use in Julia, and why?

2 Likes

The JuMP docs are great at answering this question clearly. The entire page here is probably relevant, but Should you use JuMP? · JuMP specifically answers your question.

4 Likes

Thank you very much for the link!
I was looking for a page like that but got lost in the documentation.

This is a very quick and dirty version of how you’d do this in Optimization.jl:

function minfun(c0::AbstractVector, p = [α, data, K])
    α, data, K = p
    G = K
    G[:, (K' * c0 .<= 0)] .= 0
    M = G * K'
    M2 = α * I + M
    M3 = M2 * c0

    # Function to be minimized
    f = 0.5 * c0' * M3 - c0' * data
    return f
end

function grad(G, c0, p = [α, data, K])
    α, data, K = p
    G = K
    G[:, (K' * c0 .<= 0)] .= 0
    M = G * K'
    M2 = α * I + M
    M3 = M2 * c0
    G .= M3 - data
end

function hess(H, c0, p = [α, data, K])
    α, data, K = p
    G = K
    G[:, (K' * c0 .<= 0)] .= 0
    M = G * K'
    H .= α * I + M
end

optf = OptimizationFunction(minfun, grad = grad, hess = hess)
prob = OptimizationProblem(optf, c0, [α, data, K])
sol = solve(prob, BFGS())

Thanks a ton!
Just to clarify, which packages do you use for this? I get an error “BFGS not defined”.

using Optimization, OptimizationOptimJL, ForwardDiff

This should work

Thanks for clarifying.
Unfortunately, I get “retcode: Failure” when I run it.

Could you provide a working version of the script I can run on my end? I’d need values for c0, α, data, and K.

Thank you very much for the reply, @Vaibhavdixit02.

Here is a link to a OneDrive folder containing the g and K values, the .jl file attempting to use the method above, as well as a MATLAB script which solves the problem.
For c0 I would use ones(size(K, 1)) and for α I would use 1.
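Concretely, the setup would be something like this (assuming the files in the folder are named K.csv and g.csv; data here is just g):

using DelimitedFiles
K = readdlm("K.csv", ',', header=false)
g = vec(readdlm("g.csv", ',', header=false))
c0 = ones(size(K, 1))
α = 1.0
data = g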

Please let me know if I can provide more information.

Hey, you aren’t updating the gradient in-place, so the optimization doesn’t move away from the initial point:

function grad(gradient,c, p=[α, data, K])
    α, data, K = p
    G = deepcopy(K)
    G[:, (K'*c.<=0)] .= 0
    M = G * K'
    M2 = α * I + M
    M3 = M2 * c
    gradient .= M3 - data
end

makes it work. With the default tolerances NewtonTrustRegion still returns Failure, but that’s because of the gradient-norm criterion; with Newton you get Success. You can change the solver tolerances as you like.
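For example, to relax the gradient-norm tolerance (g_tol is the Optim.jl keyword; I’d expect it to pass through solve the same way x_tol does, but treat this as a sketch):

# Relax the gradient-norm criterion so NewtonTrustRegion can declare convergence
sol = solve(prob, NewtonTrustRegion(), g_tol = 1e-6, maxiters = 1000)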

I think there are plenty of things you can do to improve the performance of your derivatives if you know the structure of the intermediate matrices you are creating, so you might want to look into that.
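For instance, something along these lines should cut most of the allocations (an untested sketch with a hypothetical make_grad helper; the point is just to hoist the buffers out of the hot loop):

using LinearAlgebra

function make_grad(K, data, α)
    G = similar(K)           # reusable buffer for the masked copy of K
    Kc = zeros(size(K, 2))   # reusable buffer for K' * c
    return function (gradient, c, p = nothing)
        mul!(Kc, K', c)               # Kc = K' * c without allocating
        copyto!(G, K)
        G[:, Kc .<= 0] .= 0           # zero the masked columns in the buffer
        mul!(gradient, G, Kc)         # gradient = G * (K' * c)
        gradient .+= α .* c .- data   # add the α*c term and subtract the data term
        return gradient
    end
end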

1 Like

Thank you for looking into this!

I would never have imagined that changing “=” to “.=” in the gradient would solve the problem.
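For future readers, my takeaway is that “=” rebinds the local name while “.=” writes into the existing array (a generic illustration with made-up names, nothing specific to Optimization.jl):

buf = zeros(3)
g_rebind(buf, x) = (buf = x .+ 1)     # "=" rebinds: the caller's buf is untouched
g_inplace!(buf, x) = (buf .= x .+ 1)  # ".=" mutates: writes into the caller's array

g_rebind(buf, [1.0, 2.0, 3.0]);  buf  # still [0.0, 0.0, 0.0]
g_inplace!(buf, [1.0, 2.0, 3.0]); buf  # now [2.0, 3.0, 4.0]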
As for improving performance, I’ll have to do some homework on that!

For comparison, here is how I would write a version of your model in JuMP:

using JuMP
import Ipopt
using DelimitedFiles
using LinearAlgebra

a = 1
K = readdlm("K.csv", ',', header=false)
g = vec(readdlm("g.csv", ',', header=false))

m, n = size(K)
@assert (m,) == size(g)

model = JuMP.Model(Ipopt.Optimizer)

@variable(model, c[1:m])
@variable(model, x[1:n])

@constraint(model, x == K'c)
@objective(model, Min,
    0.5*a*c'c + 0.5*x'x - g'c
)

optimize!(model)

This one ignores the modification of the matrix for now, but I have some ideas on how to handle this with integer programming if necessary.

1 Like

Thank you for your input, @jd-foster!

Unfortunately, I think the G matrix is crucial here, as I’m trying to exactly reproduce the results of another program.

I would be very interested in your suggestion regarding integer programming; it would be great if you could provide some more details.

Thanks. It would be helpful to know why you’re zeroing out the columns based on the negative values. Is there a higher-level interpretation?

And just to make sure, are you optimizing with respect to c0 only?

I’m following this paper, which uses this method (the Butler–Reeds–Dawson 1981 algorithm) to greatly speed up a large Tikhonov regularization problem:

min ‖Kf − g‖² + ‖αf‖²
s.t. f ≥ 0

I wanted to reproduce this method in Julia so that I can move away from the non-adaptable MATLAB GUI we’ve been using in our group so far.
Performance matters here, since this code is run routinely on large sets of data.

There’s another thread where people kindly provided suggestions for solving this problem. Some of those methods give effectively the same results as the Optimization.jl method suggested above in this thread. The fastest I’ve tested so far is this JuMP implementation:

JuMP/Gurobi method
model = Model(() -> Gurobi.Optimizer(GRB_ENV))
set_silent(model)
@variable(model, f[1:size(K, 2)] >= 0)
@variable(model, r[1:size(K, 1)])
@constraint(model, r .== g .- K * f)
@objective(model, Min, LinearAlgebra.dot(r, r) + α * LinearAlgebra.dot(f, f))
optimize!(model)
@assert is_solved_and_feasible(model)
return value.(f), value.(r)

followed by the optimization method suggested above, modified as follows:

Optimization.jl method
function minfun(c::AbstractVector, p=(α, data, K))
    G = copy(p[3])
    G[:, (p[3]'*c.<=0)] .= 0
    f = 0.5 * c' * ((p[1] * I + (G * p[3]')) * c) - c' * p[2]
end

function grad(gradient, c, p=(α, data, K))
    G = copy(p[3])
    G[:, (p[3]'*c.<=0)] .= 0
    gradient .= (p[1] * I + G * p[3]') * c - p[2]
end

function hess(H, c, p=(α, data, K))
    G = copy(p[3])
    G[:, (p[3]'*c.<=0)] .= 0
    H .= p[1] * I + G * p[3]'
end

optf = OptimizationFunction(minfun, grad=grad, hess=hess)
prob = OptimizationProblem(optf, ones(size(K, 1)), (α, g, K))
c = solve(prob, NewtonTrustRegion(), x_tol=1e-8, maxiters=5000, maxtime=100)
f = vec(max.(0, (K' * c)))

(although I reckon allocations in the latter could be greatly reduced by not allocating a fresh copy of K for G at each call, potentially making it the fastest).

Well, I’d also like to optimize α at the end of the day.
Currently I’m following the method suggested here, which uses a GCV (generalized cross-validation) approach. It effectively tries to minimize a “score” produced by the function below.

GCV score function
function gcv_score(α, s, r) # where r is the residuals of the solution
    ñ = length(s)
    c = r ./ α[end]
    σ² = s .^ 2
    m̂ = sum(σ² ./ (σ² .+ α[end]))
    dm̂ = sum(σ² ./ ((σ² .+ α[end]) .^ 2))
    t̂ = sum((s .* c) .^ 2 ./ (σ² .+ α[end]))
    φₙ = (α[end]^2 * c' * c * ñ) / ((ñ - m̂)^2)  # GCV score to be minimized
    αₙ = (α[end]^2 * c' * c * dm̂) / (t̂ * (ñ - m̂))  # α update value, test this one next

    return φₙ, αₙ
end

function gcv_score(α, s, c) # where c is the vector from the Optimization.jl method
    ñ = length(s)
    σ² = s .^ 2
    m̂ = sum(σ² ./ (σ² .+ α[end]))
    dm̂ = sum(σ² ./ ((σ² .+ α[end]) .^ 2))
    t̂ = sum((s .* c) .^ 2 ./ (σ² .+ α[end]))
    φₙ = (α[end]^2 * c' * c * ñ) / ((ñ - m̂)^2)  # GCV score to be minimized
    αₙ = (α[end]^2 * c' * c * dm̂) / (t̂ * (ñ - m̂))  # α update value, test this one next

    return φₙ, αₙ
end

This function returns the objective value to be minimized, as well as the next α to try. The plan is to start with a large α and evaluate the score repeatedly until a minimum is found (if the new α is larger than the previous one, you stop).
Frankly, I’ve got no idea how this formula is derived or how sensitive the result is to the optimal α in this case, but it’s pretty fast and converges within 4-7 iterations in most cases.
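In code, the loop I have in mind looks roughly like this (solve_for_c is a hypothetical helper that reruns the Optimization.jl solve above for a given α):

function find_alpha(solve_for_c, s; α0 = 1e3, maxiter = 20)
    α = α0
    for _ in 1:maxiter
        c = solve_for_c(α)                 # rerun the solver for the current α
        φ, α_next = gcv_score([α], s, c)   # score and suggested next α
        α_next > α && break                # stop once the update increases α
        α = α_next
    end
    return α
end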

I’m aware of the approach you’ve previously recommended, and I did try to play around with it using OptimizationBBO, but I haven’t found a method yet that converges as fast as the “α update” method above. I’m still working on that, though, and any suggestions would be welcome.
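Roughly what I tried looks like this (a sketch, searching over log α and reusing the same hypothetical solve_for_c helper as above):

using Optimization, OptimizationBBO

function gcv_objective(logα, s)
    α = exp(logα[1])
    c = solve_for_c(α)               # resolve the inner problem at this α
    φ, _ = gcv_score([α], s, c)      # only the score matters here
    return φ
end

prob = OptimizationProblem(gcv_objective, [0.0], s; lb = [-10.0], ub = [10.0])
sol = solve(prob, BBO_adaptive_de_rand_1_bin_radiuslimited(); maxiters = 50)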

The OneDrive link above has been updated to include the “s” vector required by the GCV function, in case someone wants to play around with it. You will also find there some .pdf files of the papers cited above (in case someone does not have access).

The simple answer is that JuMP is a declarative modeling system and Optimization.jl is an interface which takes imperative Julia code. Declarative modeling systems are able to achieve better performance due to having the problem definition at a higher level, and thus they can typically do automated symbolic manipulations to your model and ensure the code is generated in a high performance manner. But declarative modeling systems can be restrictive, and sometimes being able to define a loss function via arbitrary code is useful, hence imperative interfaces which directly take and use the code given.
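To make the contrast concrete, here is the same toy problem both ways (a minimal sketch, minimizing (x - 2)^2; not code from this thread):

# Declarative: you state the model, and JuMP generates the solver interface code
using JuMP, Ipopt
model = Model(Ipopt.Optimizer)
@variable(model, x)
@objective(model, Min, (x - 2)^2)
optimize!(model)

# Imperative: you hand over an arbitrary Julia function as the loss
using Optimization, OptimizationOptimJL, ForwardDiff
loss(u, p) = (u[1] - 2)^2
prob = OptimizationProblem(OptimizationFunction(loss, Optimization.AutoForwardDiff()), [0.0])
sol = solve(prob, BFGS())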

I should also mention ModelingToolkit.jl in this space: it’s a declarative modeling system which targets Optimization.jl’s interfaces. It has a lot more symbolic manipulation than JuMP, but it currently lacks some optimizations in the array-based code generation for larger models.

4 Likes

@ChrisRackauckas, thank you for the comprehensive answer!