User defined gradient in JuMP with Ipopt: gradient with user data

Hello, I have been trying to register a gradient in JuMP that includes user data. I think the issue is with how I have formulated the gradient and fed it into JuMP, but I am not sure how to use the @expression or @parameters macros in a way that incorporates the data.

Perhaps I have missed something in the documentation or in another thread? Any help would be appreciated. Reproducible code below. Thank you in advance!

# simplified working model
using Random

function obj_f(power, shifter, KFshifter, KRshifter, p_F, params, Inputvec::T...) where {T<:Real}
    mid = length(Inputvec) Ă· 2
    Dvec = Inputvec[1:mid]
    Yvec = Inputvec[1+mid:end]
    Dsec = repeat(Dvec, 1, params.I)
    power2 = 1 / params.alpha1
    value1 = -Dsec .^ power .* shifter
    value2 = p_F .* ((Yvec .- KRshifter) ./ KFshifter .^ params.alpha2) .^ power2
    value1 = sum(value1, dims=2)
    value = sum(value1) + sum(value2)

    return value
end

function grad_f(power, shifter, KFshifter, KRshifter, p_F, params, Inputvec::T...) where {T<:Real}
    mid = length(Inputvec) Ă· 2
    Dvec = Inputvec[1:mid]
    Yvec = Inputvec[1+mid:end]
    Dsec = repeat(Dvec, 1, params.I)
    power2 = 1 / params.alpha1

    grad1f = -sum((power).* Dsec .^ (power - 1) .* shifter, dims=2)
    grad2f = power2 .* p_F .* (Yvec - KRshifter) .^ (power2-1) .* (1 ./ KFshifter .^ params.alpha2).^ (power2)
    gradf = sparse([grad1f, grad2f])
    return gradf
end

function f_hess(lambda, power, shifter, KFshifter, KRshifter, p_F, mat, params, Inputvec::T...) where {T<:Real}
    Dvec = Inputvec[1:endĂ·2]  # Ă· is integer division
    Yvec = Inputvec[endĂ·2+1:end]
    J = length(Dvec)

    Dsec = repeat(Dvec, 1, params.I)
    power2 = (1 / params.alpha1)

    piece1 = -sum(((power .- 1) .* power) .* Dsec .^ (power .- 2) .* shifter, dims=2)
    piece1 = Diagonal(piece1)

    piece2 = (power2 - 1) .* power2 .* p_F .* (Yvec .- KRshifter) .^ (power2 - 2) .* (1 ./ KFshifter .^ params.alpha2) .^ power2
    piece2 = Diagonal(piece2)

    f2 = [piece1 zeros(J, J); zeros(J, J) piece2]
    f2 = sparse(f2)

    xmat = params.Rweight * 2 * mat
    zeros1block = zeros(1, J - 1)
    zeros2block = zeros(J - 1, 1)

    hessc = [-2 zeros1block' 2 zeros1block';
             zeros2block -xmat zeros2block xmat;
             2 zeros1block' -2 zeros1block';
             zeros2block xmat zeros2block -xmat]

    hessc = sparse(hessc)

    hess = sparse(f2 + lambda[:eqnonlin][1] * hessc)
    return hess  
end

function mycon(Inputvec, mat, params)
    mid = length(Inputvec) Ă· 2
    Dvec = Inputvec[1:mid]
    Yvec = Inputvec[1+mid:end]
    Pvec = Yvec - Dvec
    ceq = sum(Pvec) - params.Rweight * (Pvec[2:end]' * mat * Pvec[2:end]) - Pvec[1]^2
    return ceq
end

# arbitrarily defined variables
Random.seed!(123)
l_guess = 26
YFmax = 1e-5 .+ (1e-4 - 1e-5) .* rand(13) 
KRshifter = rand(0:0.0001:5.5, 13)

LB = [zeros(13); KRshifter]
UB = [1e3 * ones(13); YFmax .+ KRshifter .+ 1]
guess = [KRshifter; KRshifter .+ 0.001]
struct rParams
    B
end
regionParams = rParams(rand(0:0.009:0.1, 12, 12))
struct Params
    I
    alpha1
    alpha2
    Rweight
end
params = Params(10, 0.3, 0.7, 1.5)
power = rand(0:0.01:0.55, 13, 10)
shifter = rand(0:0.0000001:0.1, 13, 10)
KFshifter = rand(0:0.00001:0.0003, 13)
p_F_LR = 1


# initialize array for x
using JuMP
using Ipopt
x = Vector{AffExpr}(undef, l_guess)
model = Model(Ipopt.Optimizer)
@variable(model, LB[i] <= x[i=1:l_guess] <= UB[i], start=guess[i])
@constraint(model, c1, mycon(x, regionParams.B, params) == 0)

# The solve works without the user defined gradient and hessian
@objective(model, Min, obj_f(power, shifter, KFshifter, KRshifter, p_F_LR, params, x))
optimize!(model)



# try to add gradient
x = Vector{AffExpr}(undef, l_guess)
m_g = Model(Ipopt.Optimizer)
@variable(m_g, LB[i] <= x[i=1:l_guess] <= UB[i], start=guess[i])
@constraint(m_g, c1, mycon(x, regionParams.B, params) == 0)

@operator(m_g, op_fg, 1, obj_f, grad_f)
@objective(m_g, Min, op_fg(power, shifter, KFshifter, KRshifter, p_F_LR, params, x))
optimize!(m_g)

This is followed by the error:

ERROR: Unable to register the function :op_fg.

Common reasons for this include:

 * The function takes `f(x::Vector)` as input, instead of the splatted
   `f(x...)`.
 * The function assumes `Float64` will be passed as input, it must work for any
   generic `Real` type.
 * The function allocates temporary storage using `zeros(3)` or similar. This
   defaults to `Float64`, so use `zeros(T, 3)` instead.

As examples, instead of:
```julia
my_function(x::Vector) = sum(x.^2)
```
use:
```julia
my_function(x::T...) where {T<:Real} = sum(x[i]^2 for i in 1:length(x))
```

Instead of:
```julia
function my_function(x::Float64...)
    y = zeros(length(x))
    for i in 1:length(x)
        y[i] = x[i]^2
    end
    return sum(y)
end
```
use:
```julia
function my_function(x::T...) where {T<:Real}
    y = zeros(T, length(x))
    for i in 1:length(x)
        y[i] = x[i]^2
    end
    return sum(y)
end
```

Review the stacktrace below for more information, but it can often be hard to
understand why and where your function is failing. You can also debug this
outside of JuMP as follows:
```julia
import ForwardDiff

# If the input dimension is 1
x = 1.0
my_function(a) = a^2
ForwardDiff.derivative(my_function, x)

# If the input dimension is more than 1
x = [1.0, 2.0]
my_function(a, b) = a^2 + b^2
ForwardDiff.gradient(x -> my_function(x...), x)
```
```
Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:35
  [2] _validate_register_assumptions(f::typeof(grad_f), name::Symbol, dimension::Int64)
    @ MathOptInterface.Nonlinear C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Nonlinear\operators.jl:327
  [3] MathOptInterface.Nonlinear._UnivariateOperator(op::Symbol, f::Function, f′::Function)
    @ MathOptInterface.Nonlinear C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Nonlinear\operators.jl:347
  [4] register_operator(::MathOptInterface.Nonlinear.OperatorRegistry, ::Symbol, ::Int64, ::Function, ::Vararg{Function})
    @ MathOptInterface.Nonlinear C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Nonlinear\operators.jl:399
  [5] register_operator
    @ C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Nonlinear\model.jl:237 [inlined]
  [6] set(model::Ipopt.Optimizer, attr::MathOptInterface.UserDefinedFunction, args::Tuple{typeof(obj_f), typeof(grad_f)})
    @ Ipopt C:\Users\vrg2121\.julia\packages\Ipopt\bqp63\src\MOI_wrapper.jl:490
  [7] set(b::MathOptInterface.Bridges.LazyBridgeOptimizer{…}, attr::MathOptInterface.UserDefinedFunction, value::Tuple{…})
    @ MathOptInterface.Bridges C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Bridges\bridge_optimizer.jl:955
  [8] _pass_attribute(dest::MathOptInterface.Bridges.LazyBridgeOptimizer{…}, src::MathOptInterface.Utilities.UniversalFallback{…}, index_map::MathOptInterface.Utilities.IndexMap, attr::MathOptInterface.UserDefinedFunction)
    @ MathOptInterface.Utilities C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Utilities\copy.jl:51
  [9] pass_attributes(dest::MathOptInterface.Bridges.LazyBridgeOptimizer{…}, src::MathOptInterface.Utilities.UniversalFallback{…}, index_map::MathOptInterface.Utilities.IndexMap)
    @ MathOptInterface.Utilities C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Utilities\copy.jl:38
 [10] default_copy_to(dest::MathOptInterface.Bridges.LazyBridgeOptimizer{…}, src::MathOptInterface.Utilities.UniversalFallback{…})
    @ MathOptInterface.Utilities C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Utilities\copy.jl:503
 [11] copy_to
    @ C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Bridges\bridge_optimizer.jl:455 [inlined]
 [12] optimize!
    @ C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\MathOptInterface.jl:84 [inlined]
 [13] optimize!(m::MathOptInterface.Utilities.CachingOptimizer{…})
    @ MathOptInterface.Utilities C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Utilities\cachingoptimizer.jl:316
 [14] optimize!(model::Model; ignore_optimize_hook::Bool, _differentiation_backend::MathOptInterface.Nonlinear.SparseReverseMode, kwargs::@Kwargs{})
    @ JuMP C:\Users\vrg2121\.julia\packages\JuMP\7rBNn\src\optimizer_interface.jl:595
 [15] optimize!(model::Model)
    @ JuMP C:\Users\vrg2121\.julia\packages\JuMP\7rBNn\src\optimizer_interface.jl:546
 [16] top-level scope
    @ REPL[80]:1

caused by: MethodError: no method matching grad_f(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(grad_f), Float64}, Float64, 1})

Closest candidates are:
  grad_f(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any)
   @ Main REPL[44]:1

Stacktrace:
  [1] derivative(f::typeof(grad_f), x::Float64)
    @ ForwardDiff C:\Users\vrg2121\.julia\packages\ForwardDiff\PcZ48\src\derivative.jl:14
  [2] _validate_register_assumptions(f::typeof(grad_f), name::Symbol, dimension::Int64)
    @ MathOptInterface.Nonlinear C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Nonlinear\operators.jl:321
  [3] MathOptInterface.Nonlinear._UnivariateOperator(op::Symbol, f::Function, f′::Function)
    @ MathOptInterface.Nonlinear C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Nonlinear\operators.jl:347
  [4] register_operator(::MathOptInterface.Nonlinear.OperatorRegistry, ::Symbol, ::Int64, ::Function, ::Vararg{Function})
    @ MathOptInterface.Nonlinear C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Nonlinear\operators.jl:399
  [5] register_operator
    @ C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Nonlinear\model.jl:237 [inlined]
  [6] set(model::Ipopt.Optimizer, attr::MathOptInterface.UserDefinedFunction, args::Tuple{typeof(obj_f), typeof(grad_f)})
    @ Ipopt C:\Users\vrg2121\.julia\packages\Ipopt\bqp63\src\MOI_wrapper.jl:490
  [7] set(b::MathOptInterface.Bridges.LazyBridgeOptimizer{…}, attr::MathOptInterface.UserDefinedFunction, value::Tuple{…})
    @ MathOptInterface.Bridges C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Bridges\bridge_optimizer.jl:955
  [8] _pass_attribute(dest::MathOptInterface.Bridges.LazyBridgeOptimizer{…}, src::MathOptInterface.Utilities.UniversalFallback{…}, index_map::MathOptInterface.Utilities.IndexMap, attr::MathOptInterface.UserDefinedFunction)
    @ MathOptInterface.Utilities C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Utilities\copy.jl:51
  [9] pass_attributes(dest::MathOptInterface.Bridges.LazyBridgeOptimizer{…}, src::MathOptInterface.Utilities.UniversalFallback{…}, index_map::MathOptInterface.Utilities.IndexMap)
    @ MathOptInterface.Utilities C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Utilities\copy.jl:38
 [10] default_copy_to(dest::MathOptInterface.Bridges.LazyBridgeOptimizer{…}, src::MathOptInterface.Utilities.UniversalFallback{…})
    @ MathOptInterface.Utilities C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Utilities\copy.jl:503
 [11] copy_to
    @ C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Bridges\bridge_optimizer.jl:455 [inlined]
 [12] optimize!
    @ C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\MathOptInterface.jl:84 [inlined]
 [13] optimize!(m::MathOptInterface.Utilities.CachingOptimizer{…})
    @ MathOptInterface.Utilities C:\Users\vrg2121\.julia\packages\MathOptInterface\2CULs\src\Utilities\cachingoptimizer.jl:316
 [14] optimize!(model::Model; ignore_optimize_hook::Bool, _differentiation_backend::MathOptInterface.Nonlinear.SparseReverseMode, kwargs::@Kwargs{})
    @ JuMP C:\Users\vrg2121\.julia\packages\JuMP\7rBNn\src\optimizer_interface.jl:595
 [15] optimize!(model::Model)
    @ JuMP C:\Users\vrg2121\.julia\packages\JuMP\7rBNn\src\optimizer_interface.jl:546
 [16] top-level scope
    @ REPL[80]:1
Some type information was truncated. Use `show(err)` to see complete types.

Hi @vict_riag, you have a few things that need fixing.

See this part of the documentation:

  1. Your grad_f must take a vector as the first argument that is filled in-place with the gradient.
  2. @operator(m_g, op_fg, 1, obj_f, grad_f) declares that op_fg takes 1 input argument, but you have provided 7.
  3. the inputs to op_fg must be scalar decision variables. You cannot pass a vector like x, and you shouldn’t pass the various parameters and constants.

I didn’t run your code so there might be a typo, etc, but you need something like:

m_g = Model(Ipopt.Optimizer)
@variable(m_g, LB[i] <= x[i=1:l_guess] <= UB[i], start=guess[i])
function new_obj_f(x...)
    return obj_f(power, shifter, KFshifter, KRshifter, p_F_LR, params, collect(x))
end
function new_grad_f(g, x...)
    g .= grad_f(power, shifter, KFshifter, KRshifter, p_F_LR, params, collect(x))
    return
end
@operator(m_g, op_fg, l_guess, new_obj_f, new_grad_f)
@objective(m_g, Min, op_fg(x...))
optimize!(m_g)

In general, if function tracing works, then you should use that.

1 Like

Thank you for the quick reply! There were some typos in the objective and gradient functions and I forgot to add the constraint. The corrected code is below:

using SparseArrays
function obj_f(power, shifter, KFshifter, KRshifter, p_F, params, Inputvec)
    mid = length(Inputvec) Ă· 2
    Dvec = Inputvec[1:mid]
    Yvec = Inputvec[1+mid:end]
    Dsec = repeat(Dvec, 1, params.I)
    power2 = 1 / params.alpha1
    value1 = -Dsec .^ power .* shifter
    value2 = p_F .* ((Yvec .- KRshifter) ./ KFshifter .^ params.alpha2) .^ power2
    value1 = sum(value1, dims=2)
    value = sum(value1) + sum(value2)

    return value
end

function grad_f(power, shifter, KFshifter, KRshifter, p_F, params, Inputvec)
    mid = length(Inputvec) Ă· 2
    Dvec = Inputvec[1:mid]
    Yvec = Inputvec[1+mid:end]
    Dsec = repeat(Dvec, 1, params.I)
    power2 = 1 / params.alpha1

    grad1f = -sum((power).* Dsec .^ (power .- 1) .* shifter, dims=2)
    grad2f = power2 .* p_F .* (Yvec .- KRshifter) .^ (power2-1) .* (1 ./ KFshifter .^ params.alpha2).^ (power2)
    gradf = sparse([grad1f, grad2f])
    return gradf
end


The correct optimization problem is as follows:

x = Vector{AffExpr}(undef, l_guess)
m_g = Model(Ipopt.Optimizer)
@variable(m_g, LB[i] <= x[i=1:l_guess] <= UB[i], start=guess[i])
@constraint(m_g, c1, mycon(x, regionParams.B[kk], params) == 0)

function new_obj_f(x...)
    return obj_f(power, shifter, KFshifter, KRshifter, p_F_LR, params, collect(x))
end
function new_grad_f(g, x...)
    g = grad_f(power, shifter, KFshifter, KRshifter, p_F_LR, params, collect(x))
    return
end
function new_hess_f(h, x...)
    h = f_hess(lambda, power, shifter, KFshifter, KRshifter, p_F_LR, regionParams.B, params, x)
@operator(m_g, op_fg, l_guess, new_obj_f, new_grad_f)
@objective(m_g, Min, op_fg(x...))
optimize!(m_g)

I think my main issue was understanding what the documentation meant by “function tracing” and structuring functions compatible with JuMP. I am going to play with the hessian to see if that can be incorporated.

Function tracing means using Julia’s operator overloading to return an analytic expression of your function. Whether this works depends on the function. There aren’t many good rules other than “try and see if it works.”

In your case, obj_f uses only the elementary functions + *, -, / and ^, so things should work.

Function tracing is preferable, because JuMP can often create more efficient derivatives than you can write by hand (although perhaps not in this case), and it avoids the risk of you making a mistake or typo in the grad_f function that leads to an incorrect result.