DiffOpt / JuMP: zero gradient for variable fixed by equality constraint

Hi all,

I’m trying to train a neural network that is followed by a differentiable optimization layer in Julia. Conceptually it’s:

Flux NN → Economic Dispatch layer (ED layer) → Loss

The ED layer is implemented as a JuMP optimization problem (continuous Unit Commitment relaxation), and I wrote a custom rrule so that gradients can flow back to the NN outputs.

However, I’m running into an issue where gradients either do not flow at all (all zeros) or the rrule does not seem to be called as I expect. After some debugging, I suspect the problem is related to a variable that is fixed via an equality constraint.

I’d really appreciate help checking:

  1. Whether this modeling pattern is fundamentally incompatible with DiffOpt’s reverse-mode sensitivity, and
  2. How I should remodel the problem to get meaningful gradients.

Setup (simplified)

In my ED layer, I have a neural network output u_pred[i,t] (continuous relaxation of on/off decisions). Inside the JuMP model, I introduce a variable U[i,t] and then fix it using an equality constraint to u_pred[i,t]:

# U is declared as a variable and then fixed via equality constraints

@variable(diff_model, U[i=1:layer.Ngen, t=1:layer.Nhour])

@constraint(diff_model, [i=1:layer.Ngen, t=1:layer.Nhour], U[i,t] == u_pred[i,t])

# Later, U is used in operational constraints, e.g., capacity limits:
@constraint(diff_model, Pg[i,t] <= PgM[i] * U[i,t])

I’m using DiffOpt.jl in reverse mode to get sensitivities and then pass those back through a custom rrule so that gradients can flow to the NN parameters.


Observed problem

Conceptually, I want to get the gradient of the optimal objective (or of some function of the optimal solution) with respect to u_pred[i,t]. Since U[i,t] is constrained to equal u_pred[i,t], I was expecting that:
\frac{\partial \text{obj}}{\partial u_{\text{pred}}} would be linked to
\frac{\partial \text{obj}}{\partial U} as given by DiffOpt.ReverseVariablePrimal() (or similar).

But what I actually see is that when I query

MOI.get(diff_model, DiffOpt.ReverseVariablePrimal(), U[i,t])

I consistently get 0.0 for all i,t.

So it looks like DiffOpt is not giving any nontrivial gradient for a variable that is fully fixed by equality constraints.


Questions

  1. Is it expected that DiffOpt.ReverseVariablePrimal() returns zero for variables that are fully fixed by equality constraints? In other words, from an MOI / DiffOpt perspective, is a variable that’s fixed by U[i,t] == constant “non-differentiable” w.r.t. the constant right-hand side?

  2. What is the correct way to model this if I want gradients w.r.t. u_pred?


Thanks for the help! :grinning_face:

@mbesancon ?

I have no idea what you want to do overall, but I do have some limited knowledge about unit commitment and JuMP, so here is a guess.

You probably want to use JuMP.fix instead (and possibly JuMP.FixRef if you want to get its dual subsequently).

1 Like

Hi @yeomoon, welcome to the forum :smile:

Can you provide a reproducible example? I can’t tell exactly what the problem is from your code snippet. This is really a question for @joaquimg; I can ask him in a couple of hours when he comes to my house for lunch :laughing:

If you want to get \frac{\partial obj}{\partial U} you can do:

@constraint(diff_model, con_U[i=1:layer.Ngen, t=1:layer.Nhour], U[i,t] == u_pred[i,t])
optimize!(diff_model)
d_obj_d_U = dual.(con_U)
2 Likes

Hi @yeomoon ,

It is indeed hard to give a proper answer with no MWE.

If you just need the derivative of the objective function wrt a rhs you can just grab the dual as @odow suggested.

Note that it’s also possible for some derivatives to be zero very frequently if the problem is fully linear. Some quadratic regularization might help, but I’d say that’s more advanced stuff.

3 Likes

Hi @odow, @joaquimg,

Thanks a lot for your replies and pointers! :smile:
(Unfortunately I can only mention two people per post, but I really appreciate all of your answers.)

They were very helpful for me to clarify what I actually want.

I prepared the following MWE to make my question more concrete.
Here, U is fixed via an equality constraint, and it appears in the
capacity constraint of Pg.

using JuMP
using DiffOpt
using Gurobi
using MathOptInterface
const MOI = MathOptInterface

Ngen  = 1
Nhour = 1

u_pred = [0.5]     # "NN output"
PgM   = [10.0]

model = Model(() -> DiffOpt.diff_optimizer(Gurobi.Optimizer))
set_silent(model)

@variable(model, Pg[1:Ngen, 1:Nhour] >= 0)
@variable(model, U[1:Ngen, 1:Nhour])

@constraint(model, [i=1:Ngen, t=1:Nhour], U[i,t] == u_pred[1])
@constraint(model, [i=1:Ngen, t=1:Nhour], Pg[i,t] <= PgM[i] * U[i,t])

@objective(model, Min, (Pg[1,1] - 10.0)^2)

println("=== Forward solve ===")
optimize!(model)
println("Pg = ", value.(Pg))
println("U  = ", value.(U))

DiffOpt.empty_input_sensitivities!(model)
MOI.set(model, DiffOpt.ReverseVariablePrimal(), Pg[1,1], 1.0)

DiffOpt.reverse_differentiate!(model)

grad_Pg = MOI.get(model, DiffOpt.ReverseVariablePrimal(), Pg[1,1])
grad_U  = MOI.get(model, DiffOpt.ReverseVariablePrimal(), U[1,1])

println("=== Reverse-mode sensitivities ===")
println("grad_Pg = ", grad_Pg)
println("grad_U  = ", grad_U)

For this toy problem, the forward solution is

  • unconstrained minimizer of (Pg - 10)^2 is Pg = 10,
  • but the second constraint is Pg ≤ PgM * U = 10 * 0.5 = 5,
  • so the optimum is Pg^* = 5 with a binding constraint.

If I treat U as a parameter, the optimal value as a function of U is

f(U) = (PgM \cdot U - 10)^2 = (10U - 10)^2,

so the derivative is

f’(U) = 2(10U - 10)\cdot 10.

At the current point (U = 0.5),

f’(0.5) = 2(5 - 10)\cdot 10 = -100.

Equivalently, using the KKT conditions and a dual variable (\lambda) for

g(Pg,U) = Pg - PgM\cdot U \le 0,

we have at the optimum

  • Pg^* = 5,
  • \lambda = 10 from stationarity: 2(Pg^* - 10) + \lambda = 0,
  • and by the envelope theorem
\frac{\partial f}{\partial U} = \lambda \frac{\partial g}{\partial U} = \lambda (-PgM) = 10 \cdot (-10) = -100.
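As a quick sanity check on this hand derivation, the same number comes out of a finite-difference approximation of the closed-form optimal value (plain Julia, no solver needed; `f` below is the closed-form optimum, assuming 0 ≤ U ≤ 1 so the capacity constraint binds):

```julia
# Closed-form optimal value of min (Pg - 10)^2 s.t. 0 <= Pg <= 10U,
# valid for 0 <= U <= 1 where the capacity constraint binds: Pg* = 10U.
f(U) = (10.0 * U - 10.0)^2

# Central finite difference at U = 0.5
h = 1e-6
fd = (f(0.5 + h) - f(0.5 - h)) / (2h)
println(fd)  # ≈ -100.0, matching the envelope-theorem value
```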

However, when I run the code above, I get:

=== Forward solve ===
Pg = [5.0;;]
U  = [0.5;;]
=== Reverse-mode sensitivities ===
grad_Pg = 1.0
grad_U  = nothing

So it seems that I may be misunderstanding the intended usage of DiffOpt here. In this kind of setup, where U is fixed via an equality constraint to an “external” value u_pred, what is the recommended way to obtain \partial f / \partial U or \partial f / \partial u_{\text{pred}}?

Again, thank you very much for your help!

Although I have no idea about DiffOpt.jl, I guess you probably don’t need it, JuMP.dual should be sufficient.

I notice you want this value:

\frac{\partial f}{\partial U}

Using JuMP.dual should return what you want.


I have no knowledge about DiffOpt.jl, but I guess you possibly won’t need it. It seems to be some misconception.


Generally speaking, I think you need a subgradient at a trial solution. The function should be defined via a convex optimization problem. Differentiability should be an added property on top of that.

1 Like

No, U is a copy variable within the lower-level program. I have never encountered a need to take a derivative with respect to it; this is probably a misconception.

1 Like

Wanting to use DiffOpt.jl is not a misconception. There’s an entire field of research dedicated to obtaining derivatives of problem solutions with respect to cost or constraint parameters, and DiffOpt.jl is a prominent package allowing people to do just that.
However, Walter is right that U is not an input to the problem, it’s a free variable: the real input is the parameter u_pred (along with other parameters), and the outputs are the optimal values U^*, P_g^*.

3 Likes

What I will say though is that the documentation of DiffOpt.jl is a bit confusing as to the nature of the objects you obtain. Here, I think you might be misusing the library: what you seem to want is the gradient of the objective value with respect to the parameter u_pred. If that’s the case, indeed you only need the dual. DiffOpt.jl is useful when you need to differentiate the solution vector with respect to problem parameters.

1 Like

Hi @yeomoon,

You can achieve what you are after without using DiffOpt. Here’s an example:

julia> using JuMP

julia> using Gurobi

julia> function main()
           Ngen  = 1
           Nhour = 1
           u_pred = [0.5]     # "NN output"
           PgM   = [10.0]
           model = Model(Gurobi.Optimizer)
           set_silent(model)
           @variable(model, Pg[1:Ngen, 1:Nhour] >= 0)
           @variable(model, U[1:Ngen, 1:Nhour])
           @constraint(model, c_fix[i=1:Ngen, t=1:Nhour], U[i,t] == u_pred[1])
           @constraint(model, [i=1:Ngen, t=1:Nhour], Pg[i,t] <= PgM[i] * U[i,t])
           @objective(model, Min, (Pg[1,1] - 10.0)^2)
           optimize!(model)
           return dual.(c_fix)
       end
main (generic function with 1 method)

julia> function main_fix()
           Ngen  = 1
           Nhour = 1
           u_pred = [0.5]     # "NN output"
           PgM   = [10.0]
           model = Model(Gurobi.Optimizer)
           set_silent(model)
           @variable(model, Pg[1:Ngen, 1:Nhour] >= 0)
           @variable(model, U[1:Ngen, 1:Nhour])
           for i in 1:Ngen, t in 1:Nhour
               fix(U[i,t], u_pred[1])
           end
           @constraint(model, [i=1:Ngen, t=1:Nhour], Pg[i,t] <= PgM[i] * U[i,t])
           @objective(model, Min, (Pg[1,1] - 10.0)^2)
           optimize!(model)
           return reduced_cost.(U)
       end
main_fix (generic function with 1 method)

julia> main()
Set parameter WLSAccessID
Set parameter WLSSecret
Set parameter LicenseID to value 722777
WLS license 722777 - registered to JuMP Development
1×1 Matrix{Float64}:
 -100.0

julia> main_fix()
Set parameter WLSAccessID
Set parameter WLSSecret
Set parameter LicenseID to value 722777
WLS license 722777 - registered to JuMP Development
1×1 Matrix{Float64}:
 -100.0

You can either fix the variable to a value and then use reduced_cost, or you can explicitly query the dual of the fixing constraint.

3 Likes

Certainly, the solution is what @odow posted.

I will add some extra information for completeness and future reference, in case someone reads this later.

Also, as @gdalle highlighted, DiffOpt is useful for computing derivatives of the (primal) solution of an optimization problem with respect to the problem’s coefficients or parameters.

Forward mode in DiffOpt, as in most AD packages, does not mean just solving the problem. It means computing derivatives in forward mode. In the case of optimization problems this is:

  1. select a direction in which the coefficients (or parameters) will move
  2. set it with DiffOpt.set_forward_parameter or MOI.set(model, DiffOpt.ForwardConstraintFunction(), index, func)
  3. call DiffOpt.forward_differentiate!(model)
  4. query results with DiffOpt.get_forward_variable(model, x) or MOI.get(model, DiffOpt.ForwardVariablePrimal(), x)

Reverse mode is analogous:

  1. select the direction in which the solution is moving
  2. set it with DiffOpt.set_reverse_variable(model, variable, sensib) or MOI.set(model, DiffOpt.ReverseVariablePrimal(), variable, sensib)
  3. call DiffOpt.reverse_differentiate!(model)
  4. query results with DiffOpt.get_reverse_parameter(model, p) or MOI.get(model, DiffOpt.ReverseConstraintFunction(), index)

Note that MOI.get(model, DiffOpt.ReverseVariablePrimal(), variable) is just a method to get back exactly what you set with MOI.set(model, DiffOpt.ReverseVariablePrimal(), variable, sensib); no computation is done. If you did not set it, you will get nothing.
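A minimal sketch of the four forward-mode steps above, on the toy problem from this thread. This assumes a recent DiffOpt version that supports the parameter API (DiffOpt.set_forward_parameter / DiffOpt.get_forward_variable) and substitutes HiGHS for Gurobi as a freely available QP-capable solver:

```julia
using JuMP, DiffOpt, HiGHS

model = Model(() -> DiffOpt.diff_optimizer(HiGHS.Optimizer))
set_silent(model)
# u_pred enters as a true parameter, not as a fixed decision variable
@variable(model, p in Parameter(0.5))
@variable(model, Pg >= 0)
@constraint(model, Pg <= 10.0 * p)
@objective(model, Min, (Pg - 10.0)^2)
optimize!(model)

# Steps 1-2: pick the direction dp = 1 for the parameter
DiffOpt.set_forward_parameter(model, p, 1.0)
# Step 3: propagate the direction through the solution map
DiffOpt.forward_differentiate!(model)
# Step 4: query dPg*/dp (should be ≈ 10, since Pg* = 10p while the cap binds)
dPg = DiffOpt.get_forward_variable(model, Pg)
```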

4 Likes

Hi all, :grinning_face:

Thank you so much for all your help, and sorry for the late reply — I was busy refactoring my code and checking everything carefully.

The good news is: the issue is now resolved, and gradients are finally flowing through my layer as expected.

In hindsight, the main mistakes on my side were:

  1. I was modeling U as a variable with U == u_pred and then trying to read sensitivities via ReverseVariablePrimal(U), which (as you explained) is not how DiffOpt exposes parameter sensitivities.
  2. I was effectively asking DiffOpt for the wrong thing: instead of sensitivities w.r.t. the parameter u_pred in the constraint, I was trying to treat U as if it were that parameter.

Following your suggestions, I changed the modeling to treat u_pred as a true parameter inside the capacity constraint, and then used DiffOpt.ReverseConstraintFunction on that constraint to backpropagate through the layer.

Here is a minimal working example that now behaves exactly as I expect:

using JuMP
using DiffOpt
using Gurobi
using MathOptInterface
const MOI = MathOptInterface

Ngen  = 1
Nhour = 1

u_pred = 0.5     # "NN output"
PgM   = [10.0]

model = Model(() -> DiffOpt.diff_optimizer(Gurobi.Optimizer))
set_silent(model)

@variable(model, Pg[1:Ngen, 1:Nhour] >= 0)
@constraint(model, con_cap[i=1:Ngen, t=1:Nhour], Pg[i,t] <= PgM[i] * u_pred)
@objective(model, Min, (Pg[1,1] - 10.0)^2)

optimize!(model)

DiffOpt.empty_input_sensitivities!(model)
MOI.set(model, DiffOpt.ReverseVariablePrimal(), Pg[1,1], 1.0)
DiffOpt.reverse_differentiate!(model)

sens_fun = MOI.get(model, DiffOpt.ReverseConstraintFunction(), con_cap[1,1])
println("ReverseConstraintFunction = ", sens_fun)

which prints

ReverseConstraintFunction = -5 Pg[1,1] - 1

Special thanks to @odow and @joaquimg for the detailed explanations!

3 Likes