Hi all,
I’m trying to train a neural network that is followed by a differentiable optimization layer in Julia. Conceptually it’s:
Flux NN → Economic Dispatch layer (ED layer) → Loss
The ED layer is implemented as a JuMP optimization problem (continuous Unit Commitment relaxation), and I wrote a custom rrule so that gradients can flow back to the NN outputs.
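For reference, the rrule wrapper looks roughly like this (heavily simplified; ed_layer, solve_ed, and ed_sensitivities_wrt_u are just placeholder names for my actual JuMP build/solve and DiffOpt sensitivity code):
using ChainRulesCore
# ed_layer takes the NN output u_pred and returns the optimal ED objective.
ed_layer(layer, u_pred::AbstractMatrix) = solve_ed(layer, u_pred)
function ChainRulesCore.rrule(::typeof(ed_layer), layer, u_pred::AbstractMatrix)
    obj = ed_layer(layer, u_pred)
    function ed_layer_pullback(Δobj)
        # ed_sensitivities_wrt_u is supposed to return ∂obj/∂u_pred from DiffOpt
        du = ed_sensitivities_wrt_u(layer, u_pred)
        return (NoTangent(), NoTangent(), Δobj .* du)
    end
    return obj, ed_layer_pullback
end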
However, I’m running into an issue where gradients either do not flow at all (all zeros) or the rrule does not seem to be called as I expect. After some debugging, I suspect the problem is related to a variable that is fixed via an equality constraint.
I’d really appreciate help checking:
- Whether this modeling pattern is fundamentally incompatible with DiffOpt’s reverse-mode sensitivity, and
- How I should remodel the problem to get meaningful gradients.
Setup (simplified)
In my ED layer, I have a neural network output u_pred[i,t] (a continuous relaxation of the on/off decisions). Inside the JuMP model, I introduce a variable U[i,t] and fix it to u_pred[i,t] via equality constraints:
# U is declared as a variable and then fixed via equality constraints
@variable(diff_model, U[i=1:layer.Ngen, t=1:layer.Nhour])
@constraint(diff_model, [i=1:layer.Ngen, t=1:layer.Nhour], U[i,t] == u_pred[i,t])
# Later, U is used in operational constraints, e.g. capacity limits
# (Pg are the dispatch variables, PgM the maximum generator outputs):
@constraint(diff_model, [i=1:layer.Ngen, t=1:layer.Nhour], Pg[i,t] <= PgM[i] * U[i,t])
I’m using DiffOpt.jl in reverse mode to get sensitivities and then pass those back through a custom rrule so that gradients can flow to the NN parameters.
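Concretely, after optimize!(diff_model) the reverse pass does roughly the following (simplified; loss_grad_Pg is the incoming gradient w.r.t. the dispatch variables Pg, and HiGHS is only an example solver here):
using JuMP, DiffOpt, HiGHS
# diff_model = Model(() -> DiffOpt.diff_optimizer(HiGHS.Optimizer))
# ... build the ED model as above, then optimize!(diff_model) ...
# Seed the reverse pass with the loss gradient w.r.t. the primal solution
for i in 1:layer.Ngen, t in 1:layer.Nhour
    MOI.set(diff_model, DiffOpt.ReverseVariablePrimal(), Pg[i,t], loss_grad_Pg[i,t])
end
DiffOpt.reverse_differentiate!(diff_model)
# ... and this is the query that always comes back as 0.0:
gU = [MOI.get(diff_model, DiffOpt.ReverseVariablePrimal(), U[i,t])
      for i in 1:layer.Ngen, t in 1:layer.Nhour]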
Observed problem
Conceptually, I want to get the gradient of the optimal objective (or of some function of the optimal solution) with respect to u_pred[i,t]. Since U[i,t] is constrained to equal u_pred[i,t], I was expecting that:
• $\frac{\partial \text{obj}}{\partial u_{\text{pred}}}$ would be linked to
• $\frac{\partial \text{obj}}{\partial U}$ as given by DiffOpt.ReverseVariablePrimal() (or similar).
But what I actually see is:
• U[i,t] == u_pred[i,t] fixes U via equality constraints.
• When I query:
MOI.get(diff_model, DiffOpt.ReverseVariablePrimal(), U[i,t])
I consistently get 0.0 for all i,t.
So it looks like DiffOpt is not giving any nontrivial gradient for a variable that is fully fixed by equality constraints.
Questions
- Is it expected that DiffOpt.ReverseVariablePrimal() returns zero for variables that are fully fixed by equality constraints? In other words, from an MOI / DiffOpt perspective, is a variable that is fixed by U[i,t] == constant "non-differentiable" w.r.t. the constant right-hand side?
- What is the correct way to model this if I want gradients w.r.t. u_pred? (One guess is sketched below.)
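In particular, should I be reading the sensitivity off the fixing constraints themselves rather than off U? My (possibly wrong) guess would be something like the following, keeping references to the equality constraints and querying DiffOpt.ReverseConstraintFunction() after the reverse pass; I'm not sure whether the constant term is the right thing to read, nor about the sign convention:
# same fixing constraints as above, but keeping the references around
fix_con = @constraint(diff_model, [i=1:layer.Ngen, t=1:layer.Nhour], U[i,t] == u_pred[i,t])
# ... optimize!, seed ReverseVariablePrimal, DiffOpt.reverse_differentiate!(diff_model) ...
dfunc = MOI.get(diff_model, DiffOpt.ReverseConstraintFunction(), fix_con[1,1])
du_11 = JuMP.constant(dfunc)   # is this (up to sign) the gradient w.r.t. u_pred[1,1]?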