DiffOpt - differentiating through a QP/different from manual

I want to differentiate through a QP following the tutorial from DiffOpt. The QP in question:
image

I would like to differentiate wrt to h, q, and G. I do that in three ways:

  1. Using the paper cited on DiffOpt website: dwh, dwq, dwG
  2. Using DiffOpt: dxh, dxq, dxG
  3. Using a paper on QP sensitivity: dxih,dxiq,dxiG

The first two are identical regardless of the method dwh=dxh=dxih, dwq=dxq=dxiq. There are differences in dwG, dxG, dxiG, so that dwG=dxiG, but dxiG \neq dxG. Given that the result from DiffOpt is different from the other two, how should I compute dxG?

Results:

I am also happy to admit that I made a mistake in matrix computations using the methods from the papers.

Code
using JuMP
import DiffOpt
import Ipopt

n = 2 # variable dimension
m = 1; # no of inequality constraints

Q = [4.0 1.0; 1.0 2.0]
q = [1.0; 1.0]
G = [1.0 1.0]
h = [-1.0]   # initial values set
model = Model(() -> DiffOpt.diff_optimizer(Ipopt.Optimizer))
set_silent(model)
@variable(model, x[1:2])
@constraint(model, cons[j in 1:1], sum(G[j, i] * x[i] for i in 1:2) <= h[j]);

@objective(
    model,
    Min,
    1 / 2 * sum(Q[j, i] * x[i] * x[j] for i in 1:2, j in 1:2) +
    sum(q[i] * x[i] for i in 1:2)
)
optimize!(model)

result = value.(x)
###########From Amos and Kolter (2017)
q = q
G = G
h = h
Q = Q

LHM = [
    Q transpose(G)
    Diagonal(dual.(cons))*G Diagonal(G*value.(x)-h) 
]
LHMinv = inv(LHM)
k1 = length(LHM[:,1])
k2 = length(Diagonal(dual.(cons))[1,:])
RHMb = [zeros(k1-k2,k2)
Diagonal(dual.(cons))] #Matrix(1.0I, size(h)[1])  
RHMc = [-Matrix(1.0I, 2,2)
zeros(k2,2)]

RHMc = [-Matrix(1.0I, 2,2)
zeros(k2,2)]

dallb = LHMinv*RHMb    ####both w and λ
dallc = LHMinv*RHMc    ####both w and λ

dwh = dallb[1:2,:] ####picking only w
dwq = dallc[1:2,:] ####picking only w

dAm = zeros(size(G))
dwA = []
for i = 1:2
        dAm = zeros(size(G))
        dAm[i] = 1.0 ######Do for every element of G
        RHMA = [
            -transpose(dAm).* dual.(cons)
            -Diagonal(dual.(cons)).* dAm* value.(x)
        ]
        dallAtemp = LHMinv * RHMA
        dwAtemp=dallAtemp[1:2,1:end] ####picking only w
        push!(dwA, dwAtemp)
end
dwG = reduce(hcat,dwA)

@show dwh,dwq,dwG

########Trying out DiffOpt

MOI.set(
    model,
    DiffOpt.ForwardConstraintFunction(),
    cons[1],
    0.0 * index(x[1]) - 1.0,  # to indicate the direction vector to get directional derivatives
)

DiffOpt.forward_differentiate!(model)
dxh = MOI.get.(model, DiffOpt.ForwardVariablePrimal(), x)

# MOI.set(
#     model,
#     DiffOpt.ForwardObjectiveFunction(),
#     x[1]
# )

# DiffOpt.forward_differentiate!(model)
# dxc1 = MOI.get.(model, DiffOpt.ForwardVariablePrimal(), x)

# MOI.set(
#     model,
#     DiffOpt.ForwardObjectiveFunction(),
#     x[2]
# )

# DiffOpt.forward_differentiate!(model)
# dxc2 = MOI.get.(model, DiffOpt.ForwardVariablePrimal(), x)

# dxq = hcat(dxc1,dxc2)

# MOI.set(
#     model,
#     DiffOpt.ForwardConstraintFunction(),
#     cons[1],
#     1.0 * index(x[1]) - 0.0  # to indicate the direction vector to get directional derivatives
# )
# DiffOpt.forward_differentiate!(model)
# dxA1 = MOI.get.(model, DiffOpt.ForwardVariablePrimal(), x)

# MOI.set(
#     model,
#     DiffOpt.ForwardConstraintFunction(),
#     cons[1],
#     1.0 * index(x[2]) - 0.0  # to indicate the direction vector to get directional derivatives
# )
# DiffOpt.forward_differentiate!(model)
# dxA2 = MOI.get.(model, DiffOpt.ForwardVariablePrimal(), x)

# dxG = hcat(dxA1,dxA2)

@show dxh,dxq,dxG

################From Boot (1963)

dxiq=-(Q^(-1)-Q^(-1)*G'*(G*Q^(-1)*G')^(-1)*G*Q^(-1))
dxih=Q^(-1)*G'*(G*Q^(-1)*G')^(-1)
vh = (G*Q^(-1)*G')^(-1)*(G*Q^(-1)*(-q)-h)
dxiA1=-result[1]*Q^(-1)*G'*(G*Q^(-1)*G')^(-1)-vh[1]*-(Q^(-1)-Q^(-1)*G'*(G*Q^(-1)*G')^(-1)*G*Q^(-1))[:,1]
dxiA2=-result[2]*Q^(-1)*G'*(G*Q^(-1)*G')^(-1)-vh[1]*-(Q^(-1)-Q^(-1)*G'*(G*Q^(-1)*G')^(-1)*G*Q^(-1))[:,2]
dxiG = hcat(dxiA1,dxiA2)

@show dxih,dxiq,dxiG;

@mbesancon this one screams your name

1 Like

A quick follow up - brute force finite differences seem to confirm the results from DiffOpt:
image

On to debug matrix multiplications…

Code:
######Brute-force finite differences

Gvector1 = range(0.9,1.1,step=0.01)
allData = []

for g in eachrow(Gvector1)

    n = 2 # variable dimension
    m = 1 # no of inequality constraints

    Q = [4.0 1.0; 1.0 2.0]
    q = [1.0; 1.0]
    G = [1.0 g]
    h = [-1.0]   # initial values set
    model = Model(() -> DiffOpt.diff_optimizer(Ipopt.Optimizer))
    set_silent(model)
    @variable(model, x[1:2])
    @constraint(model, cons[j in 1:1], sum(G[j, i] * x[i] for i in 1:2) <= h[j])

    @objective(
        model,
        Min,
        1 / 2 * sum(Q[j, i] * x[i] * x[j] for i in 1:2, j in 1:2) +
        sum(q[i] * x[i] for i in 1:2)
    )
    optimize!(model)

    push!(allData, value.(x))

end

diffV1 = diff(collect(Gvector1))
diffV2 = reduce(hcat,diff(vec(allData)))'

plot(Gvector1[2:end],diffV2[:,1]./diffV1,label="dx_1/dG_2")
plot!(Gvector1[2:end],diffV2[:,2]./diffV1,label="dx_2/dG_2")

scatter!(vec([1.0 1.0]),vec([-0.125,0.375]),label="From DiffOpt")

xlabel!("g_1,g_2")

ylabel!("Derivatives")
1 Like

Well, it’s difficult to argue with finite differences… The signs matter, especially when using the function dual from JuMP. For the future:

Code for Amos and Kolter (2017)
######This uses duals from JuMP which can be negative. 
######However, Amos and Kolter (2017) assume λ≥0, so
######we need to adjust the signs
######Duals from JuMP are non-positive for ≤ constraints
q = q
G = G
h = h
Q = Q

LHM = [
    Q -transpose(G)
    -Diagonal(dual.(cons))*G -Diagonal(G*value.(x)-h) 
]
LHMinv = inv(LHM)
k1 = length(LHM[:,1])
k2 = length(Diagonal(dual.(cons))[1,:])
RHMb = [zeros(k1-k2,k2)
-Diagonal(dual.(cons))] #Matrix(1.0I, size(h)[1])  
RHMc = [-Matrix(1.0I, 2,2)
zeros(k2,2)]

RHMc = [-Matrix(1.0I, 2,2)
zeros(k2,2)]

dallb = LHMinv*RHMb    ####both w and λ
dallc = LHMinv*RHMc    ####both w and λ

dwh = dallb[1:2,:] ####picking only w
dwq = dallc[1:2,:] ####picking only w

dAm = zeros(size(G))
dwA = []
for i = 1:2
        dAm = zeros(size(G))
        dAm[i] = 1.0 ######Do for every element of G
        RHMA = [
            transpose(dAm).* dual.(cons)
            Diagonal(dual.(cons)).* dAm* value.(x)
        ]
        dallAtemp = LHMinv * RHMA
        dwAtemp=dallAtemp[1:2,1:end] ####picking only w
        push!(dwA, dwAtemp)
end
dwG = reduce(hcat,dwA)

@show dwh,dwq,dwG

Code for Boot (1963)
x_s = Q^(-1)*(-q)-Q^(-1)*G'*(G*Q^(-1)*G')^(-1)*(G*Q^(-1)*(-q)-h)

dxih=Q^(-1)*G'*(G*Q^(-1)*G')^(-1)
dxiq=(Q^(-1)-dxih*G*Q^(-1))
vh = (G*Q^(-1)*G')^(-1)*(G*Q^(-1)*(-q)-h)
dxiA1=-x_s[1]*dxih-vh[1]*dxiq[:,1] ####We use the expressions as they are derived by the authors
dxiA2=-x_s[2]*dxih-vh[1]*dxiq[:,2]
dxiG = hcat(dxiA1,dxiA2)

dxiq = -dxiq #####But we know that a=-q so we need to switch signs

@show dxih,dxiq,dxiG;

1 Like