[Example] Use automatic differentiation to find Lagrange multiplier

Hi there, this post is about a toy example that explores whether and how automatic differentiation (AD) can be used to determine Lagrange multipliers.


Take the following toy problem

f(\varepsilon) = \underset{x_1, x_2}{\min} \, = x_1^2 + x_2^2 \quad \text{subject to} \quad x_1 + x_2 = \varepsilon,

where f(\varepsilon) is the optimal value function. A bit of manipulation (and standard texbook results) yield

f(\varepsilon) = \frac{\varepsilon^2}{2}, \quad f^\prime(\varepsilon) = \varepsilon = - \lambda,

where \lambda is the Lagrange multiplier w.r.t. the equality constraint.

Code that works

The following piece of code works just fine

using Zygote

objective_function(x) = sum(x.^2)

function kkt_solution(ε)
    [2 0 1; 0 2 1; 1 1 0] \ [0; 0; ε]

function optimal_value(ε)
	z = kkt_solution(ε)

lambda_value(ε) = kkt_solution(ε)[3]
julia> ε = 1.23

julia> optimal_value(ε) == ε^2/2

julia> optimal_value'(ε) == ε == -lambda_value(ε)

Code that does not work

So far, so good. Usually, the KKT system is not a simple linear system of equations, but a generic system of nonlinear algebraic equations

h(x, \lambda) = 0.

Of course, there are Julia packages that are able to solve these. But can we use these in conjunction with AD?

using Zygote, NLsolve

objective_function(x) = sum(x.^2)

function optimal_value_nlsolve(ε)
	function f!(F, z)
			F[1:3] = [2 0 1; 0 2 1; 1 1 0]*z - [0; 0; ε]
	res = nlsolve(f!, ε*ones(3))
julia> ε = 1.23

julia> isapprox(optimal_value_nlsolve(ε), ε^2/2)

julia> optimal_value_nlsolve'(ε)
ERROR: TypeError: in ccall: first argument not a pointer or valid constant expression, expected Ptr, got Tuple{Symbol,String}


I am fully aware that what I’m asking for is a lot (really, a lot!), but: is it possible to use AD together with generic iterative methods? If so, what’s the way to go?

Thanks heaps!

In theory, yes. In practice… more or less :] See https://github.com/JuliaNLSolvers/NLsolve.jl/issues/205 for some comments and a working prototype.

The simplest way is to derive it yourself, by differentiating h(x,lambda,eps) = 0 wrt eps. Otherwise, if you use a constraint optimization solver, it should give you its lagrange multipliers as output.

1 Like

Thanks for the pointer to the open issue. Always reassurring to hear that one’s ideas are not totally insane.

And yes, clearly the multipliers are (usually) provided by the solvers. The example is purely pedagogical.