Unconstrained Optimization
Automatic differentiation is amazing. But models in my field, economics, often include nested optimization problems. A simple example might look like this, where x is a vector or scalar:

\min_x f(x), \qquad \text{where } f(x) = \min_z h(x, z).

Unfortunately, when part of your objective function f is computed as an optimum, straightforward forward differentiation doesn't work! This is where the envelope theorem comes in. By the envelope theorem,

\frac{d}{dx}\left[\min_z h(x, z)\right] = \frac{\partial h}{\partial x}\big(x, z^*(x)\big),

so that

f'(x) = \frac{\partial h}{\partial x}\big(x, z^*(x)\big),

where z^*(x) = \text{arg}\min_z h(x,z).
So how do we connect this back to automatic differentiation? I'll be using Optim.jl for this. With each call of the objective function f(x), we first compute z^*(x) "offline" (i.e. not within the machinery of the forwarddiff), then call h(x, z^*(x)) using that z^*(x), and the forwarddiff ends up computing the correct derivative. Here's an example of what I mean:
using Optim
using ForwardDiff
# h takes a single concatenated vector xz = [x; z]
function h(xz)
    return xz[1]^2 + xz[2]^2 + (xz[1]*xz[2])^2
end

function f(x)
    # Compute the minimizer (outside of the autodiff framework)
    h_offline(z) = h(vcat(ForwardDiff.value.(x), z))
    res = optimize(h_offline, [1.0], LBFGS(); autodiff=:forward)
    z_star = Optim.minimizer(res)
    # Return to the autodiff framework
    return h(vcat(x, z_star))
end
optimize(f, [1.0], LBFGS(); autodiff=:forward)
What's going on here? In the objective function f, we first define an objective function h_offline, which strips all the autodiff stuff from x using value.(x) and then calls h. We then use optimize to find the minimizer z_star = z^*(x). Then we reenter autodiff-land and call the objective function h using x and the minimizer z_star.
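As a quick sanity check (my own addition, not part of the original example): for this particular h, the inner minimizer is z^*(x) = 0, so f(x) = x^2 and the gradient of f should be 2x.

x0 = [1.5]
ForwardDiff.gradient(f, x0) # ≈ [3.0], matching the analytic derivative 2x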
Note: ForwardDiff.jl works by replacing the input vector x::Vector{Float64} with an input vector x::Vector{Dual}, where Dual is a data type that keeps track of its value, just like a number, as well as information about partial derivatives. The call value.(x) recovers the original vector x::Vector{Float64} of simple Floats.
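To make that concrete, here is a tiny illustration (my own, not from the original post):

d = ForwardDiff.Dual(2.0, 1.0)   # value 2.0, a single partial derivative equal to 1.0
ForwardDiff.value(d)             # 2.0
ForwardDiff.value.([ForwardDiff.Dual(1.0, 1.0), ForwardDiff.Dual(2.0, 0.0)]) # [1.0, 2.0]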
I also want to mention that this works with any combination of nested optimization loops. Several nested levels? Fine. Multiple z values separately computed as optima in a single f function? Fine. This procedure simply allows the correct derivative information to be passed on from a step computed using an optimization.
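For instance, a rough sketch of the "multiple inner optima" case might look like this (my own example functions, chosen arbitrarily; each inner problem is solved offline and then re-entered with x):

h1(xz) = (xz[1] - xz[2])^2 + xz[2]^2
h2(xz) = (xz[1]*xz[2])^2 + (xz[2] - 1)^2

function f2(x)
    xval = ForwardDiff.value.(x)
    z1 = Optim.minimizer(optimize(z -> h1(vcat(xval, z)), [0.0], LBFGS(); autodiff=:forward))
    z2 = Optim.minimizer(optimize(z -> h2(vcat(xval, z)), [0.0], LBFGS(); autodiff=:forward))
    # Re-enter the autodiff framework with both offline minimizers
    return h1(vcat(x, z1)) + h2(vcat(x, z2))
end

optimize(f2, [1.0], LBFGS(); autodiff=:forward)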
Constrained Optimization
So far, everything I've written has covered unconstrained optimization. Luckily, the "full" envelope theorem helps us with the constrained case as well. Suppose now that

f(x) = \min_z h(x, z) \quad \text{subject to} \quad g(x, z) = 0,

and that you can solve for the minimizer of the above problem z^*(x) using your favorite constrained optimization library. Now, one more step is involved to get the derivative of f(x). First, we can rewrite the problem using a "Lagrangian function"

\mathcal{L}(z, \lambda, x) = h(x, z) + \lambda \, g(x, z).

Now the envelope theorem gives us that

f'(x) = \frac{\partial \mathcal{L}}{\partial x}\big(z^*(x), \lambda^*(x), x\big),

where z^*(x) and \lambda^*(x) solve the first-order conditions of \mathcal{L}(z,\lambda,x). If you have z^*(x), then you can recover \lambda^*(x) using the first-order condition of the Lagrangian in z:

\frac{\partial \mathcal{L}}{\partial z}\big(z^*(x), \lambda^*(x), x\big) = \frac{\partial h}{\partial z}\big(x, z^*(x)\big) + \lambda^*(x)\,\frac{\partial g}{\partial z}\big(x, z^*(x)\big) = 0.
Now, the problem can be solved just as we did before. Compute z^*(x) and \lambda^*(x) “offline,” then call \mathcal{L}(z^*(x), \lambda^*(x), x) in place of f(x). I’ll omit the implementation of that here, but if there’s interest, I’m happy to write it up.
Where to go from here
As I cop to in the title, the implementation I propose is a bit hacky. Ideally, I suppose it would be good for Optim to support this natively. Something like (this is basically pseudocode):
function optimize(h, z0::Vector{Float64}, x::Vector{Dual}, args...; kwargs...)
    h_offline(z) = h(ForwardDiff.value.(x), z)
    res = optimize(h_offline, z0, args...; kwargs...) # Find the minimum "offline"
    res.minimum = h(x, res.minimizer) # Put the derivative information in the minimum
    return res
end
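Usage inside an outer objective might then look something like this (equally hypothetical, assuming h takes x and z as separate arguments as above):

function f(x)
    res = optimize(h, [1.0], x, LBFGS(); autodiff=:forward) # would use the Dual-aware method sketched above
    return res.minimum # already carries the derivative information with respect to x
end

optimize(f, [1.0], LBFGS(); autodiff=:forward)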
I would love to hear your thoughts and suggestions on this. If anybody would be interested in helping adapt this for Optim.jl or as a separate package, please let me know! I don’t have much experience contributing to open-source projects but I’d like to start.