Differentiating through optimization problem

Hello all,
I am very new to Julia, and I intend to migrate from Matlab to Julia. I was very intrigued with automatic differentiation, and worked out a few problems by hand, to understand the concepts behind reverse mode and forward mode AD. But in all these, it seems that you need to have a function in hand (say f(x,y) = x^2+x*y).

I have the following issue though, I have an equation lc, which is a function of six variables: px, py, pz, phi, theta, psi. Here, the variables phi, px, py and pz are dependent on psi and theta, but I do not have their mathematical expressions. Is it possible to implement reverse AD to find the derivatives of lc, with respect to psi and theta? How to do so in enzyme? I profusely apologize for the lack of MWE.

1 Like

What you are looking for is called implicit differentiation: differentiating through a function which is given “implicitly”, by an equation.
I have implemented this in Julia with the package ImplicitDifferentiation.jl, I encourage you to take a look and see if you manage to use it. If not, post some self-contained code on here with your equation and the solver you want to apply, and I’ll help you!
Note that my package is not yet compatible with Enzyme.jl, so you will have to perform the differentiation with Zygote.jl or ForwardDiff.jl for now.

1 Like

Hello, thank you very much for replying!. I shall go through the package ImplicitDifferentiation.jl and let me see how much I can progress. I will get back with my progress.
Thanks again, for your support!

1 Like

Hello again, I have gone through the package ImplicitDifferentiation.jl, and I am not quite sure if this is what I require, for my problem. The function that I want to differentiate is:

Here, only the numerical values of px,py,pz and phi are known, and are obtained from a blackbox. Their relationship to th and psi are not known. The numerical value of lc, on a grid of th,psi is depicted in Fig.1.

The values of [th,psi,phi,px,py,pz,lc], for the second row, (\psi=0) from left to right is:

[-0.0174533 0.0 -4.55313e-17 6.72838e-15 1.33124 152.659 151.182; 0.0 0.0 3.1262e-17 1.641e-14 2.84933e-14 152.665 152.665; 0.0174533 0.0 -6.9987e-17 -1.32915e-15 -1.33323 152.659 154.149]

I wish to find \frac{\partial lc}{\partial\theta}, at [\theta,\psi]=[0,0]. Numerically, I can use the central difference differentiation. \frac{\partial lc}{\partial\theta}=\frac{lc(\theta=\frac{\pi}{180},\psi=0)-lc(\theta=\frac{-\pi}{180},\psi=0)}{\frac{2\pi}{180}} = (154.149-151.182)/(2*pi/180) = 84.9983.

Is there any possibility that I can use ImplicitDifferentiation to determine the partial derivatives of lc, with respect to \theta and \psi, at any point in the 2D grid? I would deeply appreciate any insights or pointers in this.
-Thank you very much!

I had misunderstood your problem. When you said

I thought you meant that the dependence of (\phi, p_x, p_y, p_z) on (\psi, \theta) was defined by an equation \mathrm{lc}(\phi, p_x, p_y, p_z, \psi, \theta) = 0.

But apparently that’s not the case. Instead, you want to differentiate \mathrm{lc}(\phi(\psi, \theta), p_x(\psi, \theta), p_y(\psi, \theta), p_z(\psi, \theta), \psi, \theta) with respect to \theta, but you only have access to the functions \phi and friends through a “black box”, which you call on a grid.
What does this black box look like? Is it coded in Julia? Can you call it outside of the grid?

The \phi and friends are obtained from a least squares optimizer, which I consider as a black box. The variables \theta and \psi goes into the black box that return \phi,~p_x,~p_y,~p_z. Only their numerical values are available, which are to be substituted into the expression lc.

Edit 1:

Can you call it outside of the grid?

I am not sure if I correctly understood the question, but the numerical values of the variables: \phi,~p_x,~p_y,~p_z are available on demand.

That is interesting, because you can apply implicit differentiation there. Can you give me the formula for the least squares problem?

In which programming language is the code that produces them? Is it a Julia code?

Thank you for your continuous inputs. The vector that I pass into LeastSquaresOptim has 24 very long entries, which I think would be very difficult to be displayed here. This is basically an inverse kinematics problem of a mechanism, where the two input parameters are \theta and \psi, and the remaining 4 parameters \phi,~p_x,~p_y,~p_z are solved by minimizing the least squares of the constraint equations.
Your suggestion to do the implicit differentiation in the blackbox seem interesting, as the general nature of the constraint equations are of the form:
\Gamma_i(\beta_1(\theta,\psi),\beta_2(\theta,\psi)\cdots\beta_{11}(\theta,\psi)\cdots p_x(\theta,\psi),p_y(\theta,\psi),p_z(\theta,\psi),\phi(\theta,\psi))=0 \quad i=1:24
Upon giving the parameters \theta,~\psi and the initial conditions for the 15 variables in the function, the least-squares optimizer would return all the 15 parameters, out of which I use p_x(\theta,\psi),p_y(\theta,\psi),p_z(\theta,\psi),\phi(\theta,\psi) and \theta,\psi to compute the function lc.

I didn’t know your least squares solver was a Julia package! In that case, and since you only have very few parameters, I would recommend direct automatic differentiation of the solver in forward mode. It won’t be as efficient as implicit differentiation, but it will be 100x easier to code. ForwardDiff.jl is the package that is most likely to work here (but Enzyme.jl might too). If you have a function

f([theta, psi]) -> [phi, px, py, pz]  # calls the least squares solver under the hood

can you just try

using ForwardDiff
ForwardDiff.jacobian(f, [theta, psi])

and see what it does?


If you rewrite the problem using NonlinearLeastSquaresProblem from NonlinearSolve, forwarddiff does have the correct rules for it. Direct AD of nonlinear systems tends to have very little to no benefits.


That is of course an even better solution

Actually no (as long as by “having a function in hand” you mean having a symbolic prescription for the function). As is now perhaps obvious from the other responses, AD can differentiate a code! Details of software implementation matter, that is why they asked you if your functions were implemented in Julia. But in principle, yes, you can find a derivative of an outcome from some numerical solver with respect to its inputs even if you do not have a formula.


And to sum up, for numerical solvers, there are two main ways to do this:

  • the direct way (piggyback autodiff through the solver)
  • the implicit way (exploit what we know about the output)

The implicit way, when available, is the clever one, and here I see no reason why it wouldn’t work with @avikpal’s suggestion


I didn’t know your least squares solver was a Julia package!

Apologies for missing this information a miss in my earlier posts. I have a bit of a difficulty in understanding what you meant by:
f([theta, psi]) -> [phi, px, py, pz] # calls the least squares solver under the hood can you just try
Did you mean the function lc whose partial derivatives are required to be found out? Is it correct if I implement the following?

lc = (theta,psi) -> [phi,px,py,pz] #calling the least square solver under the hood
using ForwardDiff

@zdenek_hurak Thank you for your valuable insights, as someone who recently stumbled into the concept of automatic differentiation, knowing that I can still find the derivative of what a numerical solver returns, with respect to its inputs is a revelation. I am yet to figure out how to get this implemented.

Thanks for your reply, which is quite encouraging for me!. I however am having a hard time figuring out how to implement this. I think I understand the first part, which is to rewrite the problem from LeastSquaresOptim, to NonLinearLeastSquaresProblem. After this, how can I use forwarddiff to evaluate \frac{\partial lc}{\partial\theta} and \frac{\partial lc}{\partial\theta}? I would be deeply grateful if you ould share a rough syntax. I still cannot understand how to wrap the solver and pass it into forwarddiff, to obtain the derivative/jacobian.

See section 2.3 https://arxiv.org/pdf/2403.16341#page=7.12. Once you rewrite the problem, you can also use LeastSquaresOptim using LeastSquaresOptim.jl · NonlinearSolve.jl

1 Like

@avikpal , Thank you very much for sharing!. I can work with this. I shall report back later with my experiences.
@gdalle, I am curious to try the implicit way as well to compare the efficiency. However, I am also a bit concerned as you say it is 100x easier to go with AD. If possible, could you share a rough skeleton on how to implement it?

The implicit/clever way is exactly what @avikpal suggests: taking advantage of our knowledge that the output is a solution to a nonlinear system of equations. My package ImplicitDifferentiation.jl would allow you to do it manually because it is more generic, but NonlinearSolve.jl does it automatically for the special case you need here. It contains custom differentiation rules that are essentially equivalent to the implicit function theorem.

1 Like