ReverseDiff.jl with iterative solvers


#1

I am trying to find the derivative of a function, whose input is the converged solution of a Jacobi iteration, with respect to the elements of the matrices used in the iteration. For example, if f(x) is a function of x, which in turn comes from Ax = b, I am trying to find the derivative of f with respect to the elements of A and b.

I am using ReverseDiff.jl and ForwardDiff.jl for this purpose, but all I can get is the derivative of f with respect to the converged x, even when I pass b as the parameter to the GradientConfig and GradientTape functions.

I have done this in C++ using the CoDiPack library, but have had no success in Julia. Is this even possible in Julia? Any help would be appreciated. Thanks.


#2

I’m not sure what you mean, but if x is the solution to Ax = b then you can compute the derivative of x with respect to A and b analytically, which would probably be more efficient than reverse differentiation through the iteration. You can possibly even use more clever tricks like adjoint methods if you do this several times.
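For example, here is a minimal sketch of what I mean, with a made-up 2x2 system and a made-up f (the gradient of f is written out by hand): since x(b) = A^-1 b, the chain rule gives the gradient of f(x(b)) w.r.t. b as A^-T * grad f(x), i.e. one extra linear solve instead of differentiating the iteration.

```julia
using LinearAlgebra

A = [10.0 -1.0; -1.0 11.0]      # made-up 2x2 example
b = [6.0, 25.0]
f(x) = sin(x[1]) + x[2]^2
gradf(x) = [cos(x[1]), 2*x[2]]  # analytic gradient of f

x = A \ b           # the solution x(b)
g = A' \ gradf(x)   # gradient of f(x(b)) w.r.t. b, via one extra solve

# sanity check against central finite differences
h = 1e-6
fd = [(f(A \ (b + h*e)) - f(A \ (b - h*e))) / (2h) for e in ([1.0, 0.0], [0.0, 1.0])]
```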


#3

I can’t find it analytically; the reason is that it is an aerodynamic application. There is a fine distribution of points that defines the shape of an aerofoil. An iterative solver solves some differential equations iteratively with these points as inputs (which are floating-point numbers) to get the coefficients of drag and lift, and the sensitivities of these coefficients are computed with respect to each point on the surface of the aerofoil.

I was using Jacobi as a test case, in which case A and b would consist of floating-point numbers. Finite differences do not give results with the desired precision.

It works with the C++ library I am using; I am just curious whether it can be done in Julia.


#4

Your question is not very clear. Providing a MWE may help.


#5

I am sorry for not being clear.
If someone can shed some light on how to find the derivative of f(x(b)) with respect to b, without doing anything analytically.
All I get is the derivative of f(b) with respect to b when using:

using ForwardDiff
b = 2
x = b^2 + b;
f(x) = sin(x) + x^2;
result = ForwardDiff.derivative(f, b)


#6

Chain rule?


#7

Possibly

using ForwardDiff
f(x) = sin(x) + x^2;
g(b) = b^2 + b;
b = 2
result = ForwardDiff.derivative(f ∘ g, b)

?


#8

The ∘ symbol throws an error: ERROR: LoadError: UndefVarError: ∘ not defined.
Am I missing something?

Anyway, here is the code that I am actually trying to run; it may give some insight:

using IterativeSolvers
using ReverseDiff
using ReverseDiff: GradientConfig, GradientTape

A  =     [10.0 -1.0 2.0 0.0;
         -1.0 11.0 -1.0 3.0;
          2.0 -1.0 10.0 -1.0; 
          0.0 3.0 -1.0 8.0]
b = [6.0 ,25.0 ,-11.0 ,15.0]
x = similar(b)
f(x) = (x[1]^3) + log10((x[2]^2)) + exp(x[3]) + 4*x[4]
cfg = GradientConfig(b)
tape = GradientTape(f, b, cfg)

#x = jacobi(A, b)

jacobi!(x, A, b)

print(x)
println()
print(f(x))
println()

result = similar(x)
# ReverseDiff.gradient!(result, f, x, cfg)
# print(result)
# println()
ReverseDiff.gradient!(result, tape, b)
print(result)
println()

It gives the derivative of f(b), not of f(x(b)), with respect to b.
Thanks.


#9

The composition symbol ∘ is only defined in Julia 0.6. You can use x -> f(g(x)) instead.
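For instance, with the definitions from post #7, the anonymous-function form would be:

```julia
using ForwardDiff

f(x) = sin(x) + x^2
g(b) = b^2 + b
b = 2

# anonymous function instead of the ∘ operator (which needs Julia 0.6)
result = ForwardDiff.derivative(x -> f(g(x)), b)
```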


#10

Here is the same thing in forward mode:

using IterativeSolvers
using ForwardDiff

A  =  [  10.0  -1.0  2.0  0.0;
         -1.0  11.0 -1.0  3.0;
          2.0  -1.0 10.0 -1.0; 
          0.0   3.0 -1.0  8.0 ]
b = [6.0, 25.0,  -11.0, 15.0]
x = zeros(4)   # overwritten by jacobi below
f(x) = (x[1]^3) + log10((x[2]^2)) + exp(x[3]) + 4*x[4]
#cfg = GradientConfig(b)

x = jacobi(A,b)

print(x)
println()
print(f(x))
println()

result = ForwardDiff.gradient(f,b)
#result = ForwardDiff.derivative(f,b[1])

print(result)
println()



#11

Independently of the technical aspects of getting this to work, my point was that if you want to compute the derivative of f with respect to b in this way, you have to compute the derivative of x with respect to b as an intermediate step, which is expensive. But because x solves a linear system, the derivative simplifies. I.e., let F(b) = f(x(b)); you want to compute the gradient of F, which is J^T grad f, where J = A^-1 is the Jacobian of x with respect to b. Therefore, the correct way to compute this is to compute grad f, and then solve the linear system A^T grad F = grad f by an iterative method.
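To make that concrete with the matrix from this thread (a sketch only: the gradient of f is written out by hand, and a direct `\` stands in for the iterative solver in both solves):

```julia
using LinearAlgebra

A = [10.0 -1.0 2.0 0.0;
     -1.0 11.0 -1.0 3.0;
      2.0 -1.0 10.0 -1.0;
      0.0  3.0 -1.0 8.0]
b = [6.0, 25.0, -11.0, 15.0]

f(x) = x[1]^3 + log10(x[2]^2) + exp(x[3]) + 4*x[4]
gradf(x) = [3*x[1]^2, 2/(x[2]*log(10)), exp(x[3]), 4.0]   # grad f by hand

x = A \ b               # forward solve:  A x = b
gradF = A' \ gradf(x)   # adjoint solve:  A^T grad F = grad f
```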


#12

Note that this is called the adjoint system of equations. The adjoint method is the same basic idea as reverse-mode automatic differentiation, but when you are using an iterative solver, auto-differentiation tools often don’t work well (unless they know specifically about the solver being used), and then you have to apply the adjoint method “manually”.
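A sketch of what “manually” means here, on the thread’s test problem: run the same iterative solver twice, once on A x = b and once on the transposed system A^T λ = grad f(x); the second solution λ is the desired gradient. (A hand-rolled Jacobi iteration stands in for IterativeSolvers.jacobi, and the gradient of f is written out by hand.)

```julia
using LinearAlgebra

# plain Jacobi iteration, standing in for IterativeSolvers.jacobi;
# it converges here because the matrix is diagonally dominant
function jacobi_solve(A, b; iters = 100)
    x = zeros(length(b))
    D = Diagonal(A)
    R = A - D
    for _ in 1:iters
        x = D \ (b - R * x)
    end
    return x
end

A = [10.0 -1.0 2.0 0.0;
     -1.0 11.0 -1.0 3.0;
      2.0 -1.0 10.0 -1.0;
      0.0  3.0 -1.0 8.0]
b = [6.0, 25.0, -11.0, 15.0]

f(x) = x[1]^3 + log10(x[2]^2) + exp(x[3]) + 4*x[4]
gradf(x) = [3*x[1]^2, 2/(x[2]*log(10)), exp(x[3]), 4.0]   # grad f by hand

x = jacobi_solve(A, b)           # forward solve:  A x = b
λ = jacobi_solve(A', gradf(x))   # adjoint solve:  A^T λ = grad f
# λ is the gradient of f(x(b)) w.r.t. b
```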


#13

Thanks for the very nice write-up! It’s a useful trick that many people and communities use in some form or other, and that everybody should know about, but it is not often covered in a general setting in textbooks (Schur complements seem to be another instance of that).


#14

ForwardDiff/ReverseDiff differentiate native Julia functions at the input(s) you provide. When you call ReverseDiff.gradient(g, v) for some function g and some array v, you’re saying “return the gradient of g evaluated at v”.

Correct! It’s evaluating the gradient of f at the value b.

As @Tamas_Papp mentioned, it seems like you want to differentiate through some intermediary function that you’re not actually calling. Note that x (as you’ve defined it) isn’t a Julia function, it’s a Vector{Float64}.

Is the below closer to what you’d like to do?

using IterativeSolvers, ReverseDiff 

A = [10.0 -1.0 2.0 0.0;
     -1.0 11.0 -1.0 3.0;
      2.0 -1.0 10.0 -1.0; 
      0.0 3.0 -1.0 8.0]
b = [6.0 ,25.0 ,-11.0 ,15.0]

f(x) = (x[1]^3) + log10((x[2]^2)) + exp(x[3]) + 4*x[4]

# this is the function you seem to actually want to differentiate
g(A, b) = f(jacobi(A, b))

# get gradients w.r.t. `A` and `b`
dA, db = ReverseDiff.gradient(g, (A, b))

You can, of course, preallocate memory/prerecord the tape etc. to make the above more efficient, but before we go that far, I’m just making sure that this is closer to your intention.
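For reference, the prerecord-and-compile pattern looks roughly like this. (A sketch only: to keep it self-contained I differentiate f(A * b) rather than f(jacobi(A, b)), but the tape mechanics are the same for any function ReverseDiff can record.)

```julia
using ReverseDiff

f(x) = (x[1]^3) + log10((x[2]^2)) + exp(x[3]) + 4*x[4]
g(A, b) = f(A * b)   # stand-in for f(jacobi(A, b))

A = [10.0 -1.0 2.0 0.0;
     -1.0 11.0 -1.0 3.0;
      2.0 -1.0 10.0 -1.0;
      0.0  3.0 -1.0 8.0]
b = [0.1, 0.2, -0.1, 0.1]

# record the tape once, compile it, and reuse it with preallocated outputs
tape = ReverseDiff.compile(ReverseDiff.GradientTape(g, (A, b)))
dA, db = similar(A), similar(b)
ReverseDiff.gradient!((dA, db), tape, (A, b))
```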


#15

Sorry for the late reply.
This is actually what I wanted to do. Thanks for clearing that up. I was also able to do it slightly differently:

using IterativeSolvers
using ReverseDiff
using ReverseDiff: GradientConfig, GradientTape

A  =     [10.0 -1.0 2.0 0.0;
         -1.0 11.0 -1.0 3.0;
          2.0 -1.0 10.0 -1.0; 
          0.0 3.0 -1.0 8.0]
b = [6.0 ,25.0 ,-11.0 ,15.0]
x = similar(b)
f(x) = (x[1]^3) + log10((x[2]^2)) + exp(x[3]) + 4*x[4]
cfg = GradientConfig(b)
tape = GradientTape(b -> f(jacobi(A,b)), b, cfg)

#x = jacobi(A, b)

jacobi!(x, A, b)

result = similar(x)

ReverseDiff.gradient!(result, tape, b)

Thanks everyone for helping out.