ReverseDiff.jl with iterative solvers


#1

I am trying to find the derivative of a function, whose input is the converged solution of a Jacobi iteration, with respect to the elements of the matrices used in the iteration. For example, if f(x) is a function of x, which in turn comes from Ax = b, I am trying to find the derivative of f with respect to the elements of A and b.

I am using ReverseDiff.jl and ForwardDiff.jl for this purpose, but all I can get is the derivative of f with respect to the converged x, even when I pass b as the parameter to the GradientConfig and GradientTape functions.

I have done this in C++ using the CoDiPack library, but have had no success in Julia. Is this even possible in Julia? Any help would be appreciated. Thanks.


#2

I’m not sure what you mean, but if x is the solution to Ax = b then you can compute the derivative of x with respect to A and b analytically, which would probably be more efficient than reverse differentiation through the iteration. You can possibly even use more clever tricks like adjoint methods if you do this several times.
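For example, here is a minimal sketch of what I mean, with a made-up 2x2 system and a made-up f (the gradient of f is written out by hand): since x(b) = A^-1 b, the chain rule gives the gradient of f(x(b)) w.r.t. b as A^-T * grad f(x), i.e. one extra linear solve instead of differentiating the iteration.

```julia
using LinearAlgebra

A = [10.0 -1.0; -1.0 11.0]      # made-up 2x2 example
b = [6.0, 25.0]
f(x) = sin(x[1]) + x[2]^2
gradf(x) = [cos(x[1]), 2*x[2]]  # analytic gradient of f

x = A \ b           # the solution x(b)
g = A' \ gradf(x)   # gradient of f(x(b)) w.r.t. b, via one extra solve

# sanity check against central finite differences
h = 1e-6
fd = [(f(A \ (b + h*e)) - f(A \ (b - h*e))) / (2h) for e in ([1.0, 0.0], [0.0, 1.0])]
```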


#3

I can’t find it analytically; the reason is that it is an aerodynamic application. There is a fine distribution of points that defines the shape of an aerofoil. An iterative solver solves some differential equations iteratively with these points as inputs (which are floating-point numbers) to get the coefficients of drag and lift, and the sensitivities of these coefficients are computed with respect to each point on the surface of the aerofoil.

I was using Jacobi as a test case, in which case A and b would consist of floating-point numbers. Finite differences do not give results with the desired precision.

It works with the C++ library I am using; I am just curious whether it can be done in Julia.


#4

Your question is not very clear. Providing a MWE may help.


#5

I am sorry for not being clear.
If someone can shed some light on how to find the derivative of f(x(b)) with respect to b, without doing anything analytically.
All I get is the derivative of f(b) with respect to b when using:

using ForwardDiff
b = 2
x = b^2 + b;
f(x) = sin(x) + x^2;
result = ForwardDiff.derivative(f, b)


#6

Chain rule?


#7

Possibly

using ForwardDiff
f(x) = sin(x) + x^2;
g(b) = b^2 + b;
b = 2
result = ForwardDiff.derivative(f ∘ g, b)

?


#8

The ∘ symbol throws an error: ERROR: LoadError: UndefVarError: ∘ not defined.
Am I missing something?

Anyway, here is the code that I am actually trying to run; it may give some insight:

using IterativeSolvers
using ReverseDiff
using ReverseDiff: GradientConfig, GradientTape

A  =     [10.0 -1.0 2.0 0.0;
         -1.0 11.0 -1.0 3.0;
          2.0 -1.0 10.0 -1.0; 
          0.0 3.0 -1.0 8.0]
b = [6.0 ,25.0 ,-11.0 ,15.0]
x = similar(b)
f(x) = (x[1]^3) + log10((x[2]^2)) + exp(x[3]) + 4*x[4]
cfg = GradientConfig(b)
tape = GradientTape(f, b, cfg)

#x = jacobi(A, b)

jacobi!(x, A, b)

print(x)
println()
print(f(x))
println()

result = similar(x)
# ReverseDiff.gradient!(result, f, x, cfg)
# print(result)
# println()
ReverseDiff.gradient!(result, tape, b)
print(result)
println()

It gives the derivative of f(b), not of f(x(b)), with respect to b.
Thanks.


#9

The composition symbol ∘ is only defined in Julia 0.6. You can use x -> f(g(x)) instead.
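For instance, with the definitions from post #7, the anonymous-function form would be:

```julia
using ForwardDiff

f(x) = sin(x) + x^2
g(b) = b^2 + b
b = 2

# anonymous function instead of the ∘ operator (which needs Julia 0.6)
result = ForwardDiff.derivative(x -> f(g(x)), b)
```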


#10

Here is the same thing in forward mode:

using IterativeSolvers
using ForwardDiff

A  =  [  10.0  -1.0  2.0  0.0;
         -1.0  11.0 -1.0  3.0;
          2.0  -1.0 10.0 -1.0; 
          0.0   3.0 -1.0  8.0 ]
b = [6.0, 25.0,  -11.0, 15.0]
x = zeros(4)   # overwritten by jacobi below
f(x) = (x[1]^3) + log10((x[2]^2)) + exp(x[3]) + 4*x[4]
#cfg = GradientConfig(b)

x = jacobi(A,b)

print(x)
println()
print(f(x))
println()

result = ForwardDiff.gradient(f,b)
#result = ForwardDiff.derivative(f,b[1])

print(result)
println()



#11

Independently of the technical aspects of getting this to work, my point was that if you want to compute the derivative of f with respect to b in this way, you have to compute the derivative of x with respect to b as an intermediate step, which is expensive. But because x solves a linear system, the derivative simplifies. I.e., let F(b) = f(x(b)); you want to compute the gradient of F, which is J^T grad f, where J = A^-1 is the Jacobian of x with respect to b. Therefore, the correct way to compute this is to compute grad f, and then solve the linear system A^T grad F = grad f by an iterative method.
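To make that concrete with the matrix from this thread (a sketch only: the gradient of f is written out by hand, and a direct `\` stands in for the iterative solver in both solves):

```julia
using LinearAlgebra

A = [10.0 -1.0 2.0 0.0;
     -1.0 11.0 -1.0 3.0;
      2.0 -1.0 10.0 -1.0;
      0.0  3.0 -1.0 8.0]
b = [6.0, 25.0, -11.0, 15.0]

f(x) = x[1]^3 + log10(x[2]^2) + exp(x[3]) + 4*x[4]
gradf(x) = [3*x[1]^2, 2/(x[2]*log(10)), exp(x[3]), 4.0]   # grad f by hand

x = A \ b               # forward solve:  A x = b
gradF = A' \ gradf(x)   # adjoint solve:  A^T grad F = grad f
```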


#12

Note that this is called the adjoint system of equations. The adjoint method is the same basic idea as reverse-mode automatic differentiation, but when you are using an iterative solver, auto-differentiation tools often don’t work well (unless they know specifically about the solver being used), and then you have to apply the adjoint method “manually”.
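A sketch of what “manually” means here, on the thread’s test problem: run the same iterative solver twice, once on A x = b and once on the transposed system A^T λ = grad f(x); the second solution λ is the desired gradient. (A hand-rolled Jacobi iteration stands in for IterativeSolvers.jacobi, and the gradient of f is written out by hand.)

```julia
using LinearAlgebra

# plain Jacobi iteration, standing in for IterativeSolvers.jacobi;
# it converges here because the matrix is diagonally dominant
function jacobi_solve(A, b; iters = 100)
    x = zeros(length(b))
    D = Diagonal(A)
    R = A - D
    for _ in 1:iters
        x = D \ (b - R * x)
    end
    return x
end

A = [10.0 -1.0 2.0 0.0;
     -1.0 11.0 -1.0 3.0;
      2.0 -1.0 10.0 -1.0;
      0.0  3.0 -1.0 8.0]
b = [6.0, 25.0, -11.0, 15.0]

f(x) = x[1]^3 + log10(x[2]^2) + exp(x[3]) + 4*x[4]
gradf(x) = [3*x[1]^2, 2/(x[2]*log(10)), exp(x[3]), 4.0]   # grad f by hand

x = jacobi_solve(A, b)           # forward solve:  A x = b
λ = jacobi_solve(A', gradf(x))   # adjoint solve:  A^T λ = grad f
# λ is the gradient of f(x(b)) w.r.t. b
```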


#13

Thanks for the very nice write-up! It’s a useful trick that many people and communities use in some form or other, and that everybody should know about, but it is not often covered in a general setting in textbooks (Schur complements seem to be another instance of that).


#14

ForwardDiff/ReverseDiff differentiate native Julia functions at the input(s) you provide. When you call ReverseDiff.gradient(g, v) for some function g and some array v, you’re saying “return the gradient of g evaluated at v”.

Correct! It’s evaluating the gradient of f at the value b.

As @Tamas_Papp mentioned, it seems like you want to differentiate through some intermediary function that you’re not actually calling. Note that x (as you’ve defined it) isn’t a Julia function, it’s a Vector{Float64}.

Is the below closer to what you’d like to do?

using IterativeSolvers, ReverseDiff 

A = [10.0 -1.0 2.0 0.0;
     -1.0 11.0 -1.0 3.0;
      2.0 -1.0 10.0 -1.0; 
      0.0 3.0 -1.0 8.0]
b = [6.0 ,25.0 ,-11.0 ,15.0]

f(x) = (x[1]^3) + log10((x[2]^2)) + exp(x[3]) + 4*x[4]

# this is the function you seem to actually want to differentiate
g(A, b) = f(jacobi(A, b))

# get gradients w.r.t. `A` and `b`
dA, db = ReverseDiff.gradient(g, (A, b))

You can, of course, preallocate memory/prerecord the tape etc. to make the above more efficient, but before we go that far, I’m just making sure that this is closer to your intention.
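For reference, the prerecord-and-compile pattern looks roughly like this. (A sketch only: to keep it self-contained I differentiate f(A * b) rather than f(jacobi(A, b)), but the tape mechanics are the same for any function ReverseDiff can record.)

```julia
using ReverseDiff

f(x) = (x[1]^3) + log10((x[2]^2)) + exp(x[3]) + 4*x[4]
g(A, b) = f(A * b)   # stand-in for f(jacobi(A, b))

A = [10.0 -1.0 2.0 0.0;
     -1.0 11.0 -1.0 3.0;
      2.0 -1.0 10.0 -1.0;
      0.0  3.0 -1.0 8.0]
b = [0.1, 0.2, -0.1, 0.1]

# record the tape once, compile it, and reuse it with preallocated outputs
tape = ReverseDiff.compile(ReverseDiff.GradientTape(g, (A, b)))
dA, db = similar(A), similar(b)
ReverseDiff.gradient!((dA, db), tape, (A, b))
```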


#15

Sorry for the late reply.
This is actually what I wanted to do. Thanks for clearing that up. I was also able to do it slightly differently:

using IterativeSolvers
using ReverseDiff
using ReverseDiff: GradientConfig, GradientTape

A  =     [10.0 -1.0 2.0 0.0;
         -1.0 11.0 -1.0 3.0;
          2.0 -1.0 10.0 -1.0; 
          0.0 3.0 -1.0 8.0]
b = [6.0 ,25.0 ,-11.0 ,15.0]
x = similar(b)
f(x) = (x[1]^3) + log10((x[2]^2)) + exp(x[3]) + 4*x[4]
cfg = GradientConfig(b)
tape = GradientTape(b -> f(jacobi(A,b)), b, cfg)

#x = jacobi(A, b)

jacobi!(x, A, b)

result = similar(x)

ReverseDiff.gradient!(result, tape, b)

Thanks everyone for helping out.