Forward-mode autodiff for matrix multiplication

Hi, I’m new to Julia and looking to learn more about autodiff in Julia. I have a chain of matrix multiplications and want to get the derivative of a matrix with respect to another matrix.

For example, in the equations below, I want to compute \frac{\partial X_4}{\partial X_0}, where both X_0 and X_4 are matrices (the W_i are also matrices, and f is some function applied to each element of its matrix argument).

X_1 = X_0 \cdot W_1
X_2 = f(X_1)
X_3 = X_2 \cdot W_2
X_4 = f(X_3)

This is very similar to a neural network, but here my output is not a scalar. I want to use forward-mode autodiff for this; however, I can’t figure out how to do it with the ForwardDiff package.

Apart from this, it would be great if someone could explain how the chain rule works in this case (I’m familiar with reverse-mode autodiff, where the output is a scalar, but I can’t figure out the mathematics behind something similar in forward mode).

I hope I’ve explained my problem clearly.

Thanks

julia> using ForwardDiff

julia> function foo(X₀, W₁, W₂)   # the forward computation above, with f = sin
           X₁ = X₀ * W₁
           X₂ = sin.(X₁)
           X₃ = X₂ * W₂
           X₄ = sin.(X₃)          # returned as the value of the last expression
       end

julia> let W₁ = randn(10, 10), W₂ = randn(10, 10), X₀ = randn(10, 10)
           # Jacobian of vec(X₄) with respect to vec(X₀): a 100 × 100 matrix
           ForwardDiff.jacobian(X₀ -> foo(X₀, W₁, W₂), X₀)
       end

I find it easiest to imagine adding a small perturbation \delta X_0 to the input matrix, and carrying it through to first order:

X_1 + \delta X_1 = X_0 W_1 + \delta X_0 W_1
X_2 + \delta X_2 = f(X_1 + \delta X_1) = f(X_1) + f'(X_1) \odot \delta X_1 + O(\delta X_0^2)
X_3 + \delta X_3 = X_2 W_2 + (f'(X_1) \odot \delta X_1) W_2 + O(\delta X_0^2)
X_4 + \delta X_4 = f(X_3 + \delta X_3) = f(X_3) + f'(X_3) \odot [(f'(X_1) \odot (\delta X_0 W_1)) W_2] + O(\delta X_0^2)

where \odot denotes the elementwise (Hadamard) product.
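To make the algebra concrete, here is a minimal numerical sketch of the first-order expansion, reusing the foo defined above (so f = sin and f' = cos); the 10×10 sizes and the perturbation δX₀ are just illustrative choices:

using LinearAlgebra

X₀, W₁, W₂ = randn(10, 10), randn(10, 10), randn(10, 10)
δX₀ = 1e-6 * randn(10, 10)                         # small input perturbation

X₁ = X₀ * W₁
X₃ = sin.(X₁) * W₂
δX₄ = cos.(X₃) .* ((cos.(X₁) .* (δX₀ * W₁)) * W₂)  # predicted first-order change

# the residual should be O(‖δX₀‖²), i.e. tiny compared to δX₄:
norm(foo(X₀ + δX₀, W₁, W₂) - foo(X₀, W₁, W₂) - δX₄)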

Now, the question is, when you ask for the derivative (Jacobian) of the output, what do you want? If you want the directional derivative \delta X_4 in a certain input direction \delta X_0, you can directly read it off above. If you want the Jacobian “matrix”, in some sense it’s “really” a rank-4 tensor (4d array). ForwardDiff gives you the Jacobian assuming the input and output are “flattened” using vec. In either case you need to do a bit of multilinear algebra to work it out by hand.
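For instance, here is a sketch of both options, reusing foo with fresh 10×10 inputs (δX₀ is an arbitrary direction I made up): the directional derivative falls out of differentiating along the one-parameter path t \mapsto X_0 + t\,\delta X_0 at t = 0, and the flattened Jacobian can be reshaped back into the rank-4 tensor.

using ForwardDiff

X₀, W₁, W₂ = randn(10, 10), randn(10, 10), randn(10, 10)
δX₀ = randn(10, 10)                      # an arbitrary input direction

# directional derivative: δX₄ = d/dt foo(X₀ + t δX₀, W₁, W₂) at t = 0
δX₄ = ForwardDiff.derivative(t -> foo(X₀ + t * δX₀, W₁, W₂), 0.0)

# the vec-flattened 100 × 100 Jacobian, reshaped into the rank-4 tensor
# Jtensor[i, j, k, l] = ∂X₄[i, j] / ∂X₀[k, l]:
J = ForwardDiff.jacobian(X₀ -> foo(X₀, W₁, W₂), X₀)
Jtensor = reshape(J, 10, 10, 10, 10)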


In my problem, X_0 is n \times 2 and X_4 is n \times 1, so what I want is \frac{\partial X_4(i)}{\partial X_0(i,1)} and \frac{\partial X_4(i)}{\partial X_0(i,2)}. I believe both of these would be n \times 1 vectors (here i=1:n).
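If it helps, here is a sketch of how to read those two vectors off ForwardDiff’s flattened Jacobian, reusing the foo above (the sizes n = 5, W₁ of size 2×2, and W₂ of size 2×1 are hypothetical, just so the shapes work out). Since vec stacks the columns of X_0, column 1 of X_0 maps to Jacobian columns 1:n and column 2 to columns n+1:2n, so the entries you want sit on two diagonals:

using ForwardDiff

n = 5                                                 # hypothetical size
X₀, W₁, W₂ = randn(n, 2), randn(2, 2), randn(2, 1)
J = ForwardDiff.jacobian(X₀ -> foo(X₀, W₁, W₂), X₀)   # an n × 2n matrix

∂X₄_col1 = [J[i, i]     for i in 1:n]   # ∂X₄(i)/∂X₀(i,1)
∂X₄_col2 = [J[i, n + i] for i in 1:n]   # ∂X₄(i)/∂X₀(i,2)

Both results are length-n vectors, as you expected.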