Forward-mode autodiff for matrix multiplication

Hi, I’m new to Julia and looking to learn more about autodiff in Julia. I have a chain of matrix multiplications and want to get the derivative of a matrix with respect to another matrix.

For example, in the equations below, I want to compute \frac{\partial X_4}{\partial X_0}, where both X_0 and X_4 are matrices (the W_i are also matrices, and f is some function applied to each element of its matrix argument).

X_1 = X_0 \cdot W_1
X_2 = f(X_1)
X_3 = X_2 \cdot W_2
X_4 = f(X_3)

This is very similar to a neural network, but here my output is not a scalar. I want to use forward-mode autodiff for this; however, I can’t figure out how to do it with the ForwardDiff package.

Apart from this, it would be great if someone could explain how the chain rule works in this case (I’m familiar with reverse-mode autodiff, where the output is a scalar, but I can’t figure out the mathematics behind something similar in forward mode).

I hope I’ve explained my problem clearly.

Thanks

julia> using ForwardDiff

julia> function foo(X₀, W₁, W₂)   # the forward computation above, with f = sin
           X₁ = X₀ * W₁
           X₂ = sin.(X₁)
           X₃ = X₂ * W₂
           X₄ = sin.(X₃)          # returned as the value of the last expression
       end

julia> let W₁ = randn(10, 10), W₂ = randn(10, 10), X₀ = randn(10, 10)
           # Jacobian of vec(X₄) with respect to vec(X₀): a 100 × 100 matrix
           ForwardDiff.jacobian(X₀ -> foo(X₀, W₁, W₂), X₀)
       end

I find it easiest to imagine adding a small perturbation \delta X_0 to the input matrix, and carrying it through to first order:

X_1 + \delta X_1 = X_0 W_1 + \delta X_0 W_1
X_2 + \delta X_2 = f(X_1 + \delta X_1) = f(X_1) + f'(X_1) \odot \delta X_1 + O(\delta X_0^2)
X_3 + \delta X_3 = X_2 W_2 + (f'(X_1) \odot \delta X_1) W_2 + O(\delta X_0^2)
X_4 + \delta X_4 = f(X_3 + \delta X_3) = f(X_3) + f'(X_3) \odot [(f'(X_1) \odot (\delta X_0 W_1)) W_2] + O(\delta X_0^2)

where \odot denotes the elementwise (Hadamard) product.
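To make the algebra concrete, here is a minimal numerical sketch of the first-order expansion, reusing the foo defined above (so f = sin and f' = cos); the 10×10 sizes and the perturbation δX₀ are just illustrative choices:

using LinearAlgebra

X₀, W₁, W₂ = randn(10, 10), randn(10, 10), randn(10, 10)
δX₀ = 1e-6 * randn(10, 10)                         # small input perturbation

X₁ = X₀ * W₁
X₃ = sin.(X₁) * W₂
δX₄ = cos.(X₃) .* ((cos.(X₁) .* (δX₀ * W₁)) * W₂)  # predicted first-order change

# the residual should be O(‖δX₀‖²), i.e. tiny compared to δX₄:
norm(foo(X₀ + δX₀, W₁, W₂) - foo(X₀, W₁, W₂) - δX₄)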

Now, the question is, when you ask for the derivative (Jacobian) of the output, what do you want? If you want the directional derivative \delta X_4 in a certain input direction \delta X_0, you can directly read it off above. If you want the Jacobian “matrix”, in some sense it’s “really” a rank-4 tensor (4d array). ForwardDiff gives you the Jacobian assuming the input and output are “flattened” using vec. In either case you need to do a bit of multilinear algebra to work it out by hand.
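For instance, here is a sketch of both options, reusing foo with fresh 10×10 inputs (δX₀ is an arbitrary direction I made up): the directional derivative falls out of differentiating along the one-parameter path t \mapsto X_0 + t\,\delta X_0 at t = 0, and the flattened Jacobian can be reshaped back into the rank-4 tensor.

using ForwardDiff

X₀, W₁, W₂ = randn(10, 10), randn(10, 10), randn(10, 10)
δX₀ = randn(10, 10)                      # an arbitrary input direction

# directional derivative: δX₄ = d/dt foo(X₀ + t δX₀, W₁, W₂) at t = 0
δX₄ = ForwardDiff.derivative(t -> foo(X₀ + t * δX₀, W₁, W₂), 0.0)

# the vec-flattened 100 × 100 Jacobian, reshaped into the rank-4 tensor
# Jtensor[i, j, k, l] = ∂X₄[i, j] / ∂X₀[k, l]:
J = ForwardDiff.jacobian(X₀ -> foo(X₀, W₁, W₂), X₀)
Jtensor = reshape(J, 10, 10, 10, 10)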


In my problem, X_0 is n \times 2 and X_4 is n \times 1, so what I want is \frac{\partial X_4(i)}{\partial X_0(i,1)} and \frac{\partial X_4(i)}{\partial X_0(i,2)}. I believe both of these would be n \times 1 vectors (here i=1:n).
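If it helps, here is a sketch of how to read those two vectors off ForwardDiff’s flattened Jacobian, reusing the foo above (the sizes n = 5, W₁ of size 2×2, and W₂ of size 2×1 are hypothetical, just so the shapes work out). Since vec stacks the columns of X_0, column 1 of X_0 maps to Jacobian columns 1:n and column 2 to columns n+1:2n, so the entries you want sit on two diagonals:

using ForwardDiff

n = 5                                                 # hypothetical size
X₀, W₁, W₂ = randn(n, 2), randn(2, 2), randn(2, 1)
J = ForwardDiff.jacobian(X₀ -> foo(X₀, W₁, W₂), X₀)   # an n × 2n matrix

∂X₄_col1 = [J[i, i]     for i in 1:n]   # ∂X₄(i)/∂X₀(i,1)
∂X₄_col2 = [J[i, n + i] for i in 1:n]   # ∂X₄(i)/∂X₀(i,2)

Both results are length-n vectors, as you expected.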