# Automatic differentiation - ForwardDiff: components of gradient

Hello, I’m once again running into a limitation of my Julia skills.

# Background

I’m using forward-mode automatic differentiation on a function $F:\mathbb{R}^{N \times N} \to \mathbb{R}$ that maps a matrix $A \in \mathbb{R}^{N \times N}$ to a scalar. Given the function and an input matrix, I’m able to compute the gradient $\nabla F$ with respect to each element of $A$:

$$
\nabla F = \left[ \begin{matrix} \dfrac{\partial F}{\partial a_{1,1}} & \dots & \dfrac{\partial F}{\partial a_{1,N}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial F}{\partial a_{N,1}} & \dots & \dfrac{\partial F}{\partial a_{N,N}} \end{matrix} \right]
$$

using `ForwardDiff.gradient(F, A)`. Using ForwardDiff is faster than evaluating the analytical expressions for the partial derivatives, even for small problems ($N < 100$).

# Problem setting

Some numerical experiments for the actual application have shown that the matrix $\nabla F$, which can be (very) large, contains a lot of identical values. I’d therefore like to save computation time and memory by computing only the unique elements of $\nabla F$ (and combining this with IndirectArrays, as sketched below).
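To make the storage idea concrete, here is a minimal sketch with IndirectArrays.jl; the small matrix `G` is a made-up stand-in for a gradient with repeated entries:

    using IndirectArrays

    G = [1.0 2.0; 2.0 1.0]                      # stand-in gradient with repeats
    vals = unique(G)                            # unique entries, stored once
    idx  = [findfirst(==(g), vals) for g in G]  # index of each entry in vals
    Gcompact = IndirectArray(idx, vals)         # indexes like G, stores only idx + vals
    @assert Gcompact == G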

Based on what I’ve read here, I made an MWE that calculates a specific $\dfrac{\partial F}{\partial a_{i,j}}$ for a function $F$.

    using ForwardDiff
    N = 100

    # test function, with the result initialised inside the function
    function F(x::AbstractArray{T}) where {T}
        res = zero(T)
        for i in 1:size(x,1)
            for j in 1:size(x,2)
                res += x[i,j]^(i/j)
            end
        end
        return res
    end

    # location where we want the gradient
    A = rand(N,N)

    # computing the gradient (for all variables) > OK
    ∇foo = ForwardDiff.gradient(F, A)

    # computing a specific component of the gradient > OK
    inds = LinearIndices((N,N))
    (t,s) = (1,3)  # value for which we are searching partial derivative: ∂foo/∂a_{t,s}
    vv = inds[t,s] # associated linear index
    subfoo = x -> F(reshape(vcat(A[1:vv-1], x, A[vv+1:end]), N, :))
    res = ForwardDiff.derivative(subfoo, A[t,s])

    @assert res == ∇foo[t,s]


This works, but I have the following questions:

1. Could this be made more efficient? Especially the definition of `subfoo`, where a new matrix has to be built for every `(t,s)`. Maybe pre-allocate it in some way? (See the seeding sketch after this list.)
2. How could this be implemented in a more generic way for different functions? Below I added an additional example in which I would like to avoid a lot of matrix generation, but this is not working (commented line). (The seeding sketch after this list also bears on this.)
    function bigfoo(t, s, foo::Function, A::AbstractArray, args...)
        backup = A[t,s] # remember the original value of A[t,s]

        # closure - no new matrix (not working: A has eltype Float64, so
        # setindex! cannot store the Dual numbers that ForwardDiff passes in)
        #localfoo = (x::Number,) -> foo(setindex!(A, x, t, s), args...)

        # closure - new matrix (working, but builds a new matrix per call)
        inds = LinearIndices(A)
        vv = inds[t,s] # linear index associated with a_{t,s}
        localfoo = (x::Number,) -> foo(reshape(vcat(A[1:vv-1], x, A[vv+1:end]), size(A,1), :), args...)

        res = ForwardDiff.derivative(localfoo, A[t,s])
        # restore the original value (only needed for the setindex! variant)
        A[t,s] = backup
        return res
    end

    @assert bigfoo(t, s, F, A) == ∇foo[t,s]

3. Finally, what would be an efficient way to identify the values that will be identical in $\nabla F$? I’m currently thinking of using an analytically determined gradient (on a smaller scale) and expanding the pattern of identical values to the scale of the real problem. (See the pattern sketch below.)
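For questions 1 and 2, one direction I can sketch (the helper name `partial_ts` is made up, and it assumes `foo` accepts any `AbstractMatrix` eltype, as `F` above does) is to pre-allocate a single `Dual`-valued copy of `A` and only re-seed the entry of interest, so no new matrix is built per `(t,s)`:

    using ForwardDiff

    function partial_ts(foo, A::AbstractMatrix{T}, t, s) where {T}
        TagT = typeof(ForwardDiff.Tag(foo, T))
        # allocated once here; the buffer could be hoisted out and reused
        Abuf = ForwardDiff.Dual{TagT}.(A, zero(T))
        Abuf[t,s] = ForwardDiff.Dual{TagT}(A[t,s], one(T))  # seed ∂/∂a_{t,s}
        d = ForwardDiff.partials(foo(Abuf), 1)
        Abuf[t,s] = ForwardDiff.Dual{TagT}(A[t,s], zero(T)) # reset the seed (matters once Abuf is reused)
        return d
    end

    @assert partial_ts(F, A, t, s) ≈ ∇foo[t,s]

Since `partial_ts` is generic over `foo`, `bigfoo` could simply forward to it instead of rebuilding the matrix for every call.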
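For question 3, here is a minimal sketch of the pattern idea, assuming the repeated values are exactly identical (with floating-point noise, the grouping would need rounding or an `isapprox`-based comparison instead); `unique_pattern` is a made-up name:

    # map each entry of a (small-scale) gradient to the index of its unique
    # value; idx and vals feed directly into IndirectArray(idx, vals)
    function unique_pattern(G::AbstractMatrix{T}) where {T}
        vals = T[]
        idx = similar(G, Int)
        lookup = Dict{T,Int}()
        for k in eachindex(G)
            idx[k] = get!(lookup, G[k]) do
                push!(vals, G[k])
                length(vals)
            end
        end
        return idx, vals
    end

If the repeat pattern really is scale-independent, applying `unique_pattern` to a small analytic gradient yields the index map, and at full scale only one representative partial derivative per unique value would have to be computed.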

All the above was done in Julia 1.6.