# Automatic differentiation - ForwardDiff: components of gradient

Hello, I’m once again running into a limitation of my Julia skills.

# Background

I’m using forward-mode automatic differentiation on a function $F:\mathbb{R}^{N \times N} \to \mathbb{R}$ that maps a matrix $A \in \mathbb{R}^{N \times N}$ to a scalar. Given the function and an input matrix, I’m able to compute the gradient $\nabla F$ with respect to each element of $A$:

$$
\nabla F = \left[ \begin{matrix} \dfrac{\partial F}{\partial a_{1,1}} & \dots & \dfrac{\partial F}{\partial a_{1,N}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial F}{\partial a_{N,1}} & \dots & \dfrac{\partial F}{\partial a_{N,N}} \end{matrix} \right]
$$

using `ForwardDiff.gradient(F, A)`. Using ForwardDiff is faster than evaluating the analytical expressions for the partial derivatives, even for small problems ($N < 100$).

# Problem setting

Some numerical experiments for the actual application have shown that the matrix $\nabla F$, which can be (very) large, contains a lot of identical values. I’d therefore like to save computation time and memory by computing only the unique elements of $\nabla F$ (and combining this with IndirectArrays, as sketched below).
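To make the storage idea concrete, here is a minimal sketch with IndirectArrays.jl; the small matrix `G` is a made-up stand-in for a gradient with repeated entries:

    using IndirectArrays

    G = [1.0 2.0; 2.0 1.0]                      # stand-in gradient with repeats
    vals = unique(G)                            # unique entries, stored once
    idx  = [findfirst(==(g), vals) for g in G]  # index of each entry in vals
    Gcompact = IndirectArray(idx, vals)         # indexes like G, stores only idx + vals
    @assert Gcompact == G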

Based on what I’ve read here, I made an MWE that calculates a specific $\dfrac{\partial F}{\partial a_{i,j}}$ for a function $F$.

    using ForwardDiff
    N = 100

    # test function, with the result initialised inside the function
    function F(x::AbstractArray{T}) where {T}
        res = zero(T)
        for i in 1:size(x,1)
            for j in 1:size(x,2)
                res += x[i,j]^(i/j)
            end
        end
        return res
    end

    # location where we want the gradient
    A = rand(N,N)

    # computing the gradient (for all variables) > OK
    ∇foo = ForwardDiff.gradient(F, A)

    # computing a specific component of the gradient > OK
    inds = LinearIndices((N,N))
    (t,s) = (1,3)  # value for which we are searching partial derivative: ∂foo/∂a_{t,s}
    vv = inds[t,s] # associated linear index
    subfoo = x -> F(reshape(vcat(A[1:vv-1], x, A[vv+1:end]), N, :))
    res = ForwardDiff.derivative(subfoo, A[t,s])

    @assert res == ∇foo[t,s]


This works, but I have the following questions:

1. Could this be made more efficient? Especially the definition of `subfoo`, where a new matrix has to be built for every `(t,s)`. Maybe pre-allocate it in some way? (See the seeding sketch after this list.)
2. How could this be implemented in a more generic way for different functions? Below I added an additional example in which I would like to avoid a lot of matrix generation, but this is not working (commented line). (The seeding sketch after this list also bears on this.)
    function bigfoo(t, s, foo::Function, A::AbstractArray, args...)
        backup = A[t,s] # remember the original value of A[t,s]

        # closure - no new matrix (not working: A has eltype Float64, so
        # setindex! cannot store the Dual numbers that ForwardDiff passes in)
        #localfoo = (x::Number,) -> foo(setindex!(A, x, t, s), args...)

        # closure - new matrix (working, but builds a new matrix per call)
        inds = LinearIndices(A)
        vv = inds[t,s] # linear index associated with a_{t,s}
        localfoo = (x::Number,) -> foo(reshape(vcat(A[1:vv-1], x, A[vv+1:end]), size(A,1), :), args...)

        res = ForwardDiff.derivative(localfoo, A[t,s])
        # restore the original value (only needed for the setindex! variant)
        A[t,s] = backup
        return res
    end

    @assert bigfoo(t, s, F, A) == ∇foo[t,s]

3. Finally, what would be an efficient way to identify the values that will be identical in $\nabla F$? I’m currently thinking of using an analytically determined gradient (on a smaller scale) and expanding the pattern of identical values to the scale of the real problem. (See the pattern sketch below.)
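For questions 1 and 2, one direction I can sketch (the helper name `partial_ts` is made up, and it assumes `foo` accepts any `AbstractMatrix` eltype, as `F` above does) is to pre-allocate a single `Dual`-valued copy of `A` and only re-seed the entry of interest, so no new matrix is built per `(t,s)`:

    using ForwardDiff

    function partial_ts(foo, A::AbstractMatrix{T}, t, s) where {T}
        TagT = typeof(ForwardDiff.Tag(foo, T))
        # allocated once here; the buffer could be hoisted out and reused
        Abuf = ForwardDiff.Dual{TagT}.(A, zero(T))
        Abuf[t,s] = ForwardDiff.Dual{TagT}(A[t,s], one(T))  # seed ∂/∂a_{t,s}
        d = ForwardDiff.partials(foo(Abuf), 1)
        Abuf[t,s] = ForwardDiff.Dual{TagT}(A[t,s], zero(T)) # reset the seed (matters once Abuf is reused)
        return d
    end

    @assert partial_ts(F, A, t, s) ≈ ∇foo[t,s]

Since `partial_ts` is generic over `foo`, `bigfoo` could simply forward to it instead of rebuilding the matrix for every call.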
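For question 3, here is a minimal sketch of the pattern idea, assuming the repeated values are exactly identical (with floating-point noise, the grouping would need rounding or an `isapprox`-based comparison instead); `unique_pattern` is a made-up name:

    # map each entry of a (small-scale) gradient to the index of its unique
    # value; idx and vals feed directly into IndirectArray(idx, vals)
    function unique_pattern(G::AbstractMatrix{T}) where {T}
        vals = T[]
        idx = similar(G, Int)
        lookup = Dict{T,Int}()
        for k in eachindex(G)
            idx[k] = get!(lookup, G[k]) do
                push!(vals, G[k])
                length(vals)
            end
        end
        return idx, vals
    end

If the repeat pattern really is scale-independent, applying `unique_pattern` to a small analytic gradient yields the index map, and at full scale only one representative partial derivative per unique value would have to be computed.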

All the above was done in Julia 1.6.