Fast derivatives for tensor expressions using ForwardDiff

I want to evaluate and differentiate functions that have an underlying tensor structure on a grid of points:

g(xₖ,yₗ) = Lᵢ(xₖ) Aᵢⱼ Lⱼ(yₗ)

Here x = [x₁, …, xₙ] and y = [y₁, …, yₙ] are the grid points, and summation over the repeated indices i and j is implied.

Here A is an n × n matrix of coefficients, and L(x) = [L₁(x) … Lₙ(x)]
and L(y) = [L₁(y) … Lₙ(y)] are sets of basis functions (e.g. Lagrange polynomials).
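For a single pair (x, y) this is just a bilinear form. A minimal sketch, assuming L(x) returns the length-n vector of basis values (the name g_point is mine):

 using LinearAlgebra

 # g(x,y) = Lᵢ(x) Aᵢⱼ Lⱼ(y), summed over i and j
 g_point(L, A, x, y) = dot(L(x), A * L(y))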

First, consider evaluation. There are two obvious strategies:

  1. evaluation of g(xₖ,yₗ) at all grid points at once using matrix-matrix products (see the sketch after this list)
    number of flops ∝ n³
  2. evaluation of g(xₖ,yₗ) for all points in x and y one by one
    number of flops ∝ n⁴
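To make the two cost counts concrete, here is a minimal sketch; Lx and Ly are assumed to hold the basis values at the grid points, i.e. Lx[k,i] = Lᵢ(xₖ), and the function names are mine:

 # strategy 1: two matrix-matrix products, flops ∝ n³
 eval_matmul(Lx, A, Ly) = Lx * A * Ly'

 # strategy 2: one vector-matrix-vector product (∝ n² flops) for each of
 # the n² grid points, flops ∝ n⁴ in total
 function eval_pointwise(Lx, A, Ly)
     G = zeros(size(Lx, 1), size(Ly, 1))
     for k in axes(Lx, 1), l in axes(Ly, 1)
         G[k, l] = Lx[k, :]' * A * Ly[l, :]
     end
     return G
 end

Both return the same matrix G with G[k,l] = g(xₖ,yₗ); only the operation count differs.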

For algorithmic differentiation the flop counts scale the same way (∝ n³ vs ∝ n⁴). It seems to me that the efficient version (1) is currently not possible through the ForwardDiff API.

What would it take to add this kind of support?

I figured out that I can use Dual numbers from the ForwardDiff package directly, instead of going through its API.

Let

 using ForwardDiff: Dual

 # seed the first partial slot for x-points, the second for y-points
 duals_x(a,b,n) = [Dual(x, one(x), zero(x)) for x in LinRange(a,b,n)]
 duals_y(a,b,n) = [Dual(y, zero(y), one(y)) for y in LinRange(a,b,n)]
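The seeding is the point here: every x-dual carries the partials tuple (1, 0) and every y-dual carries (0, 1), so after any arithmetic the first slot holds ∂/∂x and the second ∂/∂y. A quick sanity check on a product:

 # x*y at (x,y) = (2,3): value 6, ∂/∂x = y = 3, ∂/∂y = x = 2
 Dual(2.0, 1.0, 0.0) * Dual(3.0, 0.0, 1.0)  # Dual{Nothing}(6.0,3.0,2.0)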

 # every row of A is [1 2 3 4]; every column of B is [1, 2, 3, 4]
 A = repeat(Float64[1 2 3 4], 4, 1)
 B = repeat(Float64[1 2 3 4]', 1, 4)

 g(x,y) = (L(x) * A * L(y)') ./ (L(x) * B * L(y)')
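One thing the snippet leaves out is L itself. A definition that is consistent with the REPL output below (my reconstruction, not the original code) is the Lagrange basis on the four equispaced nodes 0, 1/3, 2/3, 1, with L(x) returning a length(x) × 4 matrix of basis values:

 # assumed: Lagrange basis on 4 equispaced nodes, L(x)[k,i] = Lᵢ(xₖ)
 const nodes = LinRange(0.0, 1.0, 4)
 lagrange(i, x) = prod(j == i ? one(x) : (x - nodes[j]) / (nodes[i] - nodes[j]) for j in 1:4)
 L(x) = [lagrange(i, xk) for xk in x, i in 1:4]

With this choice g simplifies to (1 + 3y) / (1 + 3x), which matches the values and partials printed below.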

I can differentiate through g directly by calling it with these vectors of dual numbers:

 julia> x = duals_x(0.0,1.0,3);
 
 julia> y = duals_y(0.0,1.0,3);

 julia> g(x,y)
 3×3 Array{Dual{Nothing,Float64,2},2}:
 Dual{Nothing}(1.0,-3.0,3.0)      ...   Dual{Nothing}(4.0,-12.0,3.0)
 Dual{Nothing}(0.4,-0.48,1.2)     ...   Dual{Nothing}(1.6,-1.92,1.2)
 Dual{Nothing}(0.25,-0.1875,0.75) ...   Dual{Nothing}(1.0,-0.75,0.75)

This gives me both the function values and the partial derivatives at all grid points, computed with the efficient matrix-matrix version.
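If plain arrays of values and derivatives are needed afterwards, ForwardDiff's value and partials accessors can be broadcast over the result (the names Gd, G, dGdx, dGdy are mine):

 using ForwardDiff: value, partials

 Gd   = g(x, y)           # matrix of duals, computed with matrix products
 G    = value.(Gd)        # function values at the grid points
 dGdx = partials.(Gd, 1)  # ∂g/∂x at the grid points
 dGdy = partials.(Gd, 2)  # ∂g/∂y at the grid points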