Help with derivatives of matrix functions

Hi, I have a matrix function G(x): \mathbb{R}^n \to \mathbb{R}^{n\times n}, where x\in\mathbb{R}^n is a vector, and would like to use AD to compute the derivatives \frac{\partial^2 G_{ij}(x)}{\partial x_i \partial x_j}, where G_{ij}(x) is the (i,j) element of G(x).

This is how I am computing it:

∂(f, x, i) = ForwardDiff.partials(f([ForwardDiff.Dual(x[j], float(i == j)) for j in eachindex(x)]))[]

The j loop seems wasteful, and I get slightly slower results with the following terrible approach:

X = [ForwardDiff.hessian(z -> G(z)[i, j], x) for i in 1:2, j in 1:2]

and picking the derivatives I need.

Is there a better way to do this? Any help here would be greatly appreciated.

Thanks!


Sounds like you want ForwardDiff.jacobian.

EDIT: I misread the question; jacobian is for first derivatives.


I want second derivatives, which are different for each element. I have fixed my question to indicate \frac{\partial^2 G_{ij}(x)}{\partial x_i \partial x_j}.

Ideally you would like to construct a scalar-valued function out of G such that the Hessian gives you the desired derivatives. Not sure how to do that, though, or if it's even feasible.
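
EDIT: one scalar-valued construction that does work element by element is polarization: for a scalar g, \frac{\partial^2 g}{\partial x_i \partial x_j} = \frac{1}{4}\left(\varphi_+''(0) - \varphi_-''(0)\right) with \varphi_\pm(t) = g(x + t\,(e_i \pm e_j)), so two one-dimensional second derivatives per element suffice. A rough sketch (d2 and mixed2 are names I made up; untested for performance):

using ForwardDiff

# second derivative of a scalar function of one variable, via nested duals
d2(φ, t) = ForwardDiff.derivative(s -> ForwardDiff.derivative(φ, s), t)

# mixed partial ∂²g/∂xᵢ∂xⱼ from two 1-D second derivatives (polarization)
function mixed2(g, x, i, j)
    eplus  = [float(k == i) + float(k == j) for k in eachindex(x)]
    eminus = [float(k == i) - float(k == j) for k in eachindex(x)]
    (d2(t -> g(x .+ t .* eplus), 0.0) - d2(t -> g(x .+ t .* eminus), 0.0)) / 4
end

mixed2(z -> (z.^2 * transpose(z.^2))[1, 2], [3.0, 2.0, 1.0], 1, 2)  # 4 x₁ x₂ = 24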

Here is how I’d do it:

using ForwardDiff
using BenchmarkTools

f(x) = x.^2 * transpose(x.^2) 

@show [ForwardDiff.hessian(x->f(x)[i, j], [3.0, 2.0, 1.0]) for i in 1:3, j in 1:3]
@show ForwardDiff.jacobian(x -> ForwardDiff.jacobian(f, x), [3.0, 2.0, 1.0])

@btime [ForwardDiff.hessian(x->f(x)[i, j], [3.0, 2.0, 1.0]) for i in 1:3, j in 1:3]
@btime ForwardDiff.jacobian(x -> ForwardDiff.jacobian(f, x), [3.0, 2.0, 1.0])

yielding

  8.800 μs (130 allocations: 39.62 KiB)
  2.256 μs (16 allocations: 5.41 KiB)
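
If only specific entries are needed afterwards, the nested-jacobian result can be reshaped into a fourth-order array; a small sketch (relying on the column-major vec layout ForwardDiff uses for matrix-valued jacobians, and on the symmetry of second derivatives for smooth f):

x0 = [3.0, 2.0, 1.0]
n = length(x0)
J2 = ForwardDiff.jacobian(z -> ForwardDiff.jacobian(f, z), x0)
D = reshape(J2, n, n, n, n)              # D[i, j, k, l] == ∂²f(x)[i, j] / ∂x[k] ∂x[l]
[D[i, j, i, j] for i in 1:n, j in 1:n]   # just the entries from the question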

Interestingly enough, the same implementation with ReverseDiff

using ReverseDiff

ReverseDiff.jacobian(x -> ReverseDiff.jacobian(f, x), [3.0, 2.0, 1.0])

fails with

ERROR: LoadError: DimensionMismatch("matrix A has dimensions (3,3), matrix B has dimensions (1,3)")

Thanks. That could work … but it still computes derivatives I do not need. For example, for a 2\times 2 problem, I am looking to efficiently compute \frac{\partial^2 G_{11}(x)}{\partial x_1^2}, \frac{\partial^2 G_{12}(x)}{\partial x_1\partial x_2}, \frac{\partial^2 G_{21}(x)}{\partial x_2\partial x_1}, and \frac{\partial^2 G_{22}(x)}{\partial x_2^2} only, and not the other derivatives. If the derivatives are computed over a large number of grid points, the difference in computational time could be significant.

This computes fewer derivatives

g(i, j) = x -> (x.^2 * transpose(x.^2))[i, j]
dg(i, j, k) = x -> ForwardDiff.gradient(g(i, j), x)[k]
d2g(i, j, k, l) = x -> ForwardDiff.gradient(dg(i, j, k), x)[l]

@btime d2g(1, 1, 1, 1)([3.0, 2.0, 1.0])

h(i, j) = x -> x[i]^2 * x[j]^2
dh(i, j, k) = x -> ForwardDiff.gradient(h(i, j), x)[k]
d2h(i, j, k, l) = x -> ForwardDiff.gradient(dh(i, j, k), x)[l]

@btime d2h(1, 1, 1, 1)([3.0, 2.0, 1.0]) 

but yields only a small improvement:

  1.940 μs (18 allocations: 3.97 KiB)
  1.890 μs (15 allocations: 1.88 KiB)
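
Indexing into the gradient still propagates all n dual components and throws most of them away. To propagate a single component per differentiation level, one can seed along one coordinate direction with nested ForwardDiff.derivative (a sketch; dd and d2d are made-up names):

using ForwardDiff

# ∂g/∂x_k as a 1-D derivative along the k-th coordinate direction,
# so only one dual component is carried through g
dd(g, k) = x -> ForwardDiff.derivative(t -> g([x[m] + t * (m == k) for m in eachindex(x)]), 0.0)

# ∂²g/∂x_k ∂x_l with one dual component per nesting level
d2d(g, k, l) = dd(dd(g, k), l)

@btime d2d(h(1, 1), 1, 1)([3.0, 2.0, 1.0])  # ∂²(x₁⁴)/∂x₁² = 12 x₁² = 108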

Hm, there might be a way to do it somewhat faster in your special case by rolling your own AD and using @elrod's lazy_e:

using StaticArrays, Random, Test, BenchmarkTools

struct Dual{T<:Number, G<:Number} <: Number
    val::T
    grad::G
    Dual{T, G}(val) where {T<:Number, G<:Number} = begin
        new{T, G}(val, zero(T))
    end
    Dual{T, G}(val, grad) where {T<:Number, G<:Number} = begin
        new{T, G}(val, grad)
    end
end

Dual(val::T) where {T<:Number} = Dual{T, T}(val)
Dual(val::T, grad::G) where {T<:Number, G<:Number} = Dual{T, G}(val, grad)

Base.convert(::Type{T}, d::Dual{T, G}) where {T<:Number, G<:Number} = d.val

import Base.+
+(d1::Dual{T, G1}, d2::Dual{T, G2}) where {T<:Number, G1<:Number, G2<:Number} = Dual(d1.val + d2.val, d1.grad + d2.grad)
+(d::Dual{T, G}, t::T) where {T<:Number, G<:Number} = Dual(d.val + t, d.grad)
+(t::T, d::Dual{T, G}) where {T<:Number, G<:Number} = d + t

import Base.-
-(d::Dual{T, G}) where {T<:Number, G<:Number} = Dual(-d.val, -d.grad)
-(d1::Dual{T, G1}, d2::Dual{T, G2}) where {T<:Number, G1<:Number, G2<:Number} = d1 + (-d2)
-(d::Dual{T, G}, t::T) where {T<:Number, G<:Number} = d + (-t)
-(t::T, d::Dual{T, G}) where {T<:Number, G<:Number} = t + (-d)

import Base.*
*(d1::Dual{T, G1}, d2::Dual{T, G2}) where {T<:Number, G1<:Number, G2<:Number} = Dual(d1.val * d2.val, d1.val * d2.grad + d2.val * d1.grad)
*(d::Dual{T, G}, t::T) where {T<:Number, G<:Number} = Dual(d.val * t, t * d.grad)
*(t::T, d::Dual{T, G}) where {T<:Number, G<:Number} = d * t

import Base.inv
inv(d::Dual{T, G}) where {T<:Number, G<:Number} = Dual(one(T) / d.val, - d.grad / (d.val * d.val))

import Base./
/(d1::Dual{T, G1}, d2::Dual{T, G2}) where {T<:Number, G1<:Number, G2<:Number} = d1 * inv(d2)
/(d::Dual{T, G}, t::T) where {T<:Number, G<:Number} = d * inv(t)
/(t::T, d::Dual{T, G}) where {T<:Number, G<:Number} = t * inv(d)

struct lazy_e{T} <: AbstractVector{T}
    i::Int
    n::Int
end
lazy_e(i::Int, n::Int, ::Type{T} = Int) where T = lazy_e{T}(i, n)
function Base.getindex(e::lazy_e{T}, i) where T
    @boundscheck @assert i <= e.n
    ifelse(i == e.i, one(T), zero(T))
end
Base.size(e::lazy_e) = (e.n,)

derivative(f, x::T) where T = f(Dual(x, one(T))).grad
partial(f, i, x::SVector{N, T}) where {N, T} = f(map((x, d) -> Dual(x, d), x, lazy_e{T}(i, N))).grad

p(i, j) = x -> x[i]^2 * x[j]^2
dp(i, j, k) = x -> partial(p(i, j), k, x)
d2p(i, j, k, l) = x -> partial(dp(i, j, k), l, x)

@btime dp(1, 1, 1)(SVector(3.0, 2.0, 1.0))
@btime d2p(1, 1, 1, 1)(SVector(3.0, 2.0, 1.0))

yielding

  1.200 ns (0 allocations: 0 bytes)
  1.300 ns (0 allocations: 0 bytes)

which looks a bit like compile-time AD to me :wink:
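
As a quick sanity check against ForwardDiff (Test is already loaded above):

using ForwardDiff

@test d2p(1, 2, 1, 2)(SVector(3.0, 2.0, 1.0)) ≈
      ForwardDiff.hessian(x -> x[1]^2 * x[2]^2, [3.0, 2.0, 1.0])[1, 2]  # both give 4 x₁ x₂ = 24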


This is fantastic! Perhaps this should be a package? Thank you @goerch!