Efficient Vector^T * Jacobian matrix?

I’m trying to find a way to efficiently compute the product of a transposed vector with a Jacobian matrix. So given a vector function f and two large vectors v and x of same length, I want to compute v^T * jacobian(f)(x) without explicitly computing the jacobian matrix. Does anyone know how this can be done? Apparently it can be done in some way similar to the jacobian-vector product jacobian(f)(x)*v, but I’m unable to make the connection.

PS.: for the jacobian-vector product, the way to do it is shown in a paper called “Fast Exact Multiplication by the Hessian”, showing that jacobian(f)(x)v = d/dr f(x+rv)|r=0. Below my simple implementation of it, but as mentionned I need the other way around v^T * jacobian(f)(x).

using ForwardDiff

# Compute jacobian(f)(x)*v without building a jacobian
function jvp(f::Function, x::AbstractVector, v::AbstractVector)
    g = r->f(x+r*v)
    return ForwardDiff.derivative(g, 0.0)

# I actually want v'*jacobian(f)(x)
vjp(f::Function, x::AbstractVector, v::AbstractVector) = ??
1 Like

This is exactly what reverse-mode automatic differentiation is (e.g. if f is R^n to R you’re asking for a O(1) gradient). There are a few packages doing it, but it’s more tricky than forward diff.


I think you can use something like

function vjp(f::Function, x::AbstractVector, v::AbstractVector)
    g(y) = dot(v, f(y))
    ForwardDiff.gradient(g, x)

Though reverse mode autodiff would be much more efficient unless your vector x is short.

[Edit: Although it looks nice, I think this formulation amounts to computing the full Jacobian internally (possibly in batches when x is long… but nevertheless).]

1 Like

Yes it’s such a tricky problem that we have, in very rough order of age, and listed with comments / companion libraries:

Having listed all those, it’s hard to believe I’ve got them all…


Hello, thanks for the tips! So as you posted the trick to use is that jacobian(f, x)*v = d/dx (f(x)’*v) .The ForwardDiff approach already gives a speedup for large vectors for some reason, and using AutoGrad for backward diff, the real speedup comes into play. So if someone else needs this:

using AutoGrad

# Compute jacobian(f, x)*v
function vjp(f::Function, x::AbstractVector, v::AbstractVector)
    x = Param(x)
    temp = @diff dot(v, f(x))
    return grad(temp, x)

I also benchmarked the functions agains the “create full jacobian and multiply” version for vectors of length 10000. and the vjp and jvp functions above.

using BenchmarkTools, ForwardDiff

f(x) = sum(x) .* x.^2
truejac(x) = ForwardDiff.jacobian(f, x)

x_test = randn(10000)
v_test = randn(10000)

# Hopefully slower functions, building the full jacobian
slow_jvp(x, v) = truejac(x)*v
slow_vjp(x, v) = vec(v'*truejac(x))

@benchmark  slow_jvp($x_test, $v_test)
@benchmark  jvp($f, $x_test, $v_test)

@benchmark  slow_vjp($x_test, $v_test)
@benchmark  vjp($f, $x_test, $v_test)

The results on my laptop are a 20000 times faster execution of jvp and of 7000 for vjp compared to the naive approach :slight_smile:

1 Like

I think it might just be that the jacobian doesn’t need to be stored all at once because the columns are computed in batches with ForwardDiff and each column can be reduced with v as it’s generated. So improved cache efficiency, etc.