# Efficient Vector^T * Jacobian matrix?

I’m trying to find a way to efficiently compute the product of a transposed vector with a Jacobian matrix. Given a vector function f and two large vectors v and x of the same length, I want to compute v^T * jacobian(f)(x) without explicitly computing the Jacobian matrix. Does anyone know how this can be done? Apparently it can be done in a way similar to the Jacobian-vector product jacobian(f)(x)*v, but I’m unable to make the connection.

PS: for the Jacobian-vector product, the trick is shown in the paper “Fast Exact Multiplication by the Hessian”, which shows that jacobian(f)(x)*v = d/dr f(x + r*v)|_{r=0}. Below is my simple implementation of it, but as mentioned, I need the other direction, v^T * jacobian(f)(x).

```julia
using ForwardDiff

# Compute jacobian(f)(x)*v without building a Jacobian
function jvp(f::Function, x::AbstractVector, v::AbstractVector)
    g = r -> f(x + r * v)
    return ForwardDiff.derivative(g, 0.0)
end

# I actually want v'*jacobian(f)(x)
vjp(f::Function, x::AbstractVector, v::AbstractVector) = ??
```
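As a quick sanity check (my addition, not part of the original post; `f` here is just a toy function), the directional-derivative trick can be compared against the explicitly built Jacobian on a small input:

```julia
using ForwardDiff

# jacobian(f)(x) * v  ==  d/dr f(x + r*v) evaluated at r = 0
jvp(f, x, v) = ForwardDiff.derivative(r -> f(x + r * v), 0.0)

f(x) = sum(x) .* x.^2      # toy example function
x, v = randn(5), randn(5)

# compare against the explicitly built Jacobian
J = ForwardDiff.jacobian(f, x)
@assert isapprox(jvp(f, x, v), J * v; rtol = 1e-8)
```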

This is exactly what reverse-mode automatic differentiation computes (e.g. if f maps R^n to R, you’re asking for the gradient at the cost of O(1) function evaluations). There are a few packages that do it, but it’s trickier than forward diff.


I think you can use something like

```julia
using ForwardDiff, LinearAlgebra

# v'*jacobian(f)(x) is the gradient of y -> dot(v, f(y)) at x
function vjp(f::Function, x::AbstractVector, v::AbstractVector)
    g(y) = dot(v, f(y))
    return ForwardDiff.gradient(g, x)
end
```

Though reverse mode autodiff would be much more efficient unless your vector `x` is short.

[Edit: Although it looks nice, I think this formulation amounts to computing the full Jacobian internally (possibly in batches when `x` is long… but nevertheless).]


Yes, it’s such a tricky problem that we have quite a few packages. In very rough order of age, listed with comments / companion libraries:

Having listed all those, it’s hard to believe I’ve got them all…


Hello, thanks for the tips! So as you posted, the trick to use is that v' * jacobian(f, x) is the transpose of d/dx dot(v, f(x)): the j-th component of that gradient is Σᵢ vᵢ ∂fᵢ/∂xⱼ, which is exactly the j-th entry of v' * J. The ForwardDiff approach already gives a speedup for large vectors for some reason, and using AutoGrad for reverse-mode diff, the real speedup comes into play. So if someone else needs this:

```julia
using AutoGrad, LinearAlgebra

# Compute v' * jacobian(f, x) without building the Jacobian
function vjp(f::Function, x::AbstractVector, v::AbstractVector)
    x = Param(x)
    temp = @diff dot(v, f(x))
    return grad(temp, x)
end
```
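A quick correctness check of this reverse-mode vjp against the explicitly built Jacobian (my addition, not from the original post; the toy `f` below is just for illustration, and I’m assuming AutoGrad supports its broadcasts):

```julia
using AutoGrad, ForwardDiff, LinearAlgebra

# Reverse-mode vjp: differentiate y -> dot(v, f(y)) and read off the gradient
function vjp(f::Function, x::AbstractVector, v::AbstractVector)
    x = Param(x)
    temp = @diff dot(v, f(x))
    return grad(temp, x)
end

f(x) = sum(x) .* x.^2      # toy example function
x, v = randn(5), randn(5)

# compare against the explicitly built Jacobian
J = ForwardDiff.jacobian(f, x)
@assert isapprox(vjp(f, x, v), vec(v' * J); rtol = 1e-8)
```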

I also benchmarked the vjp and jvp functions above against the “create the full Jacobian and multiply” version for vectors of length 10000.

```julia
using BenchmarkTools, ForwardDiff

f(x) = sum(x) .* x.^2
truejac(x) = ForwardDiff.jacobian(f, x)

x_test = randn(10000)
v_test = randn(10000)

# Hopefully slower functions, building the full Jacobian
slow_jvp(x, v) = truejac(x) * v
slow_vjp(x, v) = vec(v' * truejac(x))

@benchmark slow_jvp($x_test, $v_test)
@benchmark jvp($f, $x_test, $v_test)

@benchmark slow_vjp($x_test, $v_test)
@benchmark vjp($f, $x_test, $v_test)
```

On my laptop, jvp runs about 20000× faster and vjp about 7000× faster than the naive full-Jacobian versions.


I think it might just be that the jacobian doesn’t need to be stored all at once because the columns are computed in batches with `ForwardDiff` and each column can be reduced with `v` as it’s generated. So improved cache efficiency, etc.
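For what it’s worth, the chunk size that controls this batching can be set explicitly via a `GradientConfig` (a sketch with a toy `f`; the chunk size 8 is an arbitrary choice, not a recommendation):

```julia
using ForwardDiff, LinearAlgebra

f(x) = sum(x) .* x.^2      # toy example function
x = randn(1000)
v = randn(1000)
g(y) = dot(v, f(y))

# ForwardDiff carries the dual components in chunks; the chunk size bounds
# how many directional derivatives (Jacobian columns) exist at once.
cfg = ForwardDiff.GradientConfig(g, x, ForwardDiff.Chunk{8}())
gradvec = ForwardDiff.gradient(g, x, cfg)

# same answer as multiplying against the full Jacobian
J = ForwardDiff.jacobian(f, x)
@assert isapprox(gradvec, vec(v' * J); rtol = 1e-8)
```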