I’m trying to find a way to efficiently compute the product of a transposed vector with a Jacobian matrix. Given a vector function f and two large vectors v and x of the same length, I want to compute v^T * jacobian(f)(x) without explicitly building the Jacobian matrix. Does anyone know how this can be done? Apparently it can be done in a way similar to the Jacobian-vector product jacobian(f)(x)*v, but I’m unable to make the connection.

P.S.: for the Jacobian-vector product, the trick is shown in the paper “Fast Exact Multiplication by the Hessian”, which notes that jacobian(f)(x)*v = d/dr f(x + r*v)|_{r=0}. Below is my simple implementation of it, but as mentioned I need the other direction, v^T * jacobian(f)(x).

```
using ForwardDiff

# Compute jacobian(f)(x)*v without building the Jacobian:
# differentiate r -> f(x + r*v) at r = 0.
function jvp(f::Function, x::AbstractVector, v::AbstractVector)
    g = r -> f(x + r * v)
    return ForwardDiff.derivative(g, 0.0)
end

# I actually want v' * jacobian(f)(x)
vjp(f::Function, x::AbstractVector, v::AbstractVector) = ??
```


This is exactly what reverse-mode automatic differentiation computes (e.g. if f maps R^n to R, you’re asking for the full gradient from a single reverse pass). There are a few packages doing it, but it’s trickier than forward diff.
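For example, a reverse-mode sketch assuming the Zygote package is available (its `pullback` returns the function value together with a closure that applies v' * jacobian(f)(x) in one reverse pass; the test function `f` here is just an illustration):

```
using Zygote, LinearAlgebra

f(x) = sum(x) .* x.^2

x = randn(5)
v = randn(5)

# pullback returns f(x) plus a closure mapping a cotangent v to v' * jacobian(f)(x)
y, back = Zygote.pullback(f, x)
vjp = first(back(v))

# check against the explicit Jacobian (fine at this small size)
J = first(Zygote.jacobian(f, x))
@assert vjp ≈ vec(v' * J)
```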


I think you can use something like

```
using ForwardDiff, LinearAlgebra  # LinearAlgebra for dot

function vjp(f::Function, x::AbstractVector, v::AbstractVector)
    # v' * jacobian(f)(x) is the gradient of y -> dot(v, f(y))
    g(y) = dot(v, f(y))
    ForwardDiff.gradient(g, x)
end
```

Though reverse-mode autodiff would be much more efficient unless your vector `x` is short.

[Edit: Although it looks nice, I think this formulation amounts to computing the full Jacobian internally (possibly in batches when `x` is long… but nevertheless).]


Yes, it’s such a tricky problem that we have several packages; in very rough order of age, listed with comments / companion libraries:

Having listed all those, it’s hard to believe I’ve got them all…


Hello, thanks for the tips! So as you posted, the trick is that v' * jacobian(f, x) is the (transposed) gradient of x -> dot(v, f(x)). The ForwardDiff approach already gives a speedup for large vectors for some reason, and with AutoGrad for reverse-mode diff the real speedup comes into play. So if someone else needs this:

```
using AutoGrad, LinearAlgebra

# Compute v' * jacobian(f, x) without building the Jacobian
function vjp(f::Function, x::AbstractVector, v::AbstractVector)
    x = Param(x)
    temp = @diff dot(v, f(x))
    return grad(temp, x)
end
```
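To sanity-check this on a small vector, one can compare against the explicit Jacobian (a sketch assuming both AutoGrad and ForwardDiff are installed; `ForwardDiff.jacobian` builds the full matrix, which is fine at this size):

```
using AutoGrad, ForwardDiff, LinearAlgebra

# the AutoGrad-based vjp from above
function vjp(f::Function, x::AbstractVector, v::AbstractVector)
    x = Param(x)
    temp = @diff dot(v, f(x))
    return grad(temp, x)
end

f(x) = sum(x) .* x.^2
x = randn(5)
v = randn(5)

# compare against the explicit v' * J
@assert vjp(f, x, v) ≈ vec(v' * ForwardDiff.jacobian(f, x))
```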

I also benchmarked the vjp and jvp functions above against the “create the full Jacobian and multiply” versions, for vectors of length 10000.

```
using BenchmarkTools, ForwardDiff
f(x) = sum(x) .* x.^2
truejac(x) = ForwardDiff.jacobian(f, x)
x_test = randn(10000)
v_test = randn(10000)
# Hopefully slower functions, building the full jacobian
slow_jvp(x, v) = truejac(x)*v
slow_vjp(x, v) = vec(v'*truejac(x))
@benchmark slow_jvp($x_test, $v_test)
@benchmark jvp($f, $x_test, $v_test)
@benchmark slow_vjp($x_test, $v_test)
@benchmark vjp($f, $x_test, $v_test)
```

On my laptop, jvp runs about 20000 times faster and vjp about 7000 times faster than the naive approach.


I think it might just be that the Jacobian doesn’t need to be stored all at once, because the columns are computed in batches with `ForwardDiff` and each column can be reduced with `v` as it’s generated. So improved cache efficiency, etc.
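To make that column-by-column reduction explicit, here is a deliberately naive sketch reusing the jvp from the first post: the j-th entry of v' * J is dot(v, J[:, j]), and J[:, j] = jvp(f, x, e_j), so each column can be reduced with v the moment it is produced and the full matrix is never stored. (The helper name `vjp_columnwise` is mine; this costs n forward passes, so it is for illustration, not speed.)

```
using ForwardDiff, LinearAlgebra

# jvp from the first post: jacobian(f)(x)*v = d/dr f(x + r*v) at r = 0
jvp(f, x, v) = ForwardDiff.derivative(r -> f(x + r * v), 0.0)

# Illustrative only: (v' * J)[j] = dot(v, J[:, j]) = dot(v, jvp(f, x, e_j)),
# reducing each column with v as soon as it is generated.
function vjp_columnwise(f, x, v)
    n = length(x)
    out = similar(x)
    for j in 1:n
        e = zeros(n)
        e[j] = 1.0
        out[j] = dot(v, jvp(f, x, e))
    end
    return out
end
```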