I’m trying to find a way to efficiently compute the product of a transposed vector with a Jacobian matrix. Given a vector function f and two large vectors v and x of the same length, I want to compute v^T * jacobian(f)(x) without explicitly building the Jacobian matrix. Does anyone know how this can be done? Apparently it can be done in a way similar to the Jacobian-vector product jacobian(f)(x)*v, but I’m unable to make the connection.
PS: for the Jacobian-vector product, the trick is shown in the paper “Fast Exact Multiplication by the Hessian”, which observes that jacobian(f)(x)*v = d/dr f(x + r*v)|_{r=0}. Below is my simple implementation of it, but as mentioned I need the other way around, v^T * jacobian(f)(x).
using ForwardDiff
# Compute jacobian(f)(x)*v without building a jacobian
function jvp(f::Function, x::AbstractVector, v::AbstractVector)
    # jacobian(f)(x)*v = d/dr f(x + r*v) at r = 0 (see the paper referenced above)
    g = r -> f(x + r * v)
    return ForwardDiff.derivative(g, 0.0)
end
# I actually want v'*jacobian(f)(x)
vjp(f::Function, x::AbstractVector, v::AbstractVector) = ??
This is exactly what reverse-mode automatic differentiation does (e.g. if f maps R^n to R, you’re asking for the gradient at the cost of O(1) function evaluations). There are a few packages that do it, but it’s trickier than forward-mode diff.
using LinearAlgebra: dot

# The gradient of y -> dot(v, f(y)) at x is jacobian(f)(x)' * v, i.e. v' * jacobian(f)(x) as a column vector
function vjp(f::Function, x::AbstractVector, v::AbstractVector)
    g(y) = dot(v, f(y))
    ForwardDiff.gradient(g, x)
end
Though reverse mode autodiff would be much more efficient unless your vector x is short.
[Edit: Although it looks nice, I think this formulation amounts to computing the full Jacobian internally (possibly in batches when x is long… but nevertheless).]
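For completeness, here is a minimal sketch of an actual reverse-mode VJP using Zygote (the toy function f and the vector sizes below are just placeholders): Zygote.pullback returns the value of f together with a closure that maps a cotangent v to jacobian(f)(x)' * v without ever materializing the Jacobian.
using Zygote

# Reverse-mode VJP: back(v)[1] == jacobian(f)(x)' * v, the column form of v' * jacobian(f)(x)
function vjp_zygote(f, x::AbstractVector, v::AbstractVector)
    _, back = Zygote.pullback(f, x)
    return back(v)[1]
end

# Toy usage (placeholder function and sizes)
f(x) = x .^ 2 .+ sin.(x)
x, v = rand(10_000), rand(10_000)
vjp_zygote(f, x, v)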
Hello, thanks for the tips! So as you posted, the trick to use is that the gradient d/dx (v' * f(x)) equals jacobian(f, x)' * v, i.e. the transpose of v' * jacobian(f, x). The ForwardDiff approach already gives a speedup for large vectors for some reason, and using AutoGrad for reverse-mode diff, the real speedup comes into play. So if someone else needs this:
using AutoGrad
using LinearAlgebra: dot

# Compute v' * jacobian(f, x), returned as a column vector, i.e. jacobian(f, x)' * v
function vjp(f::Function, x::AbstractVector, v::AbstractVector)
    x = Param(x)                 # mark x as differentiable
    temp = @diff dot(v, f(x))    # record the tape for v' * f(x)
    return grad(temp, x)         # gradient w.r.t. x = jacobian(f, x)' * v
end
I also benchmarked the vjp and jvp functions above against the “create the full Jacobian and multiply” version for vectors of length 10000.
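In case anyone wants to reproduce the comparison, a rough sketch of the naive baseline plus a sanity check might look like this (the test function f is a placeholder, and it assumes the vjp from the AutoGrad snippet above is already in scope):
using ForwardDiff, LinearAlgebra, BenchmarkTools

# Naive baseline: build the full Jacobian, then multiply
vjp_naive(f, x, v) = ForwardDiff.jacobian(f, x)' * v

f(x) = x .^ 2 .+ sin.(x)   # placeholder test function
n = 1_000                  # the full Jacobian is n×n, so keep n moderate for the baseline
x, v = rand(n), rand(n)

# Sanity check: both should agree up to floating-point error
@assert vjp_naive(f, x, v) ≈ vjp(f, x, v)

@btime vjp_naive($f, $x, $v)
@btime vjp($f, $x, $v)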
I think it might just be that the Jacobian doesn’t need to be stored all at once: with ForwardDiff the columns are computed in batches (chunks), and each column can be reduced against v as it is generated. So improved cache efficiency, etc.
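If you want to experiment with that batching, ForwardDiff lets you set the chunk size explicitly; here is a small sketch (the chunk size 8 and the test function are arbitrary choices):
using ForwardDiff
using LinearAlgebra: dot

f(x) = x .^ 2 .+ sin.(x)   # placeholder test function
x, v = rand(10_000), rand(10_000)
g(y) = dot(v, f(y))

# A chunk of size N differentiates with respect to N components of x per pass,
# i.e. N "columns" of the Jacobian are formed and immediately reduced against v.
cfg = ForwardDiff.GradientConfig(g, x, ForwardDiff.Chunk{8}())
ForwardDiff.gradient(g, x, cfg)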
Thanks for your answer, but in general it’s best not to resuscitate threads from 6 years ago, because everyone gets pinged in the process. If you have specific questions, you can create a new topic for that.
Also do check out DifferentiationInterface.jl with its pullback function. Nowadays that’s probably the most generic solution.
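For what it’s worth, a minimal sketch with DifferentiationInterface might look like the following; this assumes a recent version where pullback takes a backend plus a tuple of cotangents (the exact signature has changed between releases, so check the DI docs), and AutoZygote() is just one possible reverse-mode backend:
using DifferentiationInterface
using ADTypes: AutoZygote
import Zygote                  # the chosen backend package must be loaded

f(x) = x .^ 2 .+ sin.(x)       # placeholder test function
x, v = rand(10_000), rand(10_000)

# Computes jacobian(f)(x)' * v without building the Jacobian;
# cotangents go in (and come back) as a 1-tuple in recent DI versions.
tx = pullback(f, AutoZygote(), x, (v,))
tx[1]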