How do you speed up the linear sparse solver in Zygote?

Thinking some more about the example I posted above: in practice, one wouldn't write `inv` in the code at all, and one can define a rule for the linear solve directly, so computing `u` should be a single function call. This means that the following line can be optimized in the rrule of the linear solve (reusing the factorization, reusing the linear solve solution, and using a lazy rank-1 matrix representation).

dK = -(K^{-1} du) (K^{-1} f)^T
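To make this concrete, here is a minimal sketch (not LinearSolve.jl's actual implementation) of such a rule via ChainRulesCore: the factorization is computed once and reused for the adjoint solve, and the adjoint wrt K comes back as a lazy rank-1 outer product. The `OuterProduct` type is a hypothetical stand-in for whatever lazy representation the package would use.

using ChainRulesCore, LinearAlgebra

# Hypothetical lazy rank-1 matrix: stores u and v, represents u * v'.
struct OuterProduct{T, V <: AbstractVector{T}} <: AbstractMatrix{T}
    u::V
    v::V
end
Base.size(P::OuterProduct) = (length(P.u), length(P.v))
Base.getindex(P::OuterProduct, i::Int, j::Int) = P.u[i] * conj(P.v[j])

function ChainRulesCore.rrule(::typeof(\), K::AbstractMatrix, f::AbstractVector)
    F = lu(K)      # factorize once, reuse in the pullback
    u = F \ f      # primal solution, K^{-1} f
    function solve_pullback(du)
        λ = F' \ unthunk(du)        # adjoint solve, K^{-T} du (= K^{-1} du for symmetric K)
        dK = OuterProduct(-λ, u)    # dK = -(K^{-T} du) (K^{-1} f)^T, never densified
        return NoTangent(), dK, λ
    end
    return u, solve_pullback
end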

In fact, maybe all that's needed is to make this line LinearSolve.jl/src/adjoint.jl at main · SciML/LinearSolve.jl · GitHub a lazy multiplication returning a rank-1 matrix instead of a dense one. @ChrisRackauckas would you be open to a PR for this?

Then one can define a rule for dx_i = tr(dK^T K_i), i.e. `dot(A, B)` in Julia, where A is a rank-1 matrix and B is a SparseMatrixCSC. Then we can get near-optimal performance in the above case using Zygote. I am not sure if Enzyme rules support lazy arrays for the adjoint. This may be easier than I initially thought, without too many changes.
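For illustration, a `dot` method like that only needs to visit the stored entries of the sparse matrix, so it runs in O(nnz(B)) instead of O(n^2). A sketch, reusing the hypothetical `OuterProduct` type from above:

using LinearAlgebra, SparseArrays

# dot(P, B) = Σᵢⱼ conj(P[i, j]) * B[i, j]; only the stored entries of B contribute.
function LinearAlgebra.dot(P::OuterProduct, B::SparseMatrixCSC)
    rows, vals = rowvals(B), nonzeros(B)
    s = zero(promote_type(eltype(P), eltype(B)))
    for j in 1:size(B, 2)
        for k in nzrange(B, j)
            s += conj(P[rows[k], j]) * vals[k]
        end
    end
    return s
end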


I’d be open to a PR for this. I don’t know what would happen if it doesn’t match the type of A, but it’s worth investigating.


Thank you! I will refer to it.

The PR above just got merged, which means that in the next version of LinearSolve.jl, if you use Zygote.jl (or any ChainRules.jl-based AD package) to get the gradient of dot(a, A \ b) wrt the matrix A, you will get a lazy outer product. This is true when A is dense.

using LinearAlgebra, LinearSolve, Zygote

function invquad(a, A, b)
    prob = LinearProblem(A, b)
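    # explicitly choose the RecursiveFactorization.jl LU, one of LinearSolve.jl's default choices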
    sol = solve(
        prob,
        LinearSolve.DefaultLinearSolver(LinearSolve.DefaultAlgorithmChoice.RFLUFactorization),
    )
    return dot(a, sol.u)
end

n = 100; A = rand(n, n); b1 = rand(n); b2 = rand(n);

db1, dA, db2 = Zygote.gradient(invquad, b1, A, b2);

Base.summarysize(dA)
# 1752

Base.summarysize(A)
# 80040

If A is sparse, Zygote.pullback gives you the correct lazy outer product, but Zygote.gradient projects it to a sparse matrix with the same sparsity structure as the input. That is arguably incorrect (Gradient wrt to a sparse matrix is mathematically wrong · Issue #1507 · FluxML/Zygote.jl · GitHub), because the gradient is mathematically defined for the structural zeros too; the fact that they are structural zeros is an implementation detail. The projection, in my opinion, should be done by the user after the gradient call if needed. Anyway, if you use pullback directly, you are safe.

using SparseArrays

function sinvquad(a, A, b)
    prob = LinearProblem(A, b)
    sol = solve(prob)
    return dot(a, sol.u)
end

sA = sparse(A)
dsA = Zygote.pullback(sinvquad, b1, sA, b2)[2](1.0)[2]

Base.summarysize(dsA)
# 1752

Base.summarysize(sA)
# 160968
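If you do want the projected gradient (sparse, with the input's sparsity pattern), you can opt in explicitly after the pullback. One way, assuming ChainRulesCore is available, is to apply its ProjectTo:

using ChainRulesCore

# project the lazy outer product onto the sparsity pattern of sA
dsA_sparse = ChainRulesCore.ProjectTo(sA)(dsA)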

I think the above PR was the first step towards preserving structure in rules and using lazy representations where possible. The next step is Make the rrule for 3-arg dot lazy · Issue #788 · JuliaDiff/ChainRules.jl · GitHub.
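For context, here is a rough sketch (for the real case, again using the hypothetical `OuterProduct` type from above; this is not the actual proposal in the issue) of what a lazy rrule for the 3-arg dot could look like:

using ChainRulesCore, LinearAlgebra

function ChainRulesCore.rrule(::typeof(dot), x::AbstractVector, A::AbstractMatrix, y::AbstractVector)
    Ay = A * y
    function dot_pullback(dΔ)
        Δ = unthunk(dΔ)
        dx = Δ * Ay                    # ∂(x' A y)/∂x = A y
        dA = OuterProduct(Δ * x, y)    # Δ * x * y' stored lazily
        dy = Δ * (A' * x)              # ∂(x' A y)/∂y = A' x
        return NoTangent(), dx, dA, dy
    end
    return dot(x, Ay), dot_pullback
end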
