Computing matrix derivatives for eigenvalue sensitivity analysis

I have a problem where I would like to apply eigenvalue sensitivity analysis, using Enzyme to compute the derivatives of a matrix A(\mathbf{p}) with respect to its parameters \mathbf{p} = (p_1,\ldots,p_n). Per @stevengj’s notes, if I want the derivative of an eigenvalue, then for the system

Ax = \lambda x, \qquad A^T y = \lambda y,

I want to compute

\frac{\partial\lambda}{\partial\mathbf{p}} = \frac{y^T \frac{\partial A}{\partial \mathbf{p}} x}{y^T x}.

What is the recommended way to compute this?

My thought is that, since I don’t actually care about the derivatives of the matrix, only their product with the eigenvectors, I should avoid explicitly constructing that object. I would also like to hand Enzyme a structure that contains all of the parameters and get all of the partials \partial \lambda/\partial p_1, \partial \lambda/\partial p_2, \ldots at once.

So, what if I construct some function

f(x, y, A(\mathbf{p})) = \frac{y^T A x}{y^T x}

such that, if I fix x and y, then

\frac{\partial \lambda}{\partial \mathbf{p}} = \frac{\partial f}{\partial \mathbf{p}}.

Is this a good way to go about this?
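
Concretely, here is a minimal sketch of what I have in mind, assuming a hypothetical toy constructor build_A(p) in place of my actual A(\mathbf{p}), and using dense eigen calls only to produce example eigenvectors x and y:

using Enzyme
using LinearAlgebra: dot, eigen

# Hypothetical toy parameterization A(p); substitute the real constructor.
# This one has real, distinct eigenvalues for any p.
build_A(p) = [2.0+p[1] p[2]; 0.5*p[2] 1.0+p[1]]

# With x (right) and y (left) eigenvectors held fixed, ∂f/∂p = ∂λ/∂p
f(p, x, y) = dot(y, build_A(p) * x) / dot(y, x)

p = [0.3, 0.7]
A = build_A(p)

# Dense eigen-solves here only to obtain example eigenvectors;
# the real problem would get x and y from a Krylov solver
E = eigen(A)
k = argmax(real.(E.values))
x = real.(E.vectors[:, k])            # right eigenvector of A

Et = eigen(collect(transpose(A)))
kt = argmax(real.(Et.values))
y = real.(Et.vectors[:, kt])          # left eigenvector (right eigenvector of Aᵀ)

dp = zero(p)
Enzyme.autodiff(Reverse, f, Active, Duplicated(p, dp), Const(x), Const(y))
# dp now holds all the partials ∂λ/∂p₁, ∂λ/∂p₂ from a single reverse pass

Since f is scale-invariant in both x and y, their normalization shouldn’t matter.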

Seems reasonable to me.

(Edited: I had read the first half of your post and started to write out exactly what you suggested.)

Note that ChainRules.jl includes rules for eigenvalue differentiation, if I recall correctly. But they are based on a dense-matrix eigen routine, so they aren’t appropriate if you are computing just one eigenvalue (and corresponding eigenvector), e.g. by some Krylov method.

Thanks for confirming and for pointing to ChainRules.jl.

The application will be using sparse matrices and Krylov methods, and it’s actually a generalized EVP. Eventually, the target metric will be a function of the eigenvectors, so I’ll have to solve the adjoint problem. But I think the same sort of trick as above should work for the AD portion.
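
In the generalized case A(\mathbf{p})x = \lambda B(\mathbf{p})x, the functional just picks up B in the denominator. A minimal sketch, with hypothetical toy constructors build_A and build_B standing in for the actual sparse ones:

using LinearAlgebra: dot

# Hypothetical toy parameterizations for A(p) x = λ B(p) x;
# substitute the actual sparse constructors
build_A(p) = [2.0+p[1] p[2]; p[2] 1.0]
build_B(p) = [1.0+p[2]^2 0.0; 0.0 1.0]

# With x (right) and y (left) generalized eigenvectors held fixed,
# the gradient of g with respect to p equals ∂λ/∂p
g(p, x, y) = dot(y, build_A(p) * x) / dot(y, build_B(p) * x)

At the solution y^T A x = \lambda\, y^T B x, so differentiating g at fixed x, y reproduces the standard sensitivity \frac{\partial\lambda}{\partial\mathbf{p}} = \frac{y^T\left(\frac{\partial A}{\partial\mathbf{p}} - \lambda\frac{\partial B}{\partial\mathbf{p}}\right)x}{y^T B x}.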

You may also be interested in Advanced use cases · ImplicitDifferentiation.jl if you formulate your eigenvalue problem as a constrained optimization problem.
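
(For a symmetric problem that formulation is just the Rayleigh quotient, \lambda_{\max} = \max_{x^T x = 1} x^T A x: the stationarity conditions of the Lagrangian x^T A x - \lambda (x^T x - 1) recover Ax = \lambda x, with \lambda appearing as the multiplier.)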


In this case the optimality conditions are fairly simple, probably similar to what @mohamed82008 coded in DifferentiableFactorizations.jl:

using Pkg
Pkg.add(url="https://github.com/gdalle/ImplicitDifferentiation.jl")
Pkg.add(["KrylovKit", "Zygote"])

using ImplicitDifferentiation
using LinearAlgebra: dot, eigvals, norm
using KrylovKit: eigsolve
import Zygote

# Dense test matrix A(p) = p pᵀ (rank one, with largest eigenvalue ‖p‖²)
MyMatrix(p) = p * transpose(p)

# Lazy (matrix-free) version: only the action x ↦ A(p) x is defined,
# which is all a Krylov method needs
struct MyLazyMatrix{P}
    p::P
end

function (A::MyLazyMatrix)(x::AbstractVector)
    return A.p .* dot(A.p, x)
end

# Forward solve: largest-real-part eigenpair via KrylovKit, packed as [λ; v] with v normalized
function forward(p)
    vals, vecs, _ = eigsolve(MyLazyMatrix(p), copy(p), 1, :LR)
    λ = real(only(vals))
    v = real.(only(vecs))
    return vcat(λ, v / norm(v))
end

# Conditions defining (λ, v) implicitly: unit norm and the eigen-residual A(p) v - λ v = 0
function conditions(p, λv)
    λ = first(λv)
    v = λv[2:end]
    v_scale = norm(v) - 1
    v_error = MyLazyMatrix(p)(v) .- λ .* v
    return vcat(v_scale, v_error)
end

implicit_max_eigval = ImplicitFunction(forward, conditions)

# Three ways to compute the same largest eigenvalue: dense, plain Krylov, Krylov + implicit function
f1(p) = maximum(real, eigvals(MyMatrix(p)))
f2(p) = real(only(first(eigsolve(MyLazyMatrix(p), copy(p), 1, :LR))))
f3(p) = implicit_max_eigval(p)[1]

The result is that ImplicitDifferentiation.jl succeeds where Zygote.jl alone fails:

julia> p = randn(3)
3-element Vector{Float64}:
 -1.5621152096320876
  3.0824487193542387
 -0.8345846981100772

julia> f1(p), f2(p), f3(p)
(12.63822565393197, 12.638225653931976, 12.638225653931976)

julia> Zygote.gradient(f1, p)[1]
3-element Vector{Float64}:
 -3.1242304192641748
  6.164897438708478
 -1.6691693962201561

julia> Zygote.gradient(f2, p)[1]
ERROR: Compiling Tuple{typeof(eigsolve), MyLazyMatrix{Vector{Float64}}, Vector{Float64}, Int64, Symbol, KrylovKit.Arnoldi{KrylovKit.ModifiedGramSchmidt2, Float64}}: try/catch is not supported.
Refer to the Zygote documentation for fixes.
https://fluxml.ai/Zygote.jl/latest/limitations

julia> Zygote.gradient(f3, p)[1]
3-element Vector{Float64}:
 -3.1242304192641748
  6.1648974387084765
 -1.6691693962201541

Indeed, implicit differentiation via these two equations is exactly how adjoint eigenproblem differentiation can be derived (though in my notes I used x^* x - 1 = 0 rather than \Vert x \Vert - 1 = 0).

(But for very large problems you still might want to do it manually to have more control over how the adjoint linear systems are solved. In any case, even if you are using AD it is nice to have some idea of how things work under the hood so that you can use AD effectively, and to know where it might run into trouble.)
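
For reference, the short version of that derivation for a simple eigenvalue: perturbing Ax = \lambda x gives

dA\,x + A\,dx = d\lambda\,x + \lambda\,dx,

and left-multiplying by the left eigenvector y^T (which satisfies y^T A = \lambda y^T) cancels the dx terms, leaving

d\lambda = \frac{y^T\,dA\,x}{y^T x},

i.e. the formula from the first post. The normalization condition only pins down the scaling of dx and does not enter d\lambda.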


Indeed, and ImplicitDifferentiation allows you to plug in an arbitrary linear solver, either dense or iterative (lazy). It’s currently a bit cumbersome but I plan to make it much simpler in the next release. You can also pick a different AD backend to differentiate through the conditions than the one doing the outer differentiation, which may come in handy.