Computing matrix derivatives for eigenvalue sensitivity analysis

I have a problem where I would like to apply eigenvalue sensitivity analysis, using Enzyme to compute the derivatives of the matrix A(\mathbf{p}), where \mathbf{p} = (p_1,\ldots,p_n). Per @stevengj’s notes, if I want the derivative of an eigenvalue, then for a system

Ax = \lambda x\\ A^Ty = \lambda y,

I want to compute

\frac{\partial\lambda}{\partial\mathbf{p}} = \frac{y^T \frac{\partial A}{\partial \mathbf{p}} x}{y^T x}.

What is the recommended way to compute this?

My thought is that, since I don’t actually care about the derivatives of the matrix themselves, only their product with the eigenvectors, I should avoid explicitly constructing that object. I would also like to hand Enzyme a structure that contains all of the parameters and get all of the partials \partial \lambda/\partial p_1, \partial \lambda/\partial p_2, \ldots at once.

So, what if I construct some function

f(x,y, A(\mathbf{p})) = \frac{y^T A x}{y^T x}

s.t. if I fix x,y, then

\frac{\partial \lambda}{\partial \mathbf{p}} = \frac{\partial f}{\partial \mathbf{p}}.
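For concreteness, here’s roughly what I have in mind: a minimal sketch where build_A is a hypothetical stand-in for my actual A(\mathbf{p}), and the eigenvectors come from a dense eigen just for illustration.

using Enzyme
using LinearAlgebra: dot, eigen

# Hypothetical stand-in for A(p): any smooth map from parameters to a matrix.
build_A(p) = [p[1]  p[2];
              p[2]  p[3]]

# With x and y held fixed at the eigenvectors, ∂f/∂p = ∂λ/∂p.
f(p, x, y) = dot(y, build_A(p) * x) / dot(y, x)

p = [2.0, 0.5, 1.0]
E = eigen(build_A(p))
x = E.vectors[:, end]   # right eigenvector of the largest eigenvalue
y = x                   # this toy A(p) is symmetric, so left = right eigenvector

dp = zero(p)            # storage for the gradient
Enzyme.autodiff(Reverse, f, Active, Duplicated(p, dp), Const(x), Const(y))
# dp now holds all the partials ∂λ/∂p₁, …, ∂λ/∂pₙ from a single reverse pass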

Is this a good way to go about this?

Seems reasonable to me.

(Edited: I had read the first half of your post and started to write out exactly what you suggested.)

Note that ChainRules.jl includes rules for eigenvalue differentiation, if I recall correctly. But they are based on a dense-matrix eigen routine, so they aren’t appropriate if you are computing just one eigenvalue (and its corresponding eigenvectors), e.g. by some Krylov method.


Thanks for confirming and for pointing to ChainRules.jl.

The application will be using sparse matrices and Krylov methods—and it’s actually a generalized EVP. Eventually, the target metric will be a function of the eigenvectors. So, I’ll have to solve the adjoint problem. But, I think that the same sort of trick as above should work for the AD portion.
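For my own reference, the analogue of the trick above for the generalized problem Ax = \lambda Bx, A^Ty = \lambda B^Ty seems to be (again with x, y held fixed)

g(\mathbf{p}) = \frac{y^T A(\mathbf{p})\, x}{y^T B(\mathbf{p})\, x},

since differentiating the quotient and using y^T A x = \lambda\, y^T B x gives

\frac{\partial g}{\partial \mathbf{p}} = \frac{y^T \left(\frac{\partial A}{\partial \mathbf{p}} - \lambda \frac{\partial B}{\partial \mathbf{p}}\right) x}{y^T B x} = \frac{\partial \lambda}{\partial \mathbf{p}}.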

You may also be interested in Advanced use cases · ImplicitDifferentiation.jl if you formulate your eigenvalue problem as a constrained optimization problem.


In this case the optimality conditions are fairly simple, probably similar to what @mohamed82008 coded in DifferentiableFactorizations.jl:

using Pkg
Pkg.add(url="https://github.com/gdalle/ImplicitDifferentiation.jl")
Pkg.add(["KrylovKit", "Zygote"])

using ImplicitDifferentiation
using LinearAlgebra: dot, eigvals, norm
using KrylovKit: eigsolve
import Zygote

# Dense test matrix A(p) = p * pᵀ (rank one, so its largest eigenvalue is ‖p‖²)
MyMatrix(p) = p * transpose(p)

# Lazy, matrix-free version of the same operator: applying it to x computes p * (pᵀ x)
struct MyLazyMatrix{P}
    p::P
end

function (A::MyLazyMatrix)(x::AbstractVector)
    return A.p .* dot(A.p, x)
end

# Forward solve: dominant eigenpair via KrylovKit, returned as [λ; v / ‖v‖]
function forward(p)
    vals, vecs, _ = eigsolve(MyLazyMatrix(p), copy(p), 1, :LR)
    λ = real(only(vals))
    v = real.(only(vecs))
    return vcat(λ, v / norm(v))
end

# Optimality conditions defining (λ, v) implicitly: ‖v‖ - 1 = 0 and A(p) v - λ v = 0
function conditions(p, λv)
    λ = first(λv)
    v = λv[2:end]
    v_scale = norm(v) - 1
    v_error = MyLazyMatrix(p)(v) .- λ .* v
    return vcat(v_scale, v_error)
end

# Wrap the solver and its optimality conditions into a differentiable implicit function
implicit_max_eigval = ImplicitFunction(forward, conditions)

f1(p) = maximum(real, eigvals(MyMatrix(p)))                            # dense reference
f2(p) = real(only(first(eigsolve(MyLazyMatrix(p), copy(p), 1, :LR))))  # Krylov, differentiated directly
f3(p) = implicit_max_eigval(p)[1]                                      # Krylov, via ImplicitFunction

The result is that ImplicitDifferentiation.jl succeeds where Zygote.jl alone fails:

julia> p = randn(3)
3-element Vector{Float64}:
 -1.5621152096320876
  3.0824487193542387
 -0.8345846981100772

julia> f1(p), f2(p), f3(p)
(12.63822565393197, 12.638225653931976, 12.638225653931976)

julia> Zygote.gradient(f1, p)[1]
3-element Vector{Float64}:
 -3.1242304192641748
  6.164897438708478
 -1.6691693962201561

julia> Zygote.gradient(f2, p)[1]
ERROR: Compiling Tuple{typeof(eigsolve), MyLazyMatrix{Vector{Float64}}, Vector{Float64}, Int64, Symbol, KrylovKit.Arnoldi{KrylovKit.ModifiedGramSchmidt2, Float64}}: try/catch is not supported.
Refer to the Zygote documentation for fixes.
https://fluxml.ai/Zygote.jl/latest/limitations

julia> Zygote.gradient(f3, p)[1]
3-element Vector{Float64}:
 -3.1242304192641748
  6.1648974387084765
 -1.6691693962201541
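(As a sanity check: MyMatrix(p) = p p^T has dominant eigenvalue \Vert p\Vert^2, so the exact gradient is 2p, which is exactly what the dense and implicit versions return above.)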

Indeed, implicit differentiation via these two equations is exactly how adjoint eigenproblem differentiation can be derived (though in my notes I used x^* x - 1 = 0 rather than \Vert x \Vert - 1 = 0).
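In the conditions function above, that would just correspond to swapping the normalization residual (for a real eigenvector, x^* x = x^T x), e.g.:

function conditions(p, λv)
    λ = first(λv)
    v = λv[2:end]
    v_scale = dot(v, v) - 1                 # x^T x - 1 = 0 instead of ‖x‖ - 1 = 0
    v_error = MyLazyMatrix(p)(v) .- λ .* v
    return vcat(v_scale, v_error)
end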

(But for very large problems you still might want to do it manually to have more control over how the adjoint linear systems are solved. In any case, even if you are using AD it is nice to have some idea of how things work under the hood so that you can use AD effectively, and to know where it might run into trouble.)


Indeed, and ImplicitDifferentiation allows you to plug in an arbitrary linear solver, either dense or iterative (lazy). It’s currently a bit cumbersome, but I plan to make it much simpler in the next release. You can also pick a different AD backend to differentiate through the conditions than the one doing the outer differentiation, which may come in handy.