Derivative of eigenvalues and eigenvectors of Hermitian Matrix by automatic differentiation

question

#1

I am recently looking into automatic differentiation, as implemented by ForwardDiff.jl. The manual says functions that call blas are not supported, therefore any functions containing eig are not supported. I think I partly understand the reason behind this statement.

However, I noticed that eigh (eigen decomposition of Hermitian matrix) is supported by AlgoPy (https://github.com/b45ch1/algopy), a package providing automatic differentiation in python. This is quite unexpected since derivatives of eigenvectors are not well defined since eigenvectors can be multiplied by arbitrary number. I wonder how AlgoPy supports eigen decomposition and whether it’s doable in julia.


#2

Once you assume a normalization, you can define derivatives. The simplest way is to use implicit differentiation and add the normalization to the constraints. This can be tedious, but I find the Matrix Cookbook and Old and new matrix algebra useful for statistics helpful.

I have worked out a subcase of SVD, I am planning to make a PR when I make it work generally.


#3

Hermitian matrices have orthogonal eigenvectors, so the only sensible normalisation is that they each have norm 1, giving an orthogonal matrix Q that diagonalises the matrix. This is unique up to sign / phase.


#4

That’s in the case of numerically well-separated eigenvalues. For repeated/almost-repeated eigenvalues, the naive definition will fail (although perturbation of a repeated eigenvector makes sense if you only use the subspace and not the individual eigenvectors), and you have to project out the components related to the other eigenvectors or you will get NaN/noise. I don’t know if there’s any good, generic way to do this.


#5

It’s still well-defined if you run auto-differentiation on the QR algorithm itself. The only place of discontinuity is choice of when to deflate, but this discontinuity won’t introduce NaNs.


#6

Sure, but still the eigenvector perturbation goes like 1/delta, if delta is the separation between the eigenvalue of interest and the rest of the spectrum. This means that if you’re interested in the subspace and want to compute the derivative of O(1) quantities, eg <x1, Bx1> + <x2, Bx2> where x1 and x2 are the eigenvectors associated with the smallest eigenvalues of A (this is a typical use case in quantum mechanics) and do it naively on a matrix with |lambda2 - lambda1| = delta, and those two eigenvalues well-separated from the rest of the spectrum, you will be relying on cancellation of 1/delta terms, and the relative accuracy on your output will go as like eps/delta, where eps is the machine epsilon. The correct way to do it is to compute the perturbation of x1 orthogonal to x2, and reciprocally, but this of course assumes that you are not interested in separating x1 and x2, so it cannot be done in a completely generic way.


#7

Thank you. I am aware of the methods in the two links. However, I don’t think it has anything to do with auto-differentiation. Maybe auto-differentiation of QR mentioned by @dlfivefifty is what Algopy does. However, if QR is not implemented in julia, as the current situation for qrfact, is it still possible to do auto-differentiation easily?


#8

Maybe I was not clear: AFACT AD for eig is currently missing and needs to be implemented, hence the suggestions.


#9

I see. Thank you.


#10

It is implemented.

julia> using ForwardDiff

julia> A = ForwardDiff.Dual{Float64}.(rand(5, 2), ones(5,2))
5×2 Array{ForwardDiff.Dual{Float64,Float64,1},2}:
 Dual{Float64}(0.444375,1.0)  Dual{Float64}(0.870728,1.0)
 Dual{Float64}(0.322077,1.0)  Dual{Float64}(0.349103,1.0)
 Dual{Float64}(0.225048,1.0)  Dual{Float64}(0.287789,1.0)
 Dual{Float64}(0.746621,1.0)  Dual{Float64}(0.0858256,1.0)
 Dual{Float64}(0.523616,1.0)  Dual{Float64}(0.324125,1.0)

julia> qrfact(A)
Base.LinAlg.QR{ForwardDiff.Dual{Float64,Float64,1},Array{ForwardDiff.Dual{Float64,Float64,1},2}} with factors Q and R:
ForwardDiff.Dual{Float64,Float64,1}[Dual{Float64}(-0.408481,-0.138573) Dual{Float64}(-0.779144,0.319615) … Dual{Float64}(-0.235038,-0.187226) Dual{Float64}(-0.336001,-0.139441); Dual{Float64}(-0.296062,-0.353418) Dual{Float64}(-0.18001,0.0177585) … Dual{Float64}(0.89992,-0.111596) Dual{Float64}(0.235075,-0.0145282); … ; Dual{Float64}(-0.686313,0.392397) Dual{Float64}(0.569669,0.392032) … Dual{Float64}(-0.00134686,-0.094033) Dual{Float64}(-0.449298,-0.0753754); Dual{Float64}(-0.481321,0.0006333) Dual{Float64}(0.0394445,0.227912) … Dual{Float64}(-0.365894,-0.173119) Dual{Float64}(0.793217,-0.103401)]
ForwardDiff.Dual{Float64,Float64,1}[Dual{Float64}(-1.08787,-2.07905) Dual{Float64}(-0.733478,-2.43997); Dual{Float64}(0.0,0.0) Dual{Float64}(-0.733004,-0.174499)]

#11

Incredible. Thank you.


#12

QR for orthogonal matrix factorization != QR for eigenvalues: https://en.wikipedia.org/wiki/QR_algorithm


#13

Contains generic methods for eigenSelfAdjoint.


#14

Not sure if this is the same thing as you are suggesting, but if I wanted to know the eigenspace corresponding to an eigenvalue λ, I would call Q = nullspace(A - λ*I). I believe if A is Hermitian then one can first tridiagonalize and then use a single QR decomposition to determine the nullspace, this should be fine with auto-differentiation.

Here’s a quick implementation:

julia> λ = [1,1,2,3,4,5,6];  Q = qr(randn(length(λ))).Q; A = Q*diagm(0 => λ)*Q'; # Random matrix with redundant eigenvalues

julia> Q₁, T = hessenberg(A-I);

julia> Q₂ = qr(T).Q;

julia> q = Q₁*Q₂[:,end-1:end];

julia> norm(A*q - λ[1]*q)
8.740803670116966e-15

This algorithm will work fine with dual numbers. The apparent ill-posedness of the dimension of the nullspace to perturbation is resolved because it’s really an “epsilon” null-space, as we choose to treat small numbers in the QR decomposition as zero.


#15

Yes, but I think one would also need to obtain d\lambda.


#16

To be more clear:

B = randn(3,3)
pert = randn(3,3)
pert = pert+pert'
function E(y, δ)
    A = Symmetric(full(Diagonal([1.0, 1.0+δ, 2.0]))+y*pert)
    X = eig(A)[2]
    trace(B*X[:,1:2]*X[:,1:2]')
end

Differentiate that wrt y around 0, for representative values of delta = 0.0, 1e-10, 0.1. The method you suggest will give an inaccurate result for delta = 1e-10, because the perturbation to X will be of order 1e10, even though the function is smooth wrt y and the perturbation to E will be of order 1.


#17

Ah, in your example you need pivoting to get it to reveal the rank:

julia> δ = 0.000001; A = full(Diagonal([1.0, 1.0+δ, 2.0]))+y*pert

julia> Q₁, T = hessenberg(A-I);

julia> Q₂ = qr(T, Val(true)).Q;

julia> q = Q₁*Q₂[:,end-1:end];

julia> norm(A*q - q)
8.941524032091245e-5

Looking at T, the choice of pivots in rank-revealing QR should be robust to perturbation:

julia> T
3×3 Array{Float64,2}:
 1.08233e-5  3.53086e-5  1.69407e-21
 3.53086e-5  0.0651445   0.246914   
 0.0         0.246914    0.934715 

#18

This is not generally true. I think you’re thinking of the usual expression for first-order perturbation theory of the eigenvectors, which has a δ in the denominator. However, there is also a matrix element in the numerator, which typically also vanishes with δ if there is an eigenvalue crossing as a function of some parameter, in which case there is no divergence in the derivative.

(There is a √δ singularity in the derivative at points where two eigenvectors merge, i.e. where you have a defective matrix, a 2×2 Jordan block, sometimes called an “exceptional point”; see e.g. Ref. 39 of this paper. But this can’t happen in a Hermitian problem.)

In any case, I don’t find that textbook eigenvector perturbation-theory expression very useful in practice (for one thing, it requires a sum over all eigenvectors, which is impractical in large sparse/structured problems). Often, you are really trying to compute the derivative of one scalar function (or a few functions) of an eigenvector/eigenvalue, in which case there is a more efficient expression that can be derived by the adjoint method. In particular, if you can compute the derivatives of the matrix that you are computing eigenvalues/eigenvectors of (e.g. by ForwardDiff), then there is a relatively simple formula for derivatives of any function of the eigenvalues and/or eigenvectors, reviewed in section 4 of my course notes:

A complication is that, right at the point of an eigenvalue crossing, both eigenvalues and eigenvectors cease to be differentiable in the usual sense. One option is to use generalized gradients e.g. in eigenvalue optimization (essentially what is called “degenerate perturbation theory” in QM textbooks). Alternatively, in some cases you can formulate eigenvalue optimization problems as a sequence of SDPs.


#19

A useful collection of matrix AD results was published here, and the author has the pdf on his website here.


#20

Yes, and that’s the correct way to differentiate this kind of problem. ForwardDiff, however, does not know that and has to differentiate the full matrix of eigenvectors. I’m not claiming that the problem is intractable when you know what you want to do, I’m just saying that I don’t see how to do it in full generality using ForwardDiff.

Your argument about the numerator does not apply in the case of avoided crossings. If it’s more clear, use the same example as above but with the (essentially equivalent but simpler) matrix

[y   δ   0.0;
 δ   -y  0.0;
 0.0 0.0 1.0]

The first eigenvector switches from e2 when y << -delta to e1 when y >> delta, and its derivative is O(1/delta). Accordingly, if you attempt to differentiate naively (ie differentiate the matrix X of eigenvectors) as a function of y, you will rely on cancellation of O(1/delta) terms and lose log10(delta) digits in the result, even though the function itself is perfectly well-defined (indeed, it is constant in this simplified case).