Native eigenvals for differentiable programming

GJBoth · August 3, 2019, 9:06am

Hi all,

I’m working on a problem where I require an autodifferentiable eigenvalue solver for real, square matrices (but non-hermitian). Since Linear algebra’s version is based on LAPACK, I can’t use autodiff and I haven’t been able to make any other versions work (e.g. using GenericLinearAlgebra.jl). Do you know of a solver I can use? Pytorch seems to have an implementation (torch — PyTorch 1.12 documentation) so it should be possible I think…

Thanks!

baggepinnen · August 3, 2019, 9:10am

Can’t you just tell the AD package what the adjoint to eigvals is? Then you don’t have to differentiate through it.

GJBoth · August 3, 2019, 9:17am

I’m pretty new to auto diff; how exactly would this work?

baggepinnen · August 3, 2019, 9:33am

You can see how it’s done for some common deep learning functions here
https://github.com/FluxML/Zygote.jl/blob/a50f4cf88b1a9e74a94e5295cb468b94ae3f1d9e/src/lib/nnlib.jl#L4
A guide is provided here
https://fluxml.ai/Zygote.jl/dev/adjoints/

carstenbauer · August 3, 2019, 10:01am

Perhaps ArnoldiMethods.jl Iterative Solvers.jl and KrylovKit.jl might work for you. (all implemented in Julia.

crinders · August 3, 2019, 11:51am

Autograd used to have it before the upgrade to 1.0 made descending into tuples for backward mode ad difficult. I’ve implemented some hacks for svd myself, I’d have to look them up (essentially you’d have to unwrap the SVD return type). Anyway the backward mode code for svd is still in the autograd package, is just not accessible. I believe there is an open issue.

GJBoth · August 3, 2019, 12:05pm

I tried ArnoldiMethods but it seemed there was some sampling in there (not sure, I’m not versed in these numerical methods), making it impossible to do autograd… I’ll check the other things but I think making a custom adjoint might be easiest.

GJBoth · August 3, 2019, 12:06pm

Would be great if you could look them up, then I’ll try and start implementing something for eig. Any idea how hard it would be?

stevengj · August 3, 2019, 12:24pm

You could always implement the derivative yourself. The adjoint formula for differentiating eigenvalues with respect to changes in the matrix is pretty simple; see https://github.com/mitmath/18335/blob/spring21/notes/adjoint/adjoint.pdf (it’s equivalent to the Hellmann–Feynman theorem); differentiating scalar functions of the eigenvectors is a bit more complicated but not too bad. (My notes show the case of Hermitian matrices; the extension to non-Hermitian problems is similar but involves left and right eigenvectors — IIRC, every xᵀ is replaced with the left eigenvector.)

One snag is that eigenvalues cease to be differentiable when two eigenvalues cross. (And eigenvector methods of non-Hermitian problems can become ill-conditioned near crossings.) Crossings may seem unlikely, but eigenvalue-optimization problems often push eigenvalues towards crossings (especially in Hermitian problems). There is a way around it using generalized gradients, as I review in my notes here: https://github.com/mitmath/18335/blob/spring21/notes/adjoint/eigenvalue-adjoint.pdf… but usually in my own work I’ve found that’s it’s possible to reformulate ostensibly eigenvalue-optimization problems in ways that avoid explicit eigenvalue computations. For example, in terms of SDPs or in terms of resolvent operators.

You really don’t want to apply automatic differentiation blindly to an iterative solver. Differentiating a recurrence relationship like this is much more expensive than differentiating the final result.

For the same reason, a Julia-native eigensolver wouldn’t help here. Applying AD to an eigenvalue algorithm like QR is a non-starter. (All eigenvalue algorithms for matrices bigger than 4×4 are iterative, thanks to the Abel–Ruffini theorem.)

Automatic differentiation is a great tool, but at some point you should also learn how to compute derivatives yourself, if only to understand how to use these tools effectively. Moreover, once you get to a complicated enough problem that involves external libraries, AD tools quickly hit a wall — I’ve never been able to use AD for a non-trivial problem in PDE-constrained optimization.

ericphanson · August 3, 2019, 3:16pm

In case anyone is interested, I tried doing exactly this with all the iterative solvers @carstenbauer mentioned, on Julia 1.1, 1.2rc and 1.3-alpha with Zygote on its release version and master branch, along with GenericLinearAlgebra. The result is here: GitHub - ericphanson/Attempt-diff-eigmin: Attempts to differentiate `eigmin` in Julia (I focussed on finding the minimal eigenvalue of a Hermitian matrix). None of it worked, and it sounds like it’s not a good idea anyway.

I also included an implementation of the analytic solution for the non-degenerate complex Hermitian case (which is just v*v' for v the associated eigenvector). Zygote didn’t seem to like the complex numbers, so I packed the matrices as real vectors (which is what I get out of the neural net anyway). That seems to work fine. (edit: I realized my real vector to complex Hermitian matrix code is wrong! I thought I had tested that, but I guess not. edit2: fixed)

I also tried using PyTorch via PyCall using some code from Mike Innes’ JuliaCon talk, but I couldn’t get that to work either.

antoine-levitt · August 4, 2019, 1:35pm

I got a very simple iterative eigensolver to AD with Zygote sometime ago, so it can be done. But indeed it doesn’t scale; much better is to write a custom adjoint that solves the perturbative equation iteratively.

GJBoth · August 5, 2019, 10:28am

Thanks for the detailed response. I’m optimizing over a bandstructure so my eigenvalues can/will be degenerate, so I’ll have to use generalized gradients. I’ll start implementing soon!

ChrisRackauckas · August 5, 2019, 1:29pm

Indeed, a big part of the design of ChainRules that I am happy about is that the rules are separated out from the main package and other packages supposed to add rules to their functions. While people tend emphasize the “automatic” differentiation part of differentiable programming, I think that where it is really helpful is when you put as many rules as you can, and only let it use a generic fallback derivative on the rest. If we add a bunch of rules in the intermediate steps throughout DiffEq, QuadGK, FFTW, NLsolve, Optim, etc., and then effectively chain them together, I think we will finally get something Steven could use for a non-trivial problem . But that won’t happen without buy-in from the ecosystem.

antoine-levitt · August 5, 2019, 1:45pm

Emphatically agree. The biggest problem with manual gradients is how they lead to complex code. However, the bottlenecks are usually pretty clear, and rely on primitives (solvers), for which it is often easy to code adjoints manually. If the AD system is able to cope with the rest of the code, and with a few well placed adjoint declarations for the solvers, it becomes possible to get performant AD for non trivial scientific computing applications. My understanding is that this still needs major work on the part of Zygote and friends to be practical, though…

stevengj · August 5, 2019, 1:46pm

Check out the methods in our paper and the references therein: [1405.4350] Robust topology optimization of three-dimensional photonic-crystal band-gap structures

baggepinnen · September 12, 2019, 4:44am

This document might be of interest to anyone reading this thread
“An extended collection of matrix derivative results for forward and reverse mode algorithmic differentiation”, Mike Giles

baggepinnen · September 12, 2019, 5:19am

Here’s a reverse mode eigen

"return `eigen` as well as a function `back(Δ)`"
function gradeigen(A)
    e,V = eigen(A)
    n = size(A,1)
    (e,V), function (Δ)
        Δe, ΔV = Δ
        F = [i==j ? 0 : inv(e[j] - e[i]) for i=1:n, j=1:n]
        inv(V)'*(Diagonal(Δe) + F .* (V'ΔV))*V'
    end
end

# Test
A = randn(3,3)
Δ = (complex.(randn(3)),0complex.(ones(3,3)))
Δe = Δ[1]

using FiniteDifferences, LinearAlgebra
fd = central_fdm(15,1)
eV, back = gradeigen(A)
back(Δ) - j′vp(fd, eigvals, Δe, complex.(A))
# either me or j′vp does not seem to get imaginary parts correct

Per · September 12, 2019, 6:19am

I suspect that you should use transpose - not ' (i.e. adjoint) in the expression for the derivative. (I didn’t check your code explicitly, but this is my general experience with derivatives of linear algebra expressions.)

baggepinnen · September 12, 2019, 6:23am

Thanks, the derivative for the eigvals seems correct, at least according to Zygote’s gradtest. If I change to transpose, gradtest fails. Something does seem to be up with the sensitivity to eigen.vectors though.

gszep · May 13, 2020, 9:31pm

is this code based the document you posted by Mike Giles?

Topic		Replies	Views
Eigenvalue sensitivites using autodiff Numerics eigenvalues , autodiff	5	799	May 2, 2022
ANN: Differentiable implicit functions in Julia (optimisation, nonlinear solves and fixed point iterations) Package Announcements optimization , nonlinear , autodiff	3	967	February 8, 2022
Automatic complex valued differentiation for adjoint method Modelling & Simulations autodiff	0	195	November 18, 2023
Automatic differentiation of f(x) constructed by decomposition of a matrix M(x) Optimization (Mathematical) differentiation , flux , machine-learning , numerics	7	2847	January 8, 2020
[ANN] Announcing ImplicitDifferentiation.jl Package Announcements package , announcement , differentiation , optimization , autodiff	8	1155	July 3, 2022

Native eigenvals for differentiable programming

Related topics