Can’t you just tell the AD package what the adjoint to
eigvals is? Then you don’t have to differentiate through it.
Can’t you just tell the AD package what the adjoint to
I’m pretty new to auto diff; how exactly would this work?
You can see how it’s done for some common deep learning functions here
A guide is provided here
Perhaps ArnoldiMethods.jl Iterative Solvers.jl and KrylovKit.jl might work for you. (all implemented in Julia.
Autograd used to have it before the upgrade to 1.0 made descending into tuples for backward mode ad difficult. I’ve implemented some hacks for svd myself, I’d have to look them up (essentially you’d have to unwrap the SVD return type). Anyway the backward mode code for svd is still in the autograd package, is just not accessible. I believe there is an open issue.
I tried ArnoldiMethods but it seemed there was some sampling in there (not sure, I’m not versed in these numerical methods), making it impossible to do autograd… I’ll check the other things but I think making a custom adjoint might be easiest.
Would be great if you could look them up, then I’ll try and start implementing something for eig. Any idea how hard it would be?
You could always implement the derivative yourself. The adjoint formula for differentiating eigenvalues with respect to changes in the matrix is pretty simple; see 18335/adjoint.pdf at master · mitmath/18335 · GitHub (it’s equivalent to the Hellmann–Feynman theorem); differentiating scalar functions of the eigenvectors is a bit more complicated but not too bad. (My notes show the case of Hermitian matrices; the extension to non-Hermitian problems is similar but involves left and right eigenvectors — IIRC, every xᵀ is replaced with the left eigenvector.)
One snag is that eigenvalues cease to be differentiable when two eigenvalues cross. (And eigenvector methods of non-Hermitian problems can become ill-conditioned near crossings.) Crossings may seem unlikely, but eigenvalue-optimization problems often push eigenvalues towards crossings (especially in Hermitian problems). There is a way around it using generalized gradients, as I review in my notes here: 18335/eigenvalue-adjoint.pdf at master · mitmath/18335 · GitHub … but usually in my own work I’ve found that’s it’s possible to reformulate ostensibly eigenvalue-optimization problems in ways that avoid explicit eigenvalue computations. For example, in terms of SDPs or in terms of resolvent operators.
You really don’t want to apply automatic differentiation blindly to an iterative solver. Differentiating a recurrence relationship like this is much more expensive than differentiating the final result.
For the same reason, a Julia-native eigensolver wouldn’t help here. Applying AD to an eigenvalue algorithm like QR is a non-starter. (All eigenvalue algorithms for matrices bigger than 4×4 are iterative, thanks to the Abel–Ruffini theorem.)
Automatic differentiation is a great tool, but at some point you should also learn how to compute derivatives yourself, if only to understand how to use these tools effectively. Moreover, once you get to a complicated enough problem that involves external libraries, AD tools quickly hit a wall — I’ve never been able to use AD for a non-trivial problem in PDE-constrained optimization.
In case anyone is interested, I tried doing exactly this with all the iterative solvers @carstenbauer mentioned, on Julia 1.1, 1.2rc and 1.3-alpha with Zygote on its release version and master branch, along with GenericLinearAlgebra. The result is here: https://github.com/ericphanson/Attempt-diff-eigmin (I focussed on finding the minimal eigenvalue of a Hermitian matrix). None of it worked, and it sounds like it’s not a good idea anyway.
I also included an implementation of the analytic solution for the non-degenerate complex Hermitian case (which is just
v the associated eigenvector). Zygote didn’t seem to like the complex numbers, so I packed the matrices as real vectors (which is what I get out of the neural net anyway). That seems to work fine. (edit: I realized my real vector to complex Hermitian matrix code is wrong! I thought I had tested that, but I guess not. edit2: fixed)
I also tried using PyTorch via PyCall using some code from Mike Innes’ JuliaCon talk, but I couldn’t get that to work either.
I got a very simple iterative eigensolver to AD with Zygote sometime ago, so it can be done. But indeed it doesn’t scale; much better is to write a custom adjoint that solves the perturbative equation iteratively.
Thanks for the detailed response. I’m optimizing over a bandstructure so my eigenvalues can/will be degenerate, so I’ll have to use generalized gradients. I’ll start implementing soon!
Indeed, a big part of the design of ChainRules that I am happy about is that the rules are separated out from the main package and other packages supposed to add rules to their functions. While people tend emphasize the “automatic” differentiation part of differentiable programming, I think that where it is really helpful is when you put as many rules as you can, and only let it use a generic fallback derivative on the rest. If we add a bunch of rules in the intermediate steps throughout DiffEq, QuadGK, FFTW, NLsolve, Optim, etc., and then effectively chain them together, I think we will finally get something Steven could use for a non-trivial problem . But that won’t happen without buy-in from the ecosystem.
Emphatically agree. The biggest problem with manual gradients is how they lead to complex code. However, the bottlenecks are usually pretty clear, and rely on primitives (solvers), for which it is often easy to code adjoints manually. If the AD system is able to cope with the rest of the code, and with a few well placed adjoint declarations for the solvers, it becomes possible to get performant AD for non trivial scientific computing applications. My understanding is that this still needs major work on the part of Zygote and friends to be practical, though…
Check out the methods in our paper and the references therein: https://arxiv.org/abs/1405.4350
This document might be of interest to anyone reading this thread
“An extended collection of matrix derivative results for forward and reverse mode algorithmic differentiation”, Mike Giles
Here’s a reverse mode
"return `eigen` as well as a function `back(Δ)`" function gradeigen(A) e,V = eigen(A) n = size(A,1) (e,V), function (Δ) Δe, ΔV = Δ F = [i==j ? 0 : inv(e[j] - e[i]) for i=1:n, j=1:n] inv(V)'*(Diagonal(Δe) + F .* (V'ΔV))*V' end end # Test A = randn(3,3) Δ = (complex.(randn(3)),0complex.(ones(3,3))) Δe = Δ using FiniteDifferences, LinearAlgebra fd = central_fdm(15,1) eV, back = gradeigen(A) back(Δ) - j′vp(fd, eigvals, Δe, complex.(A)) # either me or j′vp does not seem to get imaginary parts correct
I suspect that you should use
transpose - not
adjoint) in the expression for the derivative. (I didn’t check your code explicitly, but this is my general experience with derivatives of linear algebra expressions.)
Thanks, the derivative for the eigvals seems correct, at least according to Zygote’s
gradtest. If I change to
gradtest fails. Something does seem to be up with the sensitivity to
is this code based the document you posted by Mike Giles?
Yes. See this incomplete PR as well