Automatic differentiation of f(x) constructed by decomposition of a matrix M(x)

I have some objective function f(x), e.g. some type of error function, for which I want to calculate the gradient and Hessian. To construct f(x), I need the eigenvectors of a certain matrix \textbf{M}(x).

I have tried both forward-mode AD (ForwardDiff.jl) and reverse-mode AD (Flux.jl) with different decomposition methods (eigen, eigs, svd, svdl), and all of them fail for different reasons.

Here is a minimum working example:

using ForwardDiff, LinearAlgebra, Arpack, IterativeSolvers, Flux

function f(x::Vector)
    M = kron(x, (x.^2)') # Build matrix with x as input
    vec_1 = eigs(M)[2][:,1] # Choose one eigenvector; replace eigs with any other matrix decomposition method
    return norm(x - vec_1) # Some random function of an eigenvector
end

df(x) = gradient(f,x)[1] #using Flux.jl
g_f = x -> ForwardDiff.gradient(f, x) #using ForwardDiff.jl

df(rand(5)) #Fails
g_f(rand(5)) #Fails

Is it because all the factorization methods I’ve chosen use some non-native Julia code? Would writing my own method in Julia, e.g. a Lanczos decomposition, solve it?
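As a sketch of that idea: a pure-Julia power iteration uses only generic array operations, so ForwardDiff can propagate its dual numbers through it (the helper name `dominant_eigvec` and the iteration count are my own choices here, not a library API):

```julia
using ForwardDiff, LinearAlgebra

# Pure-Julia power iteration: only generic operations, so it works with
# ForwardDiff.Dual numbers (unlike the LAPACK/ARPACK-backed eigs).
function dominant_eigvec(M; iters = 500)
    v = ones(eltype(M), size(M, 1))
    for _ in 1:iters
        v = M * v
        v /= norm(v)   # renormalize each step
    end
    return v
end

function f_pi(x::Vector)
    M = kron(x, (x.^2)')        # same rank-one matrix as in the MWE
    vec_1 = dominant_eigvec(M)  # dominant eigenvector
    return norm(x - vec_1)
end

g = ForwardDiff.gradient(f_pi, rand(5))  # no LAPACK/ARPACK calls, so no error
```

Convergence and the eigenvector’s sign need care in general; this is only a sketch that happens to work for the rank-one M above, where one iteration already lands on the dominant eigenvector.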

See my answer in a previous thread on the same topic: Native eigenvals for differentiable programming

If you can write down exactly what f(x) you are computing, maybe there is a way to express it without using eigenvectors.


ChainRules.jl has custom adjoints for Cholesky and SVD.

So once the Zygote ChainRules PR is merged, that will work.

Nabla has them already. (Most rules in ChainRules come from Nabla.)

You could use Nabla.

A recent paper gave an improved algorithm that no one is using yet:
W. Wang et al., 2019: Backpropagation-Friendly Eigendecomposition


Thanks for the link, @stevengj.

Long story short: my objective function f(x::Vector) is an error function of the form ||\textbf{J}_T - \textbf{J}_{exp}(x)||, where \textbf{J}_T is a target matrix and \textbf{J}_{exp}(x) is a simulated matrix. The latter is calculated from the phonon modes (\vec b) and frequencies (\lambda) of certain 2D systems, i.e. J^{(m,n)}_{exp}(x) = h(\vec b(x), \lambda(x)). To obtain \vec b(x) and \lambda(x), I need to diagonalize a Hessian matrix \textbf{A}(x) describing the potentials of my system, so unfortunately it is not a simpler eigenvalue problem. I am not familiar with adjoint methods, so I would need to see whether they are applicable to my problem.

(I skipped the physical (fun!) details and whole expression of f(x) of my problem to keep it brief)

Is there a way to calculate it without modes? A lot of formulas involving modal expansions have other equivalent descriptions that don’t involve explicit eigenvalues… (I’ve worked on a lot of optimization problems arising from wave problems that are initially expressed in terms of modes but can be reformulated in some other way, so it would be helpful to know more precisely what you are doing.)

I haven’t studied yet whether the matrix \textbf{J}_{exp} can be derived from other observables of the system (although it is interesting to think about), but let me try to explain the theoretical model and how we want to realize it experimentally. I will put some references below.

As I said, \textbf{J} describes spin-spin interactions between particles (in my case ultracold ions), and we aim to simulate 2D spin systems described by a particular \textbf{J} [4]. The way these interactions are generated in our experiments (or models) is by inducing relative phase shifts of the wavefunctions describing each particle, so roughly speaking each element J^{i,j} describes the phase difference of the wave packets between particles i and j.

Experimentally, this is obtained by inducing (virtual) oscillations of each ion by exciting the phonon modes [1] of a 2D crystalline structure formed by the ions under an external confinement. Because the natural modes of the crystal do not lead to the desired spin-spin interactions, we want to modify these modes to induce the oscillations which will provide us with the desired phase shifts and interactions. We use optical tweezers to modify the modes by changing the confinement of each ion.

By minimizing ||\textbf{J}_T - \textbf{J}_{exp}(\vec{x}) ||, I want to obtain the vector \vec{x} which describes the confinement of the tweezers for each ion on the crystal.

I found yesterday that through perturbation theory I can avoid doing a matrix decomposition when constructing the objective function; however, the perturbative treatment may not be valid in our system. Nabla.jl is allowing me to obtain the gradient if I use an SVD (as suggested by @oxinabox, thanks!) but is giving me singularity errors once in a while. I will try to debug these singularity errors and/or verify whether the PT treatment is valid. Another (short-term) alternative would be to check PyTorch/TensorFlow AD capabilities, but having to rewrite my code is holding me back :unamused:
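(For concreteness, the perturbative treatment I mean is the standard first-order non-degenerate expansion, writing \textbf{A}(x) = \textbf{A}_0 + \Delta\textbf{A}(x) with unperturbed modes \vec b_i^{\,0} and eigenvalues \lambda_i^0:

\lambda_i(x) \approx \lambda_i^0 + \vec b_i^{\,0\top} \Delta\textbf{A}(x)\, \vec b_i^{\,0}, \qquad \vec b_i(x) \approx \vec b_i^{\,0} + \sum_{j \neq i} \frac{\vec b_j^{\,0\top} \Delta\textbf{A}(x)\, \vec b_i^{\,0}}{\lambda_i^0 - \lambda_j^0}\, \vec b_j^{\,0}.

This needs only one diagonalization of \textbf{A}_0 up front, outside the differentiated code, but it is valid only while \Delta\textbf{A}(x) stays small compared to the spacings \lambda_i^0 - \lambda_j^0, which is exactly what I need to verify.)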

[1] K. Kim, M.-S. Chang, R. Islam, S. Korenblit, L.-M. Duan, and C. Monroe, Phys. Rev. Lett. 103 , 120502 (2009).
[2] A. Bermudez, J. Almeida, F. Schmidt-Kaler, A. Retzker, and M. B. Plenio, Phys. Rev. Lett. 107 , 207209 (2011).
[3] D. F. V. James, Applied Physics B: Lasers and Optics 66 , 181 (1998).
[4] arXiv:1912.07845v1

If the singularity errors are related to that, then PyTorch and Autograd fail in the same way. I don't know about TensorFlow.

J sounds a lot like a Green’s function or a resolvent operator, which is much nicer to compute and differentiate directly than anything involving eigenvectors…

(In physics education, we often write down Green’s functions etcetera in the basis of the eigenfunctions because these are easier to think about, but when it comes to computing them you often want to abandon this approach.)
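A minimal sketch of that approach, with a made-up stand-in for \textbf{A}(x): apply the resolvent (\textbf{A} - s\textbf{I})^{-1} to a vector by a direct linear solve. The backslash solve goes through a generic LU for dual-number matrices, so ForwardDiff differentiates it with no eigendecomposition anywhere:

```julia
using ForwardDiff, LinearAlgebra

# Hypothetical stand-in for the system Hessian A(x); the real A(x)
# would come from the ion-crystal potentials.
build_A(x) = x * x' + Diagonal(x.^2)

# Resolvent action (A - s*I)^{-1} b computed as a plain linear solve.
# For Dual-valued matrices, \ falls back to a generic LU factorization,
# so ForwardDiff can differentiate straight through it.
resolvent_action(x, b, s) = (build_A(x) - s * I) \ b

# Example objective built from the response rather than from modes.
loss(x) = sum(abs2, resolvent_action(x, ones(length(x)), -1.0))

g = ForwardDiff.gradient(loss, rand(4))
```

Here s = -1 keeps the shifted matrix safely invertible for this stand-in; the same pattern works for any shift away from the spectrum.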

Experimentally, this is obtained by inducing (virtual) oscillations of each ion by exciting the phonon modes of a 2D crystalline structure formed by the ions under an external confinement.

The experiments aren’t computing modes; they are putting in an excitation and measuring the response, i.e. a Green’s function of some kind. Maybe the computation can do the same thing?