Custom function for eigenvectors designed for my sparse matrix can't beat eigen

I have a matrix of the form

Jy = [ 0       -im*N1    0        0       ...
       im*N1    0       -im*N2    0       ...
       0        im*N2    0       -im*N3   ...
       ...      ...      ...      ...         ]

It is tridiagonal, with an identically zero diagonal. It is also Hermitian, as I tried to draw above. Its size is 2j+1 (this is the angular momentum operator Jy in the Jz basis in quantum mechanics).

Its eigenvalues are also known to me: j, j-1, j-2, ..., 0, -1, ..., -j. Using the fact that it's tridiagonal, I was able to write a routine to find its eigenvectors, which is as follows:

function renormalize!(o, e, n, d)
    # rescale every component computed so far by 1/d, keeping the running vector unit-normalized
    for k ∈ 1:2:n
        o[k] /= d; e[k+1] /= d
    end
end

function eigvecJy(v, λ)
    # v holds the off-diagonal magnitudes N_k; the eigenvector is o + i*e,
    # i.e. its odd components are real (stored in o) and its even components purely imaginary (stored in e)
    len = length(v)
    o = zeros(len+1); e = zeros(len+1)  # odd and even parts
    η = √(1 + λ^2/v[1]^2)
    o[1] = 1/η; e[2] = λ/(v[1]*η)       # first two components, already normalized
    for n ∈ 3:2:len-1
        # three-term recurrence from the tridiagonal eigenvalue equation,
        # renormalizing after each new component so the entries cannot overflow
        o[n] = (-e[n-1]*λ + o[n-2]*v[n-2])/v[n-1]
        renormalize!(o, e, n, √(1 + o[n]^2))
        e[n+1] = (o[n]*λ + e[n-1]*v[n-1])/v[n]
        renormalize!(o, e, n, √(1 + e[n+1]^2))
    end
    o[end] = (-e[len]*λ + o[len-1]*v[len-1])/v[len]
    d = √(1 + o[end]^2)
    renormalize!(o, e, len-1, d); o[end] /= d
    return o, e  # eigenvector = o + i*e
end

which is slower than eigen in extracting all the eigenvectors.

In fact, in my custom method I only calculate j+1 eigenvectors and obtain the others by a symmetry. Still, eigen is faster (and it also gives the eigenvalues).
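
For concreteness, a comparison along these lines might look like the following sketch (the couplings N_k = √(j(j+1) - m(m+1))/2 used here are the standard J_y matrix elements in the J_z basis, and BenchmarkTools is assumed; adjust to your own convention):

using LinearAlgebra, BenchmarkTools

j = 1000
v = [√(j*(j+1) - m*(m+1))/2 for m in j-1:-1:-j]   # off-diagonal magnitudes N_k (length 2j)

@btime [eigvecJy($v, λ) for λ in $(j:-1:0)];       # custom routine: the j+1 eigenvectors with λ ≥ 0

Jy = Matrix(Tridiagonal(im .* v, zeros(ComplexF64, 2j+1), -im .* v))
@btime eigen(Hermitian($Jy));                      # LAPACK: all 2j+1 eigenvalues and eigenvectors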

I am very surprised to see this. I don't understand why… maybe my function isn't written efficiently. I guess eigen comes from the LAPACK library, but still, I have put in so much physical input. Is this what I must accept?

It is important to point out why I did this: I need the rotation matrix exp(-im*pi*Jy/2).

At first I tried using exp directly. Jy is Hermitian, so as per the Julia docs its exponential is calculated using eigen. Or so I thought. But exp clearly does not use eigen, because it's way slower than my function, and furthermore it gives NaN all over the place for j>1000. (I am using Julia 1.4.1.) Even expm from scipy was faster than that, so I stuck with it until I found it lacking.

That led me to write my own function. But when I curiously compared it with eigen directly, I found eigen faster than my custom implementation. If I had known eigen was so damn fast I wouldn't even have bothered spending time on this. But now that I have, I can't accept it!

Have you tried using Julia's Tridiagonal type? That should let Julia compute exp efficiently. You might also want to try writing this as cispi(-Jy/2), which might be faster.


Yes, I did define the matrix using Tridiagonal. However, exp doesn't accept the Tridiagonal type when the matrix is returned by a function (I asked this question on SO), so I had to convert it to a dense Matrix. Do you mean cispi from the Compat package?

Ah, cispi doesn't exist in 1.5 yet. Then it would be cis(-pi*Jy/2). (cis stands for cos + i*sin, i.e. e^{ix}, but is usually faster.)


cis apparently only has methods for real and complex numbers. It gives a MethodError for the matrix.

One thing you could do to make this faster would be

function renormalize!(o, e, n, d)
   dinv=1/d
   @inbounds for k ∈ 1:2:n
       o[k]*=d; e[k+1]*=d
   end 
end

Division is about 6x slower than multiplication.


Massive improvement. Thank you so much! I had no idea division was so different from multiplication! What is @inbounds?

The custom function does better for j=1000 now. And that's while it's only computing j+1 eigenvectors out of the 2j+1.

The scaling behaviour of the two methods is slightly different, though. With your suggestions my function ran slightly faster than eigen for j=1000, but for j=3000 eigen still won by a few seconds.

@inbounds promises that you aren't indexing out of bounds, which removes the bounds checks (and so speeds up indexing).

A good thing to try here would be to use a symmetric diagonal rescaling to convert this to a similar real-symmetric tridiagonal matrix. Then you can store it as a SymTridiagonal matrix and call eigen, and it will be fast.

(This is why LAPACK has no specialized eigensolver routines for Hermitian tridiagonal matrices — it only needs to handle real-symmetric tridiagonal matrices, without loss of generality.)

The fact that you happen to know the eigenvalues analytically means that in principle you should still be able to do better, but in a battle of constant factors it can take a lot of tweaking to beat LAPACK with an optimized BLAS. Even so, you might want to transform to a real-symmetric tridiagonal matrix in order to take advantage of the fact that your eigenvectors are purely real up to a diagonal rescaling.
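
Concretely, such a rescaling might look like the following sketch (the function name and the specific choice of diagonal are mine):

using LinearAlgebra

# With the diagonal unitary D = Diagonal(i^(k-1)), k = 1, ..., n, one has Jy = D * T * D',
# where T is the real SymTridiagonal with off-diagonal N (check: i^(k-1) * N_k * conj(i^k) = -im*N_k).
# So eigen runs on the real SymTridiagonal, and the eigenvectors of Jy are D times the real eigenvectors.
function eigJy_via_symtridiagonal(N::AbstractVector{<:Real})
    n = length(N) + 1                              # matrix size 2j+1
    D = Diagonal(ComplexF64[im^k for k in 0:n-1])  # unitary rescaling, entries i^(k-1)
    T = SymTridiagonal(zeros(n), Float64.(N))      # real symmetric tridiagonal
    vals, vecs = eigen(T)                          # fast LAPACK path for SymTridiagonal
    return vals, D * vecs                          # eigenvalues and eigenvectors of Jy
end

# quick check on a small example:
# N = [0.6, 1.0, 0.6]
# vals, vecs = eigJy_via_symtridiagonal(N)
# Jy = Tridiagonal(im .* N, zeros(ComplexF64, 4), -im .* N)
# maximum(abs.(Matrix(Jy) * vecs - vecs * Diagonal(vals)))  # ≈ 0 up to roundoff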


A completely different approach, if you are not interested in the rotation matrix itself, but rather its action on a vector, is operator splitting. It is second-order accurate, and only valid for small angles, so for a finite angle like \pi, you’d have to repeat the small-angle rotation many times. This would be the trade-off between this method and computing the full exponential up-front.

The idea is the following: with a Hermitian tridiagonal matrix with zero diagonal, such as yours

\mathsf{J} = \begin{bmatrix} & a_1 \\ a_1^*&&a_2\\ &a_2^*&&\ddots\\ &&\ddots \end{bmatrix},

where ^* denotes conjugate, split \mathsf{J} into even and odd 2\times2 blocks with the structure

\mathsf{J}_i = \begin{bmatrix} & a_i \\ a_i^* \end{bmatrix}.

We thus make the splitting

\exp(-\mathrm{i} \phi\mathsf{J}) = \exp(-\mathrm{i} \phi\mathsf{J}_{\textrm{even}}/2) \exp(-\mathrm{i} \phi\mathsf{J}_{\textrm{odd}}) \exp(-\mathrm{i} \phi\mathsf{J}_{\textrm{even}}/2) + \mathcal{O}(\phi^3[\mathsf{J}_{\textrm{even}},\mathsf{J}_{\textrm{odd}}]),

where

\mathsf{J}_{\textrm{odd}} = \begin{bmatrix} & a_1 \\ a_1^*&&\\ &&&a_3\\ &&a_3^*&&\ddots\\ &&&\ddots \end{bmatrix},

and similarly for \mathsf{J}_{\textrm{even}}.

The exponential of \mathsf{J}_i is given exactly by a Givens :stuck_out_tongue: rotation:

\exp(-\mathrm{i}\mathsf{J}_i) = \begin{bmatrix} \cos|a_i| & -\mathrm{i}\frac{a_i}{|a_i|}\sin|a_i| \\ -\mathrm{i}\frac{a_i^*}{|a_i|}\sin|a_i| & \cos|a_i| \end{bmatrix} \approx \frac{1}{1+|a_i/2|^2} \begin{bmatrix} 1-|a_i/2|^2 & -\mathrm{i}a_i\\ -\mathrm{i}a_i^* & 1-|a_i/2|^2 \end{bmatrix},

where the second step is the Crank–Nicolson approximation which is second-order accurate, consistent with the splitting, but much faster than computing the trigonometric functions.

This is a very common trick in time propagation in quantum mechanics.
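
To make this concrete, a minimal Julia sketch of the split rotation applied to a state vector could read as follows (my own code and names; the 2×2 blocks are exponentiated exactly with cos/sin here rather than via Crank–Nicolson, and a[k] holds the upper off-diagonal entries, e.g. a = -im .* N for the Jy above):

# apply exp(-iϕ * block_k) in place to the pair (ψ[k], ψ[k+1]) for every k in ks
function apply_blocks!(ψ, a, ϕ, ks)
    for k in ks
        r = abs(a[k])
        iszero(r) && continue
        c, s = cos(ϕ*r), sin(ϕ*r)
        u = a[k]/r                      # unit phase of a[k]
        x, y = ψ[k], ψ[k+1]
        ψ[k]   = c*x - im*u*s*y
        ψ[k+1] = -im*conj(u)*s*x + c*y
    end
    return ψ
end

# ψ ← exp(-iϕJ)ψ via nsteps Strang-split small-angle sub-steps
function split_step_rotation!(ψ::AbstractVector{<:Complex}, a, ϕ; nsteps = 100)
    n = length(ψ)
    δ = ϕ/nsteps
    odd, even = 1:2:n-1, 2:2:n-1        # blocks (1,2), (3,4), ... and (2,3), (4,5), ...
    for _ in 1:nsteps
        apply_blocks!(ψ, a, δ/2, even)
        apply_blocks!(ψ, a, δ,   odd)
        apply_blocks!(ψ, a, δ/2, even)
    end
    return ψ
end

For example, split_step_rotation!(copy(ψ0), -im .* N, π/2) would apply the π/2 rotation to a complex state vector ψ0.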


@jagot Amazing result. I am sure I will find it useful later… but I do need the full matrix, and for \theta=\pi/2.

Well, I am indeed evolving a state, and this is part of my discrete time translation. The full evolution is given by U = T\exp(-i \pi J_y/2), where T = \exp(-i J_z^2/2\kappa). To take powers of this U I also need to diagonalize it later. (But that is not the bottleneck of the program currently.)

I will see if I can convert my problem to use this result somehow, though!

@Oscar_Smith Thanks. And just for completeness, I should mention a small error in your renormalize!: it should be o[k] *= dinv; e[k+1] *= dinv.
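
Putting the two together, the corrected helper reads:

function renormalize!(o, e, n, d)
    dinv = 1/d                           # compute the reciprocal once...
    @inbounds for k ∈ 1:2:n
        o[k] *= dinv; e[k+1] *= dinv     # ...and multiply instead of dividing
    end
end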

@stevengj

Interesting that this holds for all such tridiagonal matrices. Here the matrices are actually known to me: the symmetric matrix is J_x, and the scaling matrix is \exp(-iJ_z\pi/2).

I therefore first calculated the J_x eigenvectors and then applied the transformation to get those of J_y, but there was an overall speed loss compared to my code given above.

However, there’s a silver lining.

The J_x eigenvectors themselves were calculated very quickly (~8 times faster for j=1000). I discovered that the eigenvector matrix S_x of J_x has some mysterious relation to R = \exp(-i J_y \pi/2). I verified it by checking many j values. By manipulating its columns I should be able to get R a lot faster. There also seems to be another symmetry which can cut the work in half.

I will have to see what’s going on and will give a bigger reply!

If I remember correctly, the exponentials of the angular momentum operators are equivalent (or at least very closely related) to the Wigner D matrices, for which I'm sure there are optimized algorithms available. I think @sethaxen implemented something along these lines.


Indeed! The little-d matrix is precisely what I am looking for! Any suggestions along those lines would be great. It might be simpler to implement, too.

Hi! I don't have a completed public implementation of Wigner D-matrices, but I can give some input. If you actually need the matrices themselves, then at least for low orders you likely can't do better than recurrence relations to fill the matrices. There are relations for both the little-d and big-D matrices; it sounds like you want little-d. I believe https://github.com/jishnub/WignerD.jl and https://github.com/NickMcNutt/Groups.jl both implement these. GSL can also compute these, and there are likely more packages too.

I haven't worked with high orders. IIRC, though, computation by matrix exponentiation, directly or via eigendecomposition, scales as \mathcal{O}(L^4), where L is the maximum order desired. Recurrence relations, however, should scale as \mathcal{O}(L^3). As @jagot pointed out, if you actually need the action of the little-d matrix on a vector, then you can use various techniques, including polynomial approximation (https://github.com/jagot/MatrixPolynomials.jl), Krylov subspace methods (https://github.com/SciML/ExponentialUtilities.jl), etc., and I believe these should also have a complexity of \mathcal{O}(L^3) but without allocating the intermediate matrix. The constant factor matters, though.

I actually am planning to work on the action of the exponential for the more general big-D matrix, including working out heuristics for switching between strategies. Let me know if that’s something you’re interested in, and perhaps I could separate that code out into its own project.


@sethaxen Thanks for your reply. I tried both packages with recurrence relations. One of them (Groups) is not working at all. The other works, but with some weird errors. It turns out my method does better than it, and I think I have now improved mine to my satisfaction using the trick I mentioned in my last reply.

Here's what I am using now. Thanks to @stevengj for the suggestion to look at the J_x eigenvectors.
Since all my matrices are in the J_z basis, the components of the eigenvector matrix for J_x are given by

S_{m\lambda} = \; _z\langle j, m| j, \lambda\rangle_x = \; _z\langle j, m| \exp(-i \pi J_y/2) | j, \lambda \rangle_z \cdot (\text{phase})

so computing just the J_x eigenvectors is enough to get my special Wigner d matrix.
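
For small j this relation is easy to check numerically. A sketch (my own code; the basis is ordered m = -j, ..., j so that eigen's ascending eigenvalue order lines up with the columns of the rotation matrix, and only column-wise magnitudes are compared because each column carries an arbitrary phase):

using LinearAlgebra

function angular_momentum_xy(j)
    ms = -j:j                                             # Jz basis ordered m = -j, ..., j
    cp = [√(j*(j+1) - m*(m+1)) for m in ms[1:end-1]]      # <m+1|J+|m>
    Jp = diagm(-1 => cp)                                  # raising operator J+ in this ordering
    return (Jp + Jp')/2, (Jp - Jp')/(2im)                 # Jx, Jy
end

j = 5
Jx, Jy = angular_momentum_xy(j)
Sx = eigen(Hermitian(Jx)).vectors            # Jx eigenvectors, eigenvalues sorted -j, ..., j
R  = exp(-im*(π/2)*Jy)                       # the rotation matrix, i.e. little-d(π/2)
maximum(abs.(abs.(Sx) - abs.(R)))            # ≈ 0 up to roundoff: columns agree up to phases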

That is what I am doing now. Thanks to all of you, and special thanks to @Oscar_Smith for his multiplication trick. I think I would like to mark that as the answer, because the other answers are mainly suggestions.


Three ways to make your code even faster:

  1. you can eliminate almost all divisions by working with arrays that are numerically equal to 1/v[n] and v[n-1]/v[n].
  2. you don't need to renormalize so often. Using an if statement, you can accumulate the square of the 2-norm, and only when it gets too close to overflow must you renormalize (see the sketch after the links below).
  3. you can vectorize the code across (neighbouring) eigenvectors. This requires a bit more work to coax Julia's compiler into doing it for you.

To give you some ideas, the following C code (probably you’re not looking for a C solution, but nevertheless…) does all of these things when performing a transposed eigenmatrix-vector product (of symmetric tridiagonal eigenvectors), V^\top c. The default version is arguably more legible https://github.com/MikaelSlevinsky/FastTransforms/blob/5ee9c149067fa1d1484ea4cf291f9e8d8f1e0bb6/src/recurrence/recurrence_default.c#L113-#L144
but the macro gets used to generate vectorized versions
https://github.com/MikaelSlevinsky/FastTransforms/blob/master/src/recurrence/recurrence.h#L103-#L176
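
In Julia, the lazy renormalization of point 2 might look something like this sketch (my own, deliberately simplified, written for a hypothetical three-term recurrence rather than for the eigvecJy above; eps()/floatmin() ≈ 1e292 for Float64 is one convenient threshold comfortably below floatmax ≈ 1.8e308):

using LinearAlgebra

# x[1] and x[2] are assumed to be filled in already; a and b are the (hypothetical) recurrence coefficients
function recurrence_lazy_renorm!(x, a, b)
    thresh = eps(eltype(x))/floatmin(eltype(x))  # ≈ 1e292 for Float64, well below overflow
    nrm2 = abs2(x[1]) + abs2(x[2])               # running squared 2-norm
    for k in 2:length(x)-1
        x[k+1] = (a[k]*x[k] - b[k-1]*x[k-1])/b[k]
        nrm2 += abs2(x[k+1])
        if nrm2 > thresh                         # rare branch: rescale only when necessary
            s = 1/√nrm2
            @inbounds for i in 1:k+1
                x[i] *= s
            end
            nrm2 = one(nrm2)
        end
    end
    x ./= norm(x)                                # a single exact normalization at the end
    return x
end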


Thanks!

Currently a different part of the code is the bottleneck, and I think it cannot be optimized: I call eigen on a unitary matrix \Lambda_T R, with \Lambda_T diagonal. Unless that can be optimized, further speed gains here will not help.

Although I am quite curious how much improvement an if-based renormalize would make… I think I will give these changes a shot, since they look easy enough to put in.

Edit: Such a simple change, and an unbelievable improvement. Renormalizing only when needed makes the code almost linear in j.

To give an example: earlier, j=1000 took 0.1 seconds and j=2000 took about 2 seconds. Now j=2000 takes 0.2 seconds. Earlier, j=10,000 seemed like it would take days; now it takes 10 seconds. The gain is massive, and it changes the scaling behaviour too.

This is the answer. Don’t renormalize all the time.

@MikaelSlevinsky I am curious about how you detect that overflow is about to occur by checking against eps()/floatmin(), which came out to about 1e292 on Julia. Can you explain a bit how that works? The maximum Float64 is larger, something like 1e320, right? And why not directly put a huge number there?

Yeah, you just need to make sure that the norm doesn't overflow; any reasonable condition should do the trick. (floatmax(Float64) is about 1.8e308, so a threshold like eps()/floatmin() ≈ 1e292 simply leaves a comfortable safety margin.)