GPU generalized eigendecomposition

I’d like to perform a generalized eigendecomp on my (CUDA-enabled) GPU.

My matrix is dense, complex, and non-Hermitian. I’d like the full decomposition (all the eigenvalues and eigenvectors), not just a targeted eigensolve.

I know that CUDA itself doesn’t have any out-of-the-box routines for this (it has some routines for Hermitian matrices, etc.). I’ve looked at MAGMA and actually gone through its C++ code. It turns out most of its subroutines for the generalized eigendecomposition actually fall back to the CPU, so it’s not very efficient.

Are there any Julia packages that offer this kind of functionality? Perhaps a simple QR algorithm that is general enough to work on CUDA arrays? I hear these kinds of problems on the GPU are somewhat of a “research problem,” since there are quite a few problem-dependent tricks at play. Still, I thought I’d ask.

Thanks!

Many parts of dense eigendecomposition algorithms are, AFAICT, unavoidably done with single-element indexing, so simply porting them to the GPU won’t work. MAGMA implements the state of the art in hybrid algorithms, doing as much as practical with large blocks on the GPU. I don’t think the MAGMA group has published a nonsymmetric generalized eigenvalue solver, but if your mass matrix (B in Ax = \lambda Bx) is nonsingular you can factor it yourself first to reduce the problem to an ordinary eigenvalue problem. There is a MAGMA_jll package, but I don’t think anyone has published convenient Julia wrappers yet; there is some early work on wrappers in an unregistered repo in the Julia Linear Algebra org.
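To illustrate the reduction mentioned above, here is a minimal CPU sketch in plain Julia, assuming B is nonsingular and reasonably well-conditioned. It factors B once and hands the ordinary problem (B \ A)x = \lambda x to `eigen`. (On the GPU you would do the factorization and solve with CUDA.jl, but a full nonsymmetric `eigen` for `CuArray` isn’t available, so this only demonstrates the algebra.)

```julia
using LinearAlgebra

n = 4
A = randn(ComplexF64, n, n)
B = randn(ComplexF64, n, n)   # assumed nonsingular (mass matrix)

# Ax = λBx  ⇒  (B \ A)x = λx : factor B once, then solve an
# ordinary (non-Hermitian) eigenproblem.
F = lu(B)                     # LU factorization of B
vals, vecs = eigen(F \ A)     # all eigenvalues and eigenvectors

# Sanity check: residual of the original generalized problem
resid = maximum(norm(A * vecs[:, k] - vals[k] * (B * vecs[:, k]))
                for k in 1:n)
```

If B is ill-conditioned this reduction can lose accuracy relative to a proper QZ solve, so it’s a trade-off rather than a free lunch.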