I’d like to perform a generalized eigendecomposition on my (CUDA-enabled) GPU.

My matrix is dense, complex, and non-Hermitian. I’d like the full decomposition (all the eigenvectors), not just a targeted eigensolve.

I know that CUDA itself doesn’t have any out-of-the-box routines that do this (it has some routines for Hermitian matrices, etc.). I’ve looked at MAGMA and actually gone through its C++ code. It turns out most of its subroutines for generalized eigendecomposition actually fall back to the CPU, so it’s not very efficient.

Are there any Julia packages that offer this kind of functionality? Perhaps a simple QR algorithm that is general enough to work on CUDA arrays? I hear these kinds of problems on the GPU are something of a “research problem,” since there are quite a few problem-dependent tricks at play. Still, I thought I’d ask.

Thanks!