QR decomposition on large (>1TB) matrix

I’d like to perform a QR decomposition on a large (>1 TB) matrix, much larger than the available memory on a single node of our HPC cluster.

My idea is to set up a DistributedArray (JuliaParallel/DistributedArrays.jl) or MPIArray (barche/MPIArrays.jl, built on MPI one-sided communication) to share the data across a set of nodes so that the matrix fits in aggregate memory. I’d then like to perform a distributed QR decomposition or Ridge Regression. However, it seems these packages do not support a QR decomposition?
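To make the problem concrete, here is a minimal sketch of the DistributedArrays.jl approach (sizes and worker count are illustrative, not a recommendation): each worker allocates only its local block, but there is no distributed `qr` method for a `DArray`.

```julia
using Distributed
addprocs(4)                       # in practice, one process per node via a cluster manager
@everywhere using DistributedArrays

# Block-distributed random matrix: each worker holds only its local chunk,
# so no single node ever stores the full matrix.
A = drandn(100_000, 10_000)

# This is where the plan breaks down: LinearAlgebra.qr has no method
# specialized for DArray, so the factorization cannot run distributed.
# qr(A)   # would require gathering A, defeating the purpose
```

The same limitation applies to MPIArrays.jl: both packages handle distributed storage and elementwise/broadcast operations, but neither ships a distributed dense QR.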

I’ve tried using Elemental.jl (JuliaParallel/Elemental.jl, a Julia interface to the Elemental distributed linear algebra library), but that seems to spawn processes with full copies of the matrix, resulting in OutOfMemory() errors.

Can anyone point me in the right direction to try and solve this problem? Many thanks in advance for helping me out!


Can’t you use a matrix-free version?


Wow, I have never seen or worked with matrices this large. How many rows and columns does this matrix have?

I think that qr_mumps can handle this problem through its Julia interface, QRMumps.jl. But you will need to compile a local version with StarPU to enable MPI support; the version precompiled by Yggdrasil doesn’t use StarPU.
QRMumps, like MUMPS, is tailored for very large problems.
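For a sense of the QRMumps.jl workflow, here is a small least-squares sketch following the package README (the problem data here is random and purely illustrative; for a 1 TB problem you would need the StarPU/MPI build mentioned above):

```julia
using QRMumps, SparseArrays

qrm_init()                          # initialize the qr_mumps library

# Small random sparse least-squares problem as a stand-in.
A = sprand(1_000, 200, 0.01)
b = rand(1_000)

# Wrap the sparse matrix in qr_mumps's format and solve min ||Ax - b||
# via a multifrontal sparse QR factorization.
spmat = qrm_spmat_init(A)
x = qrm_least_squares(spmat, b)
```

Note that this path is attractive mainly when the matrix is sparse; a dense 1 TB matrix is a different regime.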

MUMPS is for very large sparse problems.

If you have a large dense matrix, you want a parallel dense-direct library, i.e. something like Elemental.jl.

Probably you aren’t using it correctly? Handling distributed matrices, with only a chunk of the matrix in each process’s memory, is the whole point of the Elemental library AFAIK.

More generally, the question is where does your matrix come from, and can you exploit some special structure? e.g. is it sparse, or is there a fast way to multiply matrix-times-vector? Do you need the whole QR decomposition, or can you use some approximation? e.g. if you are using it to solve a least-squares problem, can you use randomized least squares?
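As an example of the matrix-free route mentioned above: if the end goal is ridge regression rather than the QR factors themselves, an iterative solver such as LSMR only needs products `A*v` and `A'*w`, so the matrix never has to be factorized or even stored explicitly. A hedged sketch using IterativeSolvers.jl (the damping keyword name `λ` is an assumption to verify against the package docs):

```julia
using IterativeSolvers, LinearAlgebra

m, n = 10_000, 500
A = randn(m, n)        # stand-in; in practice any operator supporting mul!
b = randn(m)
λ = 1.0                # ridge regularization parameter

# LSMR solves min ||A x - b||² + λ² ||x||² using only matvecs with A and A',
# so A can be a lazy/distributed operator instead of a stored matrix.
x = lsmr(A, b; λ = λ)

# Sanity check against the (dense) normal-equations solution:
x_ref = (A'A + λ^2 * I) \ (A'b)
println(norm(x - x_ref) / norm(x_ref))   # small residual expected
```

For a distributed matrix, `A` would be replaced by an operator whose `mul!` methods perform the communication, which sidesteps the need for a distributed QR entirely.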


Hello,
qr_mumps is actually developed for solving sparse problems through a multifrontal factorization, but it also contains some dense linear algebra routines, including a parallel QR factorization. The issue, though, is that handling a 1 TB matrix requires distributed-memory parallelism, which qr_mumps does not support at the moment. One option is to use ScaLAPACK; there is a related discussion here.