QR decomposition on large (>1TB) matrix

I’d like to perform a QR decomposition on a large (>1 TB) matrix, much larger than the available memory on a single node of our HPC cluster.

My idea is to set up a DistributedArray (JuliaParallel/DistributedArrays.jl) or MPIArray (barche/MPIArrays.jl, built on MPI one-sided communication) to share the data across a set of nodes so that the matrix fits in aggregate memory. I’d then like to perform a distributed QR decomposition or Ridge Regression. However, it seems these packages do not support a QR decomposition?
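To make the problem concrete, here is a minimal sketch of the DistributedArrays.jl approach (sizes and worker count are illustrative, not a recommendation): each worker allocates only its local block, but there is no distributed `qr` method for a `DArray`.

```julia
using Distributed
addprocs(4)                       # in practice, one process per node via a cluster manager
@everywhere using DistributedArrays

# Block-distributed random matrix: each worker holds only its local chunk,
# so no single node ever stores the full matrix.
A = drandn(100_000, 10_000)

# This is where the plan breaks down: LinearAlgebra.qr has no method
# specialized for DArray, so the factorization cannot run distributed.
# qr(A)   # would require gathering A, defeating the purpose
```

The same limitation applies to MPIArrays.jl: both packages handle distributed storage and elementwise/broadcast operations, but neither ships a distributed dense QR.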

I’ve tried using Elemental.jl (JuliaParallel/Elemental.jl, a Julia interface to the Elemental distributed linear algebra library), but that seems to spawn processes with full copies of the matrix, resulting in OutOfMemory() errors.

Can anyone point me in the right direction to try and solve this problem? Many thanks in advance for helping me out!


Can’t you use a matrix-free version?


Wow, I have never seen or worked with matrices this large. How many rows and columns does this matrix have?

I think that qr_mumps can handle this problem through its Julia interface, QRMumps.jl. But you will need to compile a local version with StarPU to enable MPI support; the version precompiled by Yggdrasil doesn’t use StarPU.
QRMumps, like MUMPS, is tailored for very large problems.
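For a sense of the QRMumps.jl workflow, here is a small least-squares sketch following the package README (the problem data here is random and purely illustrative; for a 1 TB problem you would need the StarPU/MPI build mentioned above):

```julia
using QRMumps, SparseArrays

qrm_init()                          # initialize the qr_mumps library

# Small random sparse least-squares problem as a stand-in.
A = sprand(1_000, 200, 0.01)
b = rand(1_000)

# Wrap the sparse matrix in qr_mumps's format and solve min ||Ax - b||
# via a multifrontal sparse QR factorization.
spmat = qrm_spmat_init(A)
x = qrm_least_squares(spmat, b)
```

Note that this path is attractive mainly when the matrix is sparse; a dense 1 TB matrix is a different regime.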

MUMPS is for very large sparse problems.

If you have a large dense matrix, you want a parallel dense-direct library, i.e. something like Elemental.jl.

Probably you aren’t using it correctly? Handling distributed matrices, with only a chunk of the matrix in each process’s memory, is the whole point of the Elemental library AFAIK.

More generally, the question is where does your matrix come from, and can you exploit some special structure? e.g. is it sparse, or is there a fast way to multiply matrix-times-vector? Do you need the whole QR decomposition, or can you use some approximation? e.g. if you are using it to solve a least-squares problem, can you use randomized least squares?
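As an example of the matrix-free route mentioned above: if the end goal is ridge regression rather than the QR factors themselves, an iterative solver such as LSMR only needs products `A*v` and `A'*w`, so the matrix never has to be factorized or even stored explicitly. A hedged sketch using IterativeSolvers.jl (the damping keyword name `λ` is an assumption to verify against the package docs):

```julia
using IterativeSolvers, LinearAlgebra

m, n = 10_000, 500
A = randn(m, n)        # stand-in; in practice any operator supporting mul!
b = randn(m)
λ = 1.0                # ridge regularization parameter

# LSMR solves min ||A x - b||² + λ² ||x||² using only matvecs with A and A',
# so A can be a lazy/distributed operator instead of a stored matrix.
x = lsmr(A, b; λ = λ)

# Sanity check against the (dense) normal-equations solution:
x_ref = (A'A + λ^2 * I) \ (A'b)
println(norm(x - x_ref) / norm(x_ref))   # small residual expected
```

For a distributed matrix, `A` would be replaced by an operator whose `mul!` methods perform the communication, which sidesteps the need for a distributed QR entirely.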


Hello,
qr_mumps is actually developed for solving sparse problems through a multifrontal factorization, but it also contains some dense linear algebra routines, including a parallel QR factorization. The issue, though, is that handling a 1 TB matrix requires distributed-memory parallelism, which qr_mumps does not support at the moment. One option is to use ScaLAPACK; there is a related discussion here.