I think QRMumps can handle this problem through its Julia interface, QRMumps.jl, but you will need to compile a local version with StarPU to enable MPI support; the version precompiled by Yggdrasil doesn’t use StarPU.
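For reference, here is a minimal shared-memory sketch of what using QRMumps.jl looks like (based on my reading of the package docs; the calls `qrm_init`, `qrm_spmat_init` and `qrm_least_squares` are the ones I believe exist, so double-check against the current README):

```julia
# Minimal shared-memory sketch with QRMumps.jl (multithreaded via StarPU,
# not distributed). Check function names against the QRMumps.jl docs.
using QRMumps, SparseArrays

qrm_init()                       # initialize the qr_mumps library

m, n = 10_000, 2_000
A = sprand(m, n, 1e-3)           # sparse overdetermined matrix
b = rand(m)

spmat = qrm_spmat_init(A)        # wrap A in qr_mumps' sparse-matrix type
x = qrm_least_squares(spmat, b)  # analyse + factorize + solve min ‖Ax − b‖
```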
QRMumps, like MUMPS, is tailored for very large problems.
If you have a large dense matrix, you want a parallel dense-direct library, e.g. Elemental.jl.
Perhaps you aren’t using it correctly? Handling distributed matrices, with only a chunk of the matrix in each process’s memory, is the whole point of the Elemental library AFAIK.
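As a rough sketch of what "only a chunk per process" looks like in practice (adapted from my recollection of the Elemental.jl README; `MPIManager`, `@mpi_do`, `Elemental.DistMatrix` and `Elemental.gaussian!` are the calls I believe exist, so verify against the package):

```julia
# Each MPI worker owns only its local block of the global dense matrix.
using Distributed, MPIClusterManagers

man = MPIManager(np = 4)                   # spawn 4 MPI worker processes
addprocs(man)

@mpi_do man begin
    using Elemental
    A = Elemental.DistMatrix(Float64)      # distributed dense matrix
    Elemental.gaussian!(A, 20_000, 5_000)  # fill a 20000×5000 global matrix with random entries
    # Each rank stores only its local portion, so the global matrix never
    # has to fit in a single process's memory.
end
```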
More generally, the question is where your matrix comes from and whether you can exploit some special structure. e.g. is it sparse, or is there a fast way to compute matrix-vector products? Do you need the whole QR decomposition, or can you use some approximation? e.g. if you are using it to solve a least-squares problem, can you use randomized least squares?
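For instance, a crude sketch-and-solve version of randomized least squares looks like this (plain Julia, just to illustrate the idea; in practice you would use a structured sketch such as an SRHT, or a dedicated package, rather than a dense Gaussian S):

```julia
using LinearAlgebra, Random

# Sketch-and-solve: compress the m×n problem to an (c·n)×n one before QR.
function sketched_lsq(A, b; c = 4)
    m, n = size(A)
    s = c * n                        # sketch size, a small multiple of n
    S = randn(s, m) ./ sqrt(s)       # dense Gaussian sketch (illustration only)
    # Solve the small problem min ‖(S*A)x − S*b‖ instead of min ‖Ax − b‖.
    return qr(S * A) \ (S * b)
end

A = randn(10_000, 50)
b = randn(10_000)
x_approx = sketched_lsq(A, b)        # approximate least-squares solution
```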
Hello,
qr_mumps is actually developed for solving sparse problems through a multifrontal factorization, but it also contains some dense linear algebra routines, including a parallel QR factorization. The issue, though, is that in order to handle a 1 TB matrix you need distributed-memory parallelism, which qr_mumps does not support at the moment. One option is to use ScaLAPACK; there is a related discussion here.
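To put a rough number on that (a back-of-the-envelope check, assuming a square double-precision dense matrix):

```julia
# Rough size check: how big is a square Float64 matrix occupying 1 TB?
bytes   = 10^12
entries = bytes ÷ 8          # Float64 is 8 bytes per entry ⇒ 1.25e11 entries
n       = isqrt(entries)     # ≈ 353_553 rows and columns
flops   = 4 * n^3 / 3        # Householder QR cost for a square matrix ≈ 5.9e16 flops
```

So even before counting workspace or fill, the factorization will not fit in a single node's memory, which is why a distributed-memory library is needed.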