MUMPS is for very large sparse problems.
If you have a large dense matrix, you want a parallel dense-direct library, such as Elemental.jl.
You are probably not using it correctly: handling distributed matrices, with only a chunk of the matrix in each process’s memory, is the whole point of the Elemental library, AFAIK.
More generally, where does your matrix come from, and can you exploit some special structure? For example, is it sparse, or is there a fast way to compute matrix-vector products? Do you need the whole QR decomposition, or would an approximation do? For example, if you are using it to solve a least-squares problem, can you use randomized least squares?
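To make the randomized least-squares idea concrete, here is a minimal single-process sketch of the simplest variant ("sketch-and-solve"), using only Julia's standard libraries. The function name `sketched_lstsq`, the `oversample` default, and the dense Gaussian sketch are all illustrative assumptions on my part, not something from a particular package, and the test problem is synthetic:

```julia
using LinearAlgebra, Random

# Sketch-and-solve: instead of minimizing norm(A*x - b) over all m rows,
# minimize norm(S*(A*x - b)), where S is a random s-by-m matrix with s much
# smaller than m. The names and defaults here are hypothetical.
function sketched_lstsq(A::AbstractMatrix, b::AbstractVector;
                        oversample::Int = 4, rng = Random.default_rng())
    m, n = size(A)
    s = min(m, oversample * n)        # number of rows in the sketched problem
    S = randn(rng, s, m) ./ sqrt(s)   # dense Gaussian sketch (illustration only)
    return (S * A) \ (S * b)          # small s-by-n least-squares solve
end

# Synthetic tall problem: 2000 equations, 20 unknowns, a little noise.
rng = MersenneTwister(0)
A = randn(rng, 2000, 20)
x_true = randn(rng, 20)
b = A * x_true + 0.01 .* randn(rng, 2000)

x_exact  = A \ b                      # full least-squares solve, for comparison
x_sketch = sketched_lstsq(A, b; rng = rng)

# Ratio of residual norms; a value close to 1 means the sketch lost little accuracy.
println(norm(A * x_sketch - b) / norm(A * x_exact - b))
```

A dense Gaussian `S` is the easiest thing to write down, but forming `S * A` this way costs as much as a direct solve; in practice you would want a structured or sparse sketch, or to use the sketch only to build a preconditioner for an iterative solver. Whether the accuracy is good enough depends on your problem.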