Distributed/out-of-memory/GPU calculations on sparse matrices

gpu

#1

Hello, I am a junior PhD student in theoretical physics. I am working on a DMRG (https://en.wikipedia.org/wiki/Density_matrix_renormalization_group) algorithm in Julia.

A key part of the algorithm is singular value decomposition (SVD) and diagonalization (Lanczos or Arnoldi) of large sparse matrices (MxM, where M = d^2*chi^2 with chi = 10^2-10^4 and d = 2-10, so M is of the order 10^6-10^10 in the worst case).
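Just to make the scaling concrete, a quick back-of-the-envelope check with the worst-case parameters above (variable names are mine):

```julia
d, chi = 10, 10^4            # worst-case values from the ranges above
M = d^2 * chi^2              # matrix dimension: 10^10
bytes_dense = 8 * big(M)^2   # dense Float64 storage would need 8*M^2 bytes
# ≈ 8e20 bytes, far beyond any single machine, so sparse storage and
# iterative methods are mandatory at the upper end of the range.
```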

It seems to me like I need to make sure that the code can work with arrays which don’t fit in memory, and Blocks.jl or ScaLAPACK bindings look promising. Ideally, I think the algorithm would also benefit from parallelization and GPUs, since at least some operations are embarrassingly parallel.
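One observation that may help with the memory question: Lanczos/Arnoldi (and iterative SVD solvers) only ever touch the matrix through matrix-vector products, so the matrix never needs to exist densely, and the matvec is the natural place to parallelize. A minimal Lanczos sketch to illustrate the access pattern — no restarting or reorthogonalization, function name and details are my own, not from any package (ARPACK/KrylovKit do this properly):

```julia
using SparseArrays, LinearAlgebra

# Toy Lanczos: estimate extreme eigenvalues of a symmetric matrix A
# using only A*v products. No reorthogonalization, so this is a sketch,
# not production code.
function lanczos_eigvals(A, k::Int)
    n = size(A, 1)
    q = normalize(ones(n))          # deterministic start vector
    qprev = zeros(n)
    alphas, betas = Float64[], Float64[]
    beta = 0.0
    for j in 1:k
        w = A * q                   # the ONLY way A is accessed
        alpha = dot(q, w)
        push!(alphas, alpha)
        w .-= alpha .* q .+ beta .* qprev
        beta = norm(w)
        (beta < 1e-12 || j == k) && break
        push!(betas, beta)
        qprev, q = q, w ./ beta
    end
    eigvals(SymTridiagonal(alphas, betas))   # Ritz values
end
```

Because `A` only appears as `A * q`, the same loop works with a distributed matrix, an out-of-core block matrix, or a matrix-free operator that builds the action of the Hamiltonian on the fly.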

But I am very new to Julia and have only had a brief introduction to MPI: does anyone have recommendations for which tools and design could be useful for this problem?

Regards,
Axel Gagge


#2

I would recommend taking a look at DistributedArrays.jl as a first step, and there is also @shashi’s Dagger.jl for out-of-core computations.
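To give a flavor of the distributed-memory model those packages automate, here is a toy row-block matvec using only the stdlib `Distributed` — the function and partitioning are my own illustration, not the DistributedArrays.jl API:

```julia
using Distributed
addprocs(2)                      # two local worker processes
@everywhere using SparseArrays

# Toy distributed matvec: split y = A*x into row blocks, one per worker.
# DistributedArrays.jl / Dagger.jl keep the blocks resident on the
# workers instead of shipping A around on every call, as pmap does here.
function distributed_matvec(A::SparseMatrixCSC, x::Vector)
    n = size(A, 1)
    step = cld(n, nworkers())
    blocks = [r:min(r + step - 1, n) for r in 1:step:n]
    parts = pmap(rows -> A[rows, :] * x, blocks)
    reduce(vcat, parts)
end
```

The point is only the decomposition pattern; for real sizes you would let the package own the data layout so each worker holds its block permanently.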

cc: @andreasnoack


#3

How many non-zeros do you have in the array? If you have a big machine, it is likely that you can store the sparse matrix in memory. You might still want to benefit from more processors, and the vectors generated by the Lanczos iterations are dense anyway. You could take a look at the examples in https://github.com/JuliaParallel/Elemental.jl to see how Elemental’s distributed matrix structures can be used together with an iterative SVD written in Julia. Unfortunately, I don’t have the eigenvalue version ready yet.
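For the "does it fit in memory" question: Julia's CSC format costs roughly 16 bytes per stored entry plus 8 bytes per column (for `Float64` values and `Int64` indices), so the estimate is a one-liner (helper name is mine):

```julia
# Approximate memory (GiB) of a SparseMatrixCSC{Float64,Int64} with
# n columns and nnz stored entries:
# nnz values (8 B) + nnz row indices (8 B) + n+1 column pointers (8 B).
sparse_mem_gib(n, nnz) = (16 * nnz + 8 * (n + 1)) / 2^30

# e.g. a 10^8 x 10^8 matrix with 10^9 non-zeros comes to ~16 GiB,
# which a big shared-memory machine holds comfortably.
```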