Using multithreading and multiprocessing together to construct a matrix

I want to construct a matrix (roughly of size 10^4 × 10^4) whose elements are computed by a given rule. Constructing this matrix serially takes a long time, so I thought of using multithreading to build it in the following way:

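function F_array(Nsite)
    result = zeros(ComplexF64, Nsite, Nsite)
    # Rows are split across threads; each thread writes only to its own rows.
    Threads.@threads for m in 1:Nsite
        # Only the strictly upper triangle (n > m) is filled.
        for n in m+1:Nsite
            result[m, n] = sum(rand(ComplexF64) for i in 1:10000)
        end
    end
    return result
end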

This function is already much faster than the serial approach, but I’m wondering whether I could do better by distributing the calculation over multiple machines in addition to using multithreading. (I’m working on a computer cluster.) I’m not sure how to do that.

This depends very much on how expensive the matrix elements are to compute. Different machines don’t share memory, and I assume you want the final matrix to live in the memory of one machine. You would hence need to build parts of the matrix on different machines and then transfer the results back to the “master” machine to assemble the final matrix. Whether this approach is faster depends on the cost of data communication versus the benefit of parallelizing the computation of the matrix elements.
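
To get a feeling for the communication side, here is a rough back-of-the-envelope sketch of how much data would have to travel back to the master. The matrix size comes from the question; the bandwidth figure is purely a hypothetical placeholder, so substitute whatever your cluster's interconnect actually delivers.

```julia
Nsite = 10^4

# A ComplexF64 occupies 16 bytes (two Float64 values).
bytes_full = Nsite^2 * sizeof(ComplexF64)
gb_full = bytes_full / 1e9

# The posted loop only fills entries with n > m (strictly upper triangle).
bytes_upper = (Nsite * (Nsite - 1) ÷ 2) * sizeof(ComplexF64)
gb_upper = bytes_upper / 1e9

# ASSUMPTION: hypothetical effective bandwidth of 1 GB/s, not a measured value.
assumed_bandwidth = 1e9  # bytes per second
transfer_seconds = bytes_full / assumed_bandwidth

println("full matrix:    ", gb_full, " GB")
println("upper triangle: ", gb_upper, " GB")
println("transfer time at assumed bandwidth: ", transfer_seconds, " s")
```

If computing the elements on one machine takes far longer than moving this much data, distributing the work is likely to pay off; if the two are comparable, it probably is not.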


For distributed computing, we have the Distributed standard library and MPI.jl. The former is probably easier to get started with; the latter is the de facto standard for “large-scale” distributed computing (it also uses fast interconnects, if available in your HPC cluster). The big disadvantage of MPI.jl is that you pretty much can’t use it interactively, which forces you to change your workflow (it’s a different programming paradigm).
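
As a starting point with the Distributed standard library, here is a minimal sketch that combines worker processes with threads inside each worker. It is only one possible layout, not a tuned implementation. The names `element`, `row_block`, and `F_array_distributed` are hypothetical, `element` is a cheap stand-in for the real rule, and the two local workers are an assumption for testing; on a cluster you would add workers on other nodes instead.

```julia
using Distributed

# ASSUMPTION: two local workers for a quick test. To give each worker several
# threads, something like addprocs(2; exeflags = "--threads=4") can be used.
nprocs() == 1 && addprocs(2)

@everywhere begin
    # Hypothetical stand-in for the real (expensive) element rule.
    element(m, n) = sum(rand(ComplexF64) for _ in 1:100)

    # Compute the strictly upper-triangular entries of the given rows,
    # multithreaded over those rows inside one worker process.
    function row_block(rows, Nsite)
        block = zeros(ComplexF64, length(rows), Nsite)
        Threads.@threads for k in eachindex(rows)
            m = rows[k]
            for n in m+1:Nsite
                block[k, n] = element(m, n)
            end
        end
        return block
    end
end

function F_array_distributed(Nsite; nchunks = min(Nsite, 4 * nworkers()))
    # Interleave rows across chunks so each chunk gets a mix of long rows
    # (small m) and short rows (large m), which balances the triangular work.
    chunks = [collect(c:nchunks:Nsite) for c in 1:nchunks]

    # Each chunk is computed on some worker; the blocks are sent back here.
    blocks = pmap(rows -> row_block(rows, Nsite), chunks)

    # Assemble the final matrix on the master process.
    result = zeros(ComplexF64, Nsite, Nsite)
    for (rows, block) in zip(chunks, blocks)
        result[rows, :] = block
    end
    return result
end

F = F_array_distributed(8)
```

Note that each block is sent back as a full-width set of rows, zeros included, so this sketch moves roughly the whole matrix over the network. Sending only the entries with n > m would about halve that, at the cost of more bookkeeping.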
