pmap and svdvals speed from the documentation example

I am wondering why the following example of pmap isn’t getting any speedup:

using Distributed
using BenchmarkTools

addprocs(length(Sys.cpu_info()));
@everywhere using LinearAlgebra

M = Matrix{Float64}[rand(1000,1000) for i = 1:10];
@btime pmap(svdvals, M); # 1.931 s
@btime map(svdvals, M);  # 764.653 ms

This is an example taken from the Distributed Computing section of the manual. This behavior doesn't only occur for svdvals, but also for what I actually need it for: backslash (\). Also, even if I compute the svdvals of each M[i] in a for loop, pmap still seems to be outperformed.

Increasing the size of M doesn't really seem to improve things. So I am wondering why this is happening, and whether there is a way to actually get a speedup in a situation where I need to compute the svd, or backslash, of matrices like those in M (or of the slices of an Array{Float64,3} of size (NN, N, N) along the first dimension)?
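To make the second case concrete, this is roughly the pattern I have in mind (just an untested sketch; A, NN, and N are stand-ins for my actual data, and it assumes workers have already been added and LinearAlgebra loaded on them):

using Distributed
@everywhere using LinearAlgebra

NN, N = 10, 1000                     # placeholder sizes
A = rand(NN, N, N)                   # stand-in for my Array{Float64,3}

# Materialize each N×N slice so every worker only receives one slice.
slices = [A[i, :, :] for i in 1:NN]

# Singular values of every slice, distributed over the workers.
svals = pmap(svdvals, slices)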

Most linear algebra stuff in Julia is already multithreaded.
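You can check this directly; on recent Julia versions something like the following shows how many threads BLAS is using (BLAS threading is separate from the Distributed workers):

using LinearAlgebra

# Each svdvals call already runs on this many BLAS threads,
# so adding workers on the same machine mostly oversubscribes the cores.
BLAS.get_num_threads()   # e.g. 8 on an 8-core laptop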

I understand that, but shouldn’t I see some sort of speedup if instead of

M = Matrix{Float64}[rand(1000,1000) for i = 1:10];

I increase the size to

M = Matrix{Float64}[rand(1000,1000) for i = 1:1000];

which doesn't seem to be the case. I am in a situation where I have a bunch of large linear systems to solve. To my understanding, pmap should be able to handle this, but if that isn't the case, what would be the best way of going about it? Using MPI.jl?

Are you trying this on a multi-computer cluster? If not, I wouldn’t expect it to be faster.

I have sequential code running on a cluster, but I have not implemented the parallelization yet, as I am still trying to figure out an (intelligent) way of doing it; I have just been experimenting on my laptop. I should mention that part of the job of each process will also be building the matrices; they are not random as in the example above.
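Roughly, each task would look something like this (just a sketch; build_system is a made-up stand-in for my actual matrix assembly):

using Distributed

@everywhere begin
    using LinearAlgebra

    # Stand-in for the real assembly routine: builds the i-th system.
    function build_system(i, N)
        A = rand(N, N) + N * I        # placeholder matrix
        b = rand(N)                   # placeholder right-hand side
        return A, b
    end

    # Build and solve the i-th system entirely on the worker.
    function solve_system(i, N)
        A, b = build_system(i, N)
        return A \ b
    end
end

solutions = pmap(i -> solve_system(i, 1000), 1:100);

The idea being that the matrix is built on the worker that solves it, so only the index and the solution travel over the network.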

But you expect that on a cluster I should be able to see some speedup?

For testing on your laptop, set BLAS threads to 1 using BLAS.set_num_threads(1). That said, if you are trying to optimize performance for a cluster, testing on a laptop will not give useful results.
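Something along these lines; to my understanding each worker process has its own BLAS, so the setting needs @everywhere to reach them as well:

using Distributed
addprocs(length(Sys.cpu_info()));      # one worker per core

@everywhere using LinearAlgebra
@everywhere BLAS.set_num_threads(1)    # disable BLAS threading on every process

using BenchmarkTools
M = Matrix{Float64}[rand(1000,1000) for i = 1:10];
@btime map(svdvals, $M);
@btime pmap(svdvals, $M);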

I already tried BLAS.set_num_threads(1), but the speeds are still comparable, which is another reason I am a little confused by this example from the manual. But it seems like I should switch to testing on the cluster, as you suggest.