Hi, everyone! I'd like to ask several questions that have confused me a lot while learning distributed computing as a complete beginner. I came here looking for some confirmation, and I also hope this can help other beginners get started faster. Let's take the following singular value decomposition (SVD) problem from the docs as an example.
```julia
using Distributed
using LinearAlgebra
addprocs(10)
M = Matrix{Float64}[rand(1000,1000) for i = 1:10];
@time map(svdvals, M);
@time pmap(svdvals, M);
```
Q1: When we call `pmap`, the task is distributed to 10 processes to be executed. Where and how exactly are they executed on the computer? Is it equivalent to me opening 10 Julia REPLs and running `svdvals(M[i])` (for `i in 1:10`) on them separately (at the same time)? Does each of the 10 so-called worker processes correspond to one of the entries under the Processes field (1/279) in the Task Manager?
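To make Q1 concrete, here is a small check I have been using to convince myself of what a worker is; it assumes the 10 workers from the example above were already added, and my (possibly wrong) understanding is that each of them should show up as a separate `julia` entry with its own PID in the Task Manager.

```julia
using Distributed
# (the 10 workers were already added by `addprocs(10)` in the example above)

for id in workers()
    pid = remotecall_fetch(getpid, id)   # run Base.getpid() on worker `id`
    println("worker $id runs in OS process $pid")
end
```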
Q2: How do I check the maximum number of processes available on the computer? What are the resources that limit it: the number of `Sockets` (is that the number of CPUs?), `Cores`, or `Logical processors`? Or the number of `Threads`? (What do these terms refer to?) And if the code is to be run on a cluster, how should I set `#SBATCH -N 1` and `#SBATCH --ntasks-per-node=1` (SLURM)?
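For what it is worth, the only thing I have found so far for probing this from Julia itself is `Sys.CPU_THREADS`; please correct me if I am misreading what it means.

```julia
using Distributed

# Logical processors (hardware threads) that the OS reports to this Julia session:
println("logical processors: ", Sys.CPU_THREADS)

# Processes currently in this session (process 1 is the master/driver):
println(nprocs(), " processes in total, of which ", nworkers(), " are workers")

# My current (possibly naive) rule of thumb: no more than one worker per logical processor.
# addprocs(Sys.CPU_THREADS)
```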
Q3: Are there more details or examples about the usage of `addprocs(exeflags="--project")` and `@everywhere`? According to my understanding, if a script `main.jl` (which contains a `pmap` call) depends on some project environment, we should call `addprocs` with `exeflags="--project"` near the top of `main.jl` (below `using Distributed`) and add `@everywhere` before every package `using` statement (except `using Distributed`), e.g.,
```julia
# main.jl:
using Distributed
# Launch 10 workers that inherit the current project environment.
# (`addprocs` also needs to be called before `@everywhere`. Right?)
addprocs(10; exeflags="--project")
@everywhere using PackageA
@everywhere using PackageB
# Other code...
```
And to execute it from the terminal, we run `julia --project main.jl` in its directory. But in the SVD example above, why don't we need to add `@everywhere` before `using LinearAlgebra`?
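To make the question concrete, this is the small check I run to see whether a package is actually loaded on a worker, using `LinearAlgebra` as the example package (the 2 workers and the `--project` flag are just for illustration):

```julia
using Distributed
addprocs(2; exeflags="--project")   # assumes the script is started from the project directory

@everywhere using LinearAlgebra

# Ask worker 2 whether the module is really defined in its Main:
loaded = remotecall_fetch(() -> isdefined(Main, :LinearAlgebra), 2)
println("LinearAlgebra loaded on worker 2: ", loaded)
```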
Q4: Since `addprocs(10)` must be called before `@everywhere using PackageA` (I tried running the latter first and it reported `ERROR: On worker 2: KeyError: key PackageA not found`), how can we write a distributed computing function inside a module? E.g.,
```julia
module MyModule

using Distributed
@everywhere using PackageA: funA   # There are no workers yet at this point...

export myfun

function myfun(num_procs)
    addprocs(num_procs)
    # @everywhere using PackageA: funA   # ...or should the `using` go here instead?
    M = [...]
    @time pmap(funA, M)
end

end # module
```
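For reference, the workaround I am currently experimenting with (I have no idea whether it is idiomatic, so corrections are very welcome) is to load the package only on the master at module level, and then push the `using` to the workers at run time, after `addprocs`, via `remotecall_fetch` of `Core.eval`; `PackageA`, `funA`, and the `M` argument are placeholders:

```julia
module MyModule

using Distributed
using PackageA: funA          # loaded on the master process only

export myfun

function myfun(num_procs, M)
    new_workers = addprocs(num_procs)
    # Load PackageA on each freshly added worker at run time.
    for p in new_workers
        remotecall_fetch(Core.eval, p, Main, :(using PackageA))
    end
    try
        return @time pmap(funA, M)
    finally
        rmprocs(new_workers)  # remove only the workers this call added
    end
end

end # module
```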
Q5: How can we compare parallel computing time fairly? If we simply compare `@time map(svdvals, M)` and `@time pmap(svdvals, M)`, we ignore the time cost of `addprocs(10)` and the package loading time on all the worker processes.
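What I currently do is time the setup and the computation separately, roughly as below, with one warm-up `pmap` call to keep compilation out of the comparison; whether this counts as a fair benchmark is exactly part of my question (I suspect BenchmarkTools.jl is the proper tool).

```julia
using Distributed
using LinearAlgebra

M = Matrix{Float64}[rand(1000, 1000) for i = 1:10];

serial = @elapsed map(svdvals, M)

setup_start = time()
addprocs(10)
@everywhere using LinearAlgebra
setup = time() - setup_start

pmap(svdvals, M[1:1]);               # warm-up call, to exclude compilation time
parallel = @elapsed pmap(svdvals, M)

println("serial: $serial s,  parallel: $parallel s,  worker setup: $setup s")
```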
Thank you for your attention. I don't expect answers to all of these questions, but I will be extremely grateful for any brief guidance you can provide! I will also try to answer some of the questions myself here once I understand them better.