Can I have SharedArrays on several different machines? Hybrid parallel model

question

#1

Suppose I have the following problem:

  1. I need to call a function f many times in an estimation procedure.
  2. Computing f is very expensive. I usually use 6 workers to compute it in parallel, and it takes about 2 minutes. The workers use a set S of SharedArrays to do this job.
  3. Suppose I had access to several large machines. Is it possible to distribute this computation model across machines? My main doubt is how to make sure that the SharedArrays of group one, say S1, are allocated on machine one, the SharedArrays of group two on machine two, and so on. This is important, since they are large in memory.
  4. Just to be clear: Ideally I’d be using Base.Threads in step 2. I have been advised not to go down that route at this moment, since the feature is not mature enough yet. I think what I have in mind is sometimes called a hybrid distributed computation: using several CPUs on each machine, and connecting multiple machines.
  5. Thanks!
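
One possible starting point for point 3, as a rough sketch rather than a tested recipe: group the worker ids by the machine they run on, so that each group can be treated as one "machine-local" pool. The helper name `group_by_host` and its `hostof` keyword are made up for illustration; only `workers`, `remotecall_fetch` and `gethostname` are actual Julia functions.

```julia
using Distributed

# Hypothetical helper: map each host name to the worker ids running on it.
# `hostof` is injectable so the grouping logic can be tried without a cluster.
function group_by_host(pids = workers(); hostof = p -> remotecall_fetch(gethostname, p))
    groups = Dict{String,Vector{Int}}()
    for p in pids
        push!(get!(groups, hostof(p), Int[]), p)
    end
    return groups
end

# With no remote workers added, everything lands on the local host.
groups = group_by_host()
```

Each group could then be passed as the `pids` keyword of the `SharedArray` constructor, whose documentation requires all listed processes to be on the same host.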

#2

I am currently dealing with a similar situation (if I understood your problem correctly), running MCMC models in Julia. What I went for is “poor man’s mapreduce” using the filesystem for interprocess communication: doing the calculation on each server/core, save it (serialize), then merge. All steps involve separate processes. I am sure you will get more elegant solutions, but this proved to be very robust and easy to develop: I just made it work on one process, then did the rest using scp/ssh. Maybe there is a library that would organize this?
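
A minimal sketch of that save-then-merge pattern (not the code actually used; the function names and the `.jls` extension are invented for illustration, and here both steps run in one process just to show the idea):

```julia
using Serialization

# Hypothetical worker step: dump one partial result to its own file.
function save_chunk(dir, id, result)
    path = joinpath(dir, "chunk_$(id).jls")
    serialize(path, result)
    return path
end

# Hypothetical merge step: load every chunk file and concatenate the contents.
function merge_chunks(dir)
    files = sort(filter(f -> endswith(f, ".jls"), readdir(dir)))
    return reduce(vcat, [deserialize(joinpath(dir, f)) for f in files])
end

dir = mktempdir()
save_chunk(dir, 1, [0.1, 0.2])   # in practice: done by a separate process
save_chunk(dir, 2, [0.3])
merged = merge_chunks(dir)
```

Note that files written by `serialize` are generally only meant to be read back by the same Julia version, which is fine for short-lived intermediate results like these.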


#3

I think you understood my problem perfectly well there.

I’m not sure, on the other hand, that I understand your application. When you say

doing the calculation on each server/core, save it (serialize), then merge

do you have to communicate the results of each node back to the master process during computation, or is merge part of post-computation analysis in your case? I can easily see how I would do this in the latter case.
If that is not your case (i.e. you're dealing with the former), or otherwise, do you have a suggestion for how I could implement your strategy? Just a comment on what you think of the outline below would be very helpful - thanks!

Here is the outline of my situation.

Problem

Say I have 3 computers, each with multiple CPUs. Call computer 1 the master.

  1. The master controls the computation of f(x) on each of computers 2 and 3. In particular, it sends x2 to 2 and x3 to 3, and reads their respective results y2 and y3. Think of f as a loss function.
  2. based on y2=f(x2) and y3=f(x3), the master comes up with a new candidate for each computer, x2', x3'.
  3. In MCMC terminology (I think), the kernel resides on the master and it comes up with a new candidate parameter for each participating computer.
  4. All computers have access to the same file system, e.g. all can read/write ~/

I don't know how to make the master wait for new results y to appear. I read about Julia tasks and co, which seem to be just what I need, but that is only for actual Julia workers.

Proposition

I basically need to know how to do the following, although it may be quite time consuming in my case to initiate the model from scratch each time.

# pseudo code ;-)  (`initial_x`, `nextx` and `conv` are placeholders)

# imagine that compute_f.jl is a script to compute f
# the script takes a command-line arg `param`, which becomes `x` for `f(x)`

# this function runs on computer 1 for the entire duration of the procedure.
function master_process()
     converged = false
     minf = Inf
     x = initial_x()
     # `~` is not expanded by Julia's file functions, so build absolute paths
     result2 = joinpath(homedir(), "result2.dat")
     result3 = joinpath(homedir(), "result3.dat")
     while !converged
         x2 = nextx(x)  # random innovation to x
         x3 = nextx(x)

         # `run` blocks by default; wait=false starts both jobs so they run concurrently
         run(`ssh computer2 julia compute_f.jl --param $x2`; wait=false)  # computes f(x2) and saves y2 as ~/result2.dat
         run(`ssh computer3 julia compute_f.jl --param $x3`; wait=false)  # saves y3 as ~/result3.dat
         # wait until both result files exist
         while !(isfile(result2) && isfile(result3))
              sleep(1)
              println(".")
         end
         # read the results (assumes each file holds a single number as text)
         y2 = parse(Float64, strip(read(result2, String)))
         y3 = parse(Float64, strip(read(result3, String)))

         # find best value and the candidate that produced it
         y = min(y2, y3)
         x = y2 <= y3 ? x2 : x3

         # test convergence
         converged = conv(y, minf)  # test convergence somehow...suppose it's `false` for now :-)

         # delete files
         rm(result2)
         rm(result3)

     end # while
end
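
For the "wait..." step, Base's `timedwait` can do the polling and also give up after a while, so the master does not hang forever if a node dies. A small sketch; the helper name `wait_for_files` and the timeout values are only illustrative, and the background task just stands in for a remote job writing its result:

```julia
# Hypothetical helper: block until every path exists, or give up after `timeout` seconds.
function wait_for_files(paths; timeout = 600.0, pollint = 1.0)
    status = timedwait(() -> all(isfile, paths), timeout; pollint = pollint)
    return status == :ok
end

# Tiny demo: a background task creates the file after a short delay.
dir = mktempdir()
result = joinpath(dir, "result2.dat")
@async (sleep(0.2); write(result, "1.5"))
found = wait_for_files([result]; timeout = 5.0, pollint = 0.05)
```

A `false` return value would be the place to log the failure or rerun the job.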

#4

My task was very simple:

  1. master has the data and the code,
  2. rsyncs it to nodes,
  3. executes a Julia process via ssh, which dumps the results in a file (serialize)
  4. master collects the results via rsync.
    This runs only once. Signalling when the nodes had finished was indeed tricky; I would just check for the presence of the result file.
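
One common way to make "the file is present" also mean "the file is complete" is to write under a temporary name and rename at the end. A minimal sketch of that idea, assuming the result is a single number stored as text; `write_result_atomically` is a made-up name, and whether the rename is truly atomic on a given network filesystem is something to check:

```julia
# Hypothetical worker-side helper: write to a temporary name, then rename.
function write_result_atomically(path, y)
    tmp = path * ".tmp"
    write(tmp, string(y))
    mv(tmp, path; force = true)   # rename within the same directory
    return path
end

dir = mktempdir()
final = write_result_atomically(joinpath(dir, "result2.dat"), 0.25)
value = parse(Float64, read(final, String))
```

With this, a poller checking for the final name never sees a half-written file.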

So basically like your control flow above. Now this is indeed quite inelegant, but very robust: if processes fail (which happened quite a bit during the development phase, because of a bug that produced a NaN which propagated and at some point led to an error), I could just rerun them manually.

The right thing to do would probably be using cluster managers. I have to admit that I had a pressing deadline and doing it the low-tech way seemed like a viable option, so lacking a worked example I postponed learning about this part of Julia.
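
For reference, a rough sketch of what the worker-based route might look like with the `Distributed` standard library. Two local workers stand in for the remote machines here, and `f` is a toy placeholder for the real loss function; none of this has been tried on an actual cluster.

```julia
using Distributed

# Stand-in for the real cluster: two local workers. With SSH access one would
# instead pass machine specs, e.g. addprocs([("computer2", 1), ("computer3", 1)]).
addprocs(2)

# Toy loss function, defined on every process (placeholder for the real f).
@everywhere f(x) = sum(abs2, x)

xs = [[1.0, 2.0], [0.5, 0.5]]   # one candidate per worker
ys = pmap(f, xs)                # evaluates f on the workers and waits for all results
best_x = xs[argmin(ys)]         # candidate with the smallest loss

rmprocs(workers())
```

Here `pmap` takes care of both sending the candidates out and waiting for the results, which is the part that needs file polling in the ssh-based approach.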

Hopefully someone more knowledgeable here can give you more help; I only replied because I saw that the topic was without an answer and I did solve a similar problem. But learning about this from me is like learning carpentry from someone who once cut a table in two with an angle grinder because they had to dispose of it.