I think you understood my problem perfectly well there.
I’m not sure, on the other hand, that I understand your application. When you say "doing the calculation on each server/core, save it (serialize), then merge", do you have to communicate the results of each node back to the master process during the computation, or is the merge part of post-computation analysis in your case? I can easily see how I would do it in the latter case.
If that is not your case (i.e. you're dealing with the former), or otherwise, do you have a suggestion for how I could implement your strategy? Any comment on what you think, down below, would be very helpful - thanks!
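Just to make sure we mean the same thing by "the latter case", here is roughly what I picture; the /shared path, the file names, and using + as the "merge" are all my own guesses, not something you said:

using Serialization

# --- on each worker (computers 2, 3, ...): do the calculation, then dump the result ---
y = sum(rand(10))                                    # stand-in for the real per-node computation
serialize("/shared/result_$(gethostname()).jls", y)  # write to the common file system

# --- on the master, after all jobs are done: read everything back and merge ---
files   = [joinpath("/shared", f) for f in readdir("/shared") if endswith(f, ".jls")]
results = [deserialize(f) for f in files]
merged  = reduce(+, results)                         # or whatever "merge" means here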
Here is the outline of my situation.
Say I have 3 computers, each with multiple CPUs. Call computer 1 the master; it controls the computation of f(x) on each of computers 2 and 3. In particular, it sends x2 to computer 2 and x3 to computer 3, and reads their respective results y2 and y3. Think of f as a loss function.
- Based on y2 = f(x2) and y3 = f(x3), the master comes up with a new candidate for each computer.
- In MCMC terminology (I think), the kernel resides on the master, and it comes up with a new candidate parameter for each participating computer.
- All computers have access to the same file system, e.g. all can read/write files such as ~/result2.dat below.
I would not know how to implement the part where the master has to wait for the new results y to appear. I read about Julia tasks and co, which seem to be just what I need, but as far as I can tell that only applies to actual Julia code running in the same process, not to external scripts launched over ssh.
I basically need to know how to do the following (although it may be quite time consuming in my case to initialize the model from scratch each time); a rough sketch of how I imagine the waiting part could work follows after the pseudocode.
# pseudo code. ;-)
# imagine that compute_f.jl is a script to compute f
# the script takes a commandline arg `param`, which becomes `x` for `f(x)`
# this loop here is running on computer 1 for the entire duration of the procedure.
converged = false
minf = Inf
x = initial_x()
while !converged
    x2 = nextx(x)  # random innovation to x
    x3 = nextx(x)
    # computer 2 computes f(x2) and saves y2 as ~/result2.dat, computer 3 saves ~/result3.dat
    run(`ssh computer2 julia compute_f.jl --param $x2`)
    run(`ssh computer3 julia compute_f.jl --param $x3`)
    # as soon as the file "~/result2.dat" appears, read it (this is the part I don't know how to do)
    y2 = parse(Float64, read(expanduser("~/result2.dat"), String))  # assuming the loss is written as plain text
    y3 = parse(Float64, read(expanduser("~/result3.dat"), String))
    # keep the best value and the x that produced it
    y = min(y2, y3)
    x = [x2, x3][indmin([y2, y3])]
    # test convergence somehow... suppose it's `false` for now :-)
    converged = conv(y, minf)
    minf = min(minf, y)
    # delete the result files before the next iteration
    rm(expanduser("~/result2.dat"))
    rm(expanduser("~/result3.dat"))
end # while
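For what it is worth, here is how I currently imagine the missing "wait for the result file" piece. This is only a sketch under my own assumptions: wait_for_file is a helper I made up, the remote script is assumed to write the scalar loss as plain text to the shared file system, and I simply poll with isfile/sleep.

# sketch only; wait_for_file, the poll interval, and the placeholder candidates are my own inventions
function wait_for_file(path; poll_seconds = 5)
    while !isfile(path)                 # poll the shared file system until the result shows up
        sleep(poll_seconds)
    end
    return parse(Float64, read(path, String))   # assumes the loss was written as plain text
end

x2, x3 = 0.1, 0.2                        # placeholders instead of nextx(x)

# launch both remote jobs without blocking the master...
job2 = @async run(`ssh computer2 julia compute_f.jl --param $x2`)
job3 = @async run(`ssh computer3 julia compute_f.jl --param $x3`)

# ...and block only when the result files are actually needed
y2 = wait_for_file(expanduser("~/result2.dat"))
y3 = wait_for_file(expanduser("~/result3.dat"))

wait(job2); wait(job3)                   # also make sure the ssh commands exited cleanly

Polling with isfile/sleep is admittedly crude, but it would get me going; if there is a more idiomatic mechanism for this, I would be glad to hear it.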