I think you understood my problem perfectly well there.
I'm not sure, on the other hand, that I understand your application. When you say "doing the calculation on each server/core, save it (serialize), then merge", do you have to communicate the results of each node back to the master process during the computation, or is the merge part of the post-computation analysis in your case? I can easily see how I would do this in the latter case.
If that is not your case (i.e. you're dealing with the former), or in any case, do you have a suggestion for how I could implement your strategy? Just a comment on what you think of the outline below would be very helpful. Thanks!
Here is the outline of my situation.
Problem
Say I have 3 computers, each with multiple CPUs. Call computer 1 the `master`.

- The `master` controls the computation of $f(x)$ on each of computers 2 and 3. In particular, it sends $x_2$ to computer 2 and $x_3$ to computer 3, and reads their respective results $y_2$ and $y_3$. Think of $f$ as a loss function.
- Based on $y_2 = f(x_2)$ and $y_3 = f(x_3)$, the master comes up with a new candidate for each computer, $x_2'$ and $x_3'$.
- In MCMC terminology (I think), the *kernel* resides on the master, and it comes up with a new candidate parameter for each participating computer.
- All computers have access to the same file system, e.g. all can read/write `~/`.
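Because the shared `~/` is the communication channel here, one caveat is that the master may see `~/result2.dat` exist before the remote job has finished writing it. A common remedy, sketched below under the assumption that each result is a single number stored as text, is for the worker to write to a temporary file in the same directory and then rename it into place. Within one directory on a local POSIX file system that rename should be atomic; on network file systems the guarantees may be weaker. `write_result_atomically` and `read_result` are hypothetical helper names.

```julia
# Hypothetical worker-side helper (e.g. called at the end of compute_f.jl):
# write y to a temporary file next to `path`, then rename it into place, so a
# reader polling with isfile(path) never sees a half-written file.
function write_result_atomically(path::AbstractString, y::Real)
    dir = dirname(abspath(path))
    tmp, io = mktemp(dir)          # temporary file in the same directory
    try
        println(io, y)             # store the result as plain text
    finally
        close(io)
    end
    mv(tmp, path; force=true)      # rename over any stale result file
    return path
end

# Hypothetical master-side helper: read the number back once the file exists.
read_result(path::AbstractString) = parse(Float64, readchomp(path))

# Local demo: a temporary directory stands in for the shared ~/
demo_dir = mktempdir()
result2 = joinpath(demo_dir, "result2.dat")
write_result_atomically(result2, 0.25)
println(isfile(result2), " ", read_result(result2))
```

Only the final rename makes `result2.dat` visible, so the master's existing `isfile` check can stay as it is.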
I would not know how to implement the part where the master has to wait for the new results $y$ to appear. I read about Julia tasks and related tools, which seem to be just what I need, but as far as I understand they only work with actual Julia workers.
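For comparison, a task started with `@async` does not have to run on a Julia worker; it can simply wait on an ordinary external command, and `fetch` blocks until that task is done. The sketch below assumes the remote script prints $y$ to stdout (instead of, or in addition to, writing a file), so the master can capture it with `read(cmd, String)`. To keep it runnable on one machine, the two `ssh` calls are replaced by local Julia subprocesses started via `Base.julia_cmd()`, and the toy loss $f(x) = (x - 1)^2$ stands in for $f$. `eval_remote` and `stand_in_cmd` are hypothetical names.

```julia
# Hypothetical helper: run an external command and parse the single number it
# prints. read(cmd, String) waits for the process to exit and throws if it fails.
eval_remote(cmd::Cmd) = parse(Float64, strip(read(cmd, String)))

# Local stand-in for `ssh computerN julia compute_f.jl param $x`: a fresh Julia
# process that evaluates the toy loss f(x) = (x - 1)^2 and prints the result.
function stand_in_cmd(x::Float64)
    script = "x = parse(Float64, ARGS[1]); println((x - 1)^2)"
    return `$(Base.julia_cmd()) --startup-file=no -e $script $x`
end

x2, x3 = 0.5, 2.0
# One task per "computer"; each task only waits on its own external process,
# so both evaluations run concurrently without any Julia workers.
t2 = @async eval_remote(stand_in_cmd(x2))
t3 = @async eval_remote(stand_in_cmd(x3))
y2, y3 = fetch(t2), fetch(t3)   # blocks until both processes have finished
best_x = y2 <= y3 ? x2 : x3
println("y2 = $y2, y3 = $y3, best x = $best_x")
```

In the real setup, `stand_in_cmd(x2)` would become the `ssh computer2 ...` command. The tasks only need to live on the master, which is where the waiting happens.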
Proposition
I basically need to know how to do the following, although in my case it may be quite time-consuming to initialize the model from scratch each time.
```julia
# pseudo code. ;)
# imagine that compute_f.jl is a script to compute f
# the script takes a commandline arg `param`, which becomes `x` for `f(x)`
# initial_x, nextx and conv are placeholders for my own functions (not shown)
# this function here is running on computer 1 for the entire duration of the procedure.
function master_process()
    converged = false
    minf = Inf
    x = initial_x()
    # Julia does not expand "~" in paths, so do it explicitly
    res2 = expanduser("~/result2.dat")
    res3 = expanduser("~/result3.dat")
    while !converged
        x2 = nextx(x)  # random innovation to x
        x3 = nextx(x)
        # wait=false starts both remote jobs without blocking, so they run at the same time
        run(`ssh computer2 julia compute_f.jl param $x2`; wait=false)  # computes f(x2) and saves y2 as ~/result2.dat
        run(`ssh computer3 julia compute_f.jl param $x3`; wait=false)  # saves y3 as ~/result3.dat
        # wait until both result files have appeared
        while !(isfile(res2) && isfile(res3))
            sleep(1)
            print(".")
        end
        # read the results (assuming each file contains a single number as text)
        y2 = parse(Float64, readchomp(res2))
        y3 = parse(Float64, readchomp(res3))
        # find best value and the candidate that produced it
        y = min(y2, y3)
        x = y2 <= y3 ? x2 : x3
        # test convergence
        converged = conv(y, minf)  # test convergence somehow...suppose it's `false` for now :)
        minf = min(minf, y)
        # delete files before the next round
        rm(res2)
        rm(res3)
    end # while
    return x, minf
end
```
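The waiting loop in the outline never gives up, so a remote job that crashes before writing its file would leave the master polling forever. Below is a minimal sketch of the same wait with a timeout, built on Base's `timedwait`, which repeatedly calls a predicate until it returns `true` or the time limit passes. `wait_for_files` and its keyword arguments are hypothetical names. The demo simulates the two remote jobs with local tasks writing into a temporary directory instead of `~/`.

```julia
# Hypothetical helper: block until every path in `paths` exists, checking every
# `interval` seconds, and throw an error after `timeout` seconds.
# Base.timedwait returns :ok when the predicate became true, :timed_out otherwise.
function wait_for_files(paths; timeout::Real = 600.0, interval::Real = 1.0)
    status = timedwait(() -> all(isfile, paths), Float64(timeout);
                       pollint = Float64(interval))
    status === :ok || error("timed out waiting for: ", join(paths, ", "))
    return nothing
end

# Demo: a temporary directory stands in for the shared ~/, and two local tasks
# stand in for the remote jobs. Each writes to a temporary name and renames it,
# so a result file only appears once it is complete.
dir = mktempdir()
res2, res3 = joinpath(dir, "result2.dat"), joinpath(dir, "result3.dat")
@async (sleep(0.5); write(res2 * ".tmp", "0.25\n"); mv(res2 * ".tmp", res2))
@async (sleep(1.0); write(res3 * ".tmp", "1.0\n"); mv(res3 * ".tmp", res3))

wait_for_files([res2, res3]; timeout = 10.0, interval = 0.1)
y2 = parse(Float64, readchomp(res2))
y3 = parse(Float64, readchomp(res3))
println("y2 = $y2, y3 = $y3")
```

In `master_process`, the inner `while ... sleep(1)` loop could then be replaced by a single `wait_for_files` call, with the error handled however suits the optimization (e.g. resubmitting the job).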