My code has a rather simple architecture: the master process sends a calculation to a worker, the calculation is done on a DArray, and the master collects the result using fetch.
However, there is a large unexplained gap between the actual time of the calculation and the time it takes to fetch the results. This gap also grows with the calculation time, even though the result size does not vary.
This is a snippet of the master process code:
stime = time()
for i in workers()
    workers_suff_dict[i] = @spawnat i create_suff_stats_dict_worker(group.model_hyperparams.distribution_hyper_params,
                                                                    indices)
end
println("start workers:")
println(time() - stime)
stime = time()
workers_suff_dict_fetched = Dict([k=>fetch(v) for (k,v) in workers_suff_dict])
println("fetch:")
println(time() - stime)
println("fetched size:")
println([Base.summarysize(v) for (k,v) in workers_suff_dict_fetched])
The following is the worker process code:
function create_suff_stats_dict_worker(hyper_params, indices)
    stime = time()
    suff_stats_dict = Dict()
    if indices == nothing
        indices = collect(1:length(clusters_vector))
    end
    global group_points
    global group_labels
    global group_sublabels
    points = group_points
    labels = group_labels
    sublabels = group_sublabels
    dim = size(points,1)
    for index in indices
        cpl_suff = create_sufficient_statistics_params(hyper_params,hyper_params, points[:,(sublabels .== 1) .& (labels.== index)])
        cpr_suff = create_sufficient_statistics_params(hyper_params,hyper_params, points[:,(sublabels .== 2) .& (labels.== index)])
        suff_stats_dict[index] = (cpl_suff,cpr_suff)
    end
    println("worker time:")
    println(time()- stime)
    return suff_stats_dict
end
This is an example of the printed output:
start workers:
0.06580090522766113
From worker 2: worker time:
From worker 2: 0.5108780860900879
fetch:
7.827517032623291
fetched size:
[534792]
In the output above, the fetched data consists of 16 float matrices of size 64×64. For comparison, @time fetch(@spawnat 2 rand(16,64,64)), which is around the same size (about 524 KB of Float64 data), takes 0.023883 seconds.
As mentioned above, the fetch time scales with the calculation time of the worker, not with the size of the fetched item.
In addition, I verified that the calculations take the time the worker claims.
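Since fetch on a Future also blocks until the remote call has finished, one way to narrow this down might be to time wait and fetch separately on the same Future. Below is a minimal sketch of that measurement, not the real code: fake_worker_job is a hypothetical stand-in for create_suff_stats_dict_worker, the 0.5 second sleep and the rand(16, 64, 64) payload are placeholders, and starting a local worker with addprocs(1) is an assumption.

```julia
using Distributed

# Assumption: start one local worker if this session has none yet.
nprocs() == 1 && addprocs(1)

# Hypothetical stand-in for the real worker function: sleeps to mimic the
# computation, then returns a payload of roughly the size seen in the post.
@everywhere function fake_worker_job(seconds)
    sleep(seconds)
    return rand(16, 64, 64)
end

w = first(workers())
fut = @spawnat w fake_worker_job(0.5)

# wait blocks until the remote call has finished and its value is available.
t0 = time()
wait(fut)
t_wait = time() - t0

# The remote call is done by now, so this timing no longer includes the
# computation itself, only retrieving the value.
t0 = time()
result = fetch(fut)
t_fetch = time() - t0

println("wait:")
println(t_wait)
println("fetch after wait:")
println(t_fetch)
```

If the gap shows up in the wait timing rather than in the fetch timing, the time is being spent before the result is ready on the worker, not in moving the result back to the master.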
Could use any pointers on this.
Thanks,