My code has a rather simple architecture: the master process sends a calculation to a worker, the calculation is done on a DArray, and the master then collects the result (with `fetch`, as in the snippet below).
However, there is a big unexplained gap between the actual time of the calculation and the time it takes to fetch the results. This gap also grows with the calculation time, even though the result size does not vary.
This is a snippet of the master process code:
```julia
stime = time()
for i in workers()
    workers_suff_dict[i] = @spawnat i create_suff_stats_dict_worker(group.model_hyperparams.distribution_hyper_params, indices)
end
println("start workers:")
println(time() - stime)

stime = time()
workers_suff_dict_fetched = Dict([k=>fetch(v) for (k,v) in workers_suff_dict])
println("fetch:")
println(time() - stime)
println("fetched size:")
println([Base.summarysize(v) for (k,v) in workers_suff_dict_fetched])
```
The following is the worker process code:
```julia
function create_suff_stats_dict_worker(hyper_params, indices)
    stime = time()
    suff_stats_dict = Dict()
    if indices == nothing
        indices = collect(1:length(clusters_vector))
    end
    global group_points
    global group_labels
    global group_sublabels
    points = group_points
    labels = group_labels
    sublabels = group_sublabels
    dim = size(points,1)
    for index in indices
        cpl_suff = create_sufficient_statistics_params(hyper_params, hyper_params, points[:,(sublabels .== 1) .& (labels .== index)])
        cpr_suff = create_sufficient_statistics_params(hyper_params, hyper_params, points[:,(sublabels .== 2) .& (labels .== index)])
        suff_stats_dict[index] = (cpl_suff, cpr_suff)
    end
    println("worker time:")
    println(time() - stime)
    return suff_stats_dict
end
```
This is an example of the printed output:
```
start workers:
0.06580090522766113
From worker 2: worker time:
From worker 2: 0.5108780860900879
fetch:
7.827517032623291
fetched size:
```
In the output above, the fetched data consists of 16 float matrices of size 64×64.
For comparison, `@time fetch(@spawnat 2 rand(16,64,64))`, which returns data of around the same size, takes 0.023883 seconds.
As mentioned above, the fetch time scales with the worker's calculation time, not with the size of the fetched item.
In addition, I verified that the calculations take the time the worker claims.
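To separate the compute time from the transfer time on the master side, a minimal sketch along the following lines could be used. It is not my actual code: `fake_work` is a hypothetical stand-in for the worker function, and the sketch assumes (as far as I understand the `Distributed` standard library) that `wait` on a `Future` blocks until the remote call finishes without copying the result back.

```julia
using Distributed

# Make sure at least one worker process exists for this demo.
nprocs() == 1 && addprocs(1)

# Hypothetical stand-in for the real worker function: it "computes" for
# `seconds` and returns an array of about the same size as in the question.
@everywhere function fake_work(seconds)
    sleep(seconds)
    return rand(16, 64, 64)
end

w = first(workers())
fut = @spawnat w fake_work(0.5)

# Blocks until the remote computation has finished; the result stays on the
# worker. The first call also includes compilation time on the worker.
t_wait = @elapsed wait(fut)

# The value is already computed, so this should measure mostly
# serialization and transfer.
t_fetch = @elapsed fetch(fut)

# After the first fetch the value is cached locally on the master.
result = fetch(fut)

println("wait (compute): ", t_wait)
println("fetch (transfer): ", t_fetch)
```

If `t_fetch` stays small while `t_wait` accounts for the gap, the extra time is spent before the result is ready rather than in moving it.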
Could use any pointers on this.