Unexplained gap between calculation time and fetch time

My code has a fairly simple architecture: the master process sends a calculation to each worker, the calculation is done on a DArray, and the master collects the results using `fetch`.

However, there is a large, unexplained gap between the actual calculation time and the time it takes to fetch the results. This gap also grows with the calculation time, even though the size of the result does not change.

This is a snippet of the master process code:

```julia
stime = time()
# launch one remote task per worker
for i in workers()
    workers_suff_dict[i] = @spawnat i create_suff_stats_dict_worker(group.model_hyperparams.distribution_hyper_params,
        indices)
end
println("start workers:")
println(time() - stime)

stime = time()
# block until every worker's result has been transferred back to the master
workers_suff_dict_fetched = Dict([k=>fetch(v) for (k,v) in workers_suff_dict])
println("fetch:")
println(time() - stime)
println("fetched size:")
println([Base.summarysize(v) for (k,v) in workers_suff_dict_fetched])
```

The following is the worker process code:

```julia
function create_suff_stats_dict_worker(hyper_params, indices)
    stime = time()
    suff_stats_dict = Dict()
    # default to every cluster index when none are given
    if isnothing(indices)
        indices = collect(1:length(clusters_vector))
    end
    # data held as globals on this worker
    global group_points
    global group_labels
    global group_sublabels
    points = group_points
    labels = group_labels
    sublabels = group_sublabels
    dim = size(points,1)

    for index in indices
        # statistics for the points of cluster `index` with sublabel 1 and sublabel 2
        cpl_suff = create_sufficient_statistics_params(hyper_params, hyper_params, points[:, (sublabels .== 1) .& (labels .== index)])
        cpr_suff = create_sufficient_statistics_params(hyper_params, hyper_params, points[:, (sublabels .== 2) .& (labels .== index)])
        suff_stats_dict[index] = (cpl_suff, cpr_suff)
    end
    println("worker time:")
    println(time() - stime)
    return suff_stats_dict
end
```
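For anyone who wants to reproduce the pattern without the real model, here is a minimal self-contained sketch under some assumptions: the worker-side data are synthetic globals (`toy_points`, `toy_labels`, `toy_sublabels`), and `toy_suff_stats` is a hypothetical stand-in for `create_sufficient_statistics_params` that just returns a 64x64 scatter matrix. With 8 cluster indices it produces 16 matrices of 64x64 per worker, matching the shape described below.

```julia
using Distributed

# Assumption: two local worker processes are enough for the reproduction.
if nprocs() == 1
    addprocs(2)
end

# Hypothetical synthetic data, defined as globals on every process.
@everywhere begin
    const toy_points = randn(64, 10_000)
    const toy_labels = rand(1:8, 10_000)
    const toy_sublabels = rand(1:2, 10_000)
end

# Hypothetical stand-in for create_sufficient_statistics_params.
@everywhere toy_suff_stats(x::AbstractMatrix) = x * x'

@everywhere function toy_worker(indices)
    t0 = time()
    out = Dict{Int,Tuple{Matrix{Float64},Matrix{Float64}}}()
    for index in indices
        left = toy_points[:, (toy_sublabels .== 1) .& (toy_labels .== index)]
        right = toy_points[:, (toy_sublabels .== 2) .& (toy_labels .== index)]
        out[index] = (toy_suff_stats(left), toy_suff_stats(right))
    end
    println("worker time: ", time() - t0)
    return out
end

# Master side, mirroring the original snippet.
stime = time()
futures = Dict{Int,Future}()
for w in workers()
    futures[w] = @spawnat w toy_worker(1:8)
end
fetched = Dict(k => fetch(v) for (k, v) in futures)
println("spawn + fetch: ", time() - stime)
println("fetched size: ", [Base.summarysize(v) for v in values(fetched)])
```

Comparing "worker time" with "spawn + fetch" here shows whether the gap also appears with plain matrices, or only with the real sufficient-statistics types.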

This is an example of the printed output:

```
start workers:
0.06580090522766113
      From worker 2:    worker time:
      From worker 2:    0.5108780860900879
fetch:
7.827517032623291
fetched size:
[534792]
```

In the output above, the fetched result consists of 16 float matrices of size 64x64. By comparison, `@time fetch(@spawnat 2 rand(16,64,64))`, which returns roughly the same amount of data, takes 0.023883 seconds.
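As a rough size check, assuming 8-byte `Float64` entries: 16 × 64 × 64 × 8 = 524,288 bytes, which is consistent with the reported 534,792 bytes once container overhead is added. The sketch below extends the `rand(16,64,64)` comparison to a payload shaped more like the actual result, a `Dict` of tuples of matrices (the element types of the real result are an assumption here; `make_payload` is a hypothetical helper), and does one warm-up fetch so that compilation is not counted in the timed one.

```julia
using Distributed

if nprocs() == 1
    addprocs(2)  # assumption: local workers
end

# Payload shaped like the fetched result: 8 keys, each holding a tuple
# of two 64x64 Float64 matrices (16 matrices in total).
@everywhere make_payload() =
    Dict(i => (rand(64, 64), rand(64, 64)) for i in 1:8)

w = first(workers())

# Warm-up: compiles make_payload and the (de)serialization paths.
fetch(@spawnat w make_payload())

t0 = time()
payload = fetch(@spawnat w make_payload())
t = time() - t0
println("fetch time: $t s, size: $(Base.summarysize(payload)) bytes")
```

If this stays in the tens of milliseconds while the real fetch takes seconds, the payload size alone is unlikely to explain the gap.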

As mentioned above, the fetch time scales with the worker's calculation time, not with the size of the fetched result.

In addition, I verified that the calculations really take as long as the worker reports.
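One way to make that comparison explicit is to have the remote function return its own elapsed time together with its result, and time the whole round trip on the master. The sketch below is an assumption-laden illustration (`timed_task` and its placeholder computation are hypothetical). One thing it separates out: Julia compiles a method the first time it is called with new argument types, before the body runs, so a timer started inside the function cannot see that cost, while a timer around the remote call can. Running the call twice shows how much of the gap disappears once everything is compiled.

```julia
using Distributed

if nprocs() == 1
    addprocs(2)  # assumption: local workers
end

# Hypothetical workload: returns the result plus the seconds spent
# inside the function body, as measured on the worker itself.
@everywhere function timed_task(n)
    t0 = time_ns()
    result = sum(abs2, rand(n))  # placeholder computation
    return result, (time_ns() - t0) / 1e9
end

w = first(workers())

# First call: includes compilation of timed_task on the worker.
t0 = time()
r1, inner1 = remotecall_fetch(timed_task, w, 10^6)
total1 = time() - t0

# Second call: timed_task is already compiled on w.
t0 = time()
r2, inner2 = remotecall_fetch(timed_task, w, 10^6)
total2 = time() - t0

println("first call:  total = $total1 s, inside worker = $inner1 s")
println("second call: total = $total2 s, inside worker = $inner2 s")
```

The difference `total - inside worker` bounds everything outside the function body: message handling, (de)serialization of arguments and result, and any compilation.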

Any pointers would be appreciated.

Thanks,