Unexplained gap between calculation time and fetch time

My code has a rather simple architecture: the master process sends a calculation to a worker, the calculation is done on a DArray, and the master collects the result using fetch.

However, there is a big unexplained gap between the actual time of the calculation and the time it takes to fetch the results. This gap also grows with the calculation time, even though the size of the result does not vary.

This is a snippet of the master process code:

stime = time()
for i in workers()
    # @spawnat returns a Future immediately; the work runs on worker i
    workers_suff_dict[i] = @spawnat i create_suff_stats_dict_worker(group.model_hyperparams.distribution_hyper_params,
end
println("start workers:")
println(time() - stime)
stime = time()
# fetch blocks until each worker's result is available, then transfers it
workers_suff_dict_fetched = Dict([k => fetch(v) for (k, v) in workers_suff_dict])
println(time() - stime)
println("fetched size:")
println([Base.summarysize(v) for (k, v) in workers_suff_dict_fetched])

The following is the worker process code:

function create_suff_stats_dict_worker(hyper_params, indices)
    stime = time()
    suff_stats_dict = Dict()
    # Default to all clusters when no indices are given
    if indices === nothing
        indices = collect(1:length(clusters_vector))
    end
    # Data local to this worker, stored in globals
    global group_points
    global group_labels
    global group_sublabels
    points = group_points
    labels = group_labels
    sublabels = group_sublabels
    dim = size(points, 1)

    for index in indices
        # Sufficient statistics of the two sub-clusters (sublabel 1 and 2) of cluster `index`
        cpl_suff = create_sufficient_statistics_params(hyper_params, hyper_params, points[:, (sublabels .== 1) .& (labels .== index)])
        cpr_suff = create_sufficient_statistics_params(hyper_params, hyper_params, points[:, (sublabels .== 2) .& (labels .== index)])
        suff_stats_dict[index] = (cpl_suff, cpr_suff)
    end
    println("worker time:")
    println(time() - stime)
    return suff_stats_dict
end
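
To make the column selection in the worker loop concrete, here is a small toy sketch of the same masking pattern. All values are made up for illustration; only the indexing expression mirrors the worker code.

```julia
# Toy data (made up): 2-dimensional points, one column per point.
points    = [ 1.0  2.0  3.0  4.0  5.0  6.0;
             10.0 20.0 30.0 40.0 50.0 60.0]
labels    = [1, 1, 2, 2, 1, 2]   # cluster of each point
sublabels = [1, 2, 1, 2, 1, 1]   # sub-cluster of each point

# Boolean mask: points in cluster 1 with sublabel 1.
mask = (sublabels .== 1) .& (labels .== 1)

# Indexing with a mask copies the selected columns into a new matrix.
left_points = points[:, mask]
println(size(left_points))
```

Each such indexing operation allocates a new matrix, so the worker makes two copies per cluster index; `view(points, :, mask)` would avoid the copy if the downstream function accepts any `AbstractMatrix`.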

This is an example print:

start workers:
      From worker 2:    worker time:
      From worker 2:    0.5108780860900879
fetched size:

In the print above, the fetched data consists of 16 float matrices of size 64×64.
For comparison, @time fetch(@spawnat 2 rand(16,64,64)), which returns around the same amount of data, takes 0.023883 seconds.

As mentioned above, the fetch time scales with the calculation time of the worker, not with the size of the fetched item.
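
One way to check what the fetch timing actually measures is to time the waiting and the transfer separately. The following is a minimal sketch, not the real code: the `sleep(0.5)` is a hypothetical stand-in for the worker's calculation, and a single local worker is assumed. It relies on `@spawnat` returning a `Future` immediately and on `wait` blocking until the remote result is ready, so a `fetch` called afterwards only has to transfer the data.

```julia
using Distributed

# Assumption: one local worker is enough for the illustration.
addprocs(1)
w = first(workers())

# Hypothetical stand-in for the real calculation: sleep to mimic about
# 0.5 s of work, then return an array of roughly the same size as the result.
t0 = time()
fut = @spawnat w (sleep(0.5); rand(16, 64, 64))
spawn_time = time() - t0      # @spawnat returns a Future right away

t0 = time()
wait(fut)                     # blocks until the remote computation has finished
wait_time = time() - t0

t0 = time()
result = fetch(fut)           # the result is ready, so this is mostly transfer
transfer_time = time() - t0

println("spawn: ", spawn_time, "  wait: ", wait_time, "  transfer: ", transfer_time)
```

If the same split is applied to the master code, a large wait time with a small transfer time would suggest that the fetch timing mostly reflects the master waiting for the worker to finish. Note that the first call of each kind also includes compilation time on both processes.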

In addition, I verified that the calculations take the time the worker claims.

Could use any pointers on this.