Hi everyone,

I am trying to improve a piece of code of mine (numerical simulations for a scientific research project) so that it scales up.

It essentially involves running one computationally and memory-heavy function for several trials and then reducing across trials. A pseudocode example:

```
using Distributed
addprocs()

# Allocates big matrices, does stuff with them, and returns an array.
@everywhere function hard_function(x)
    # ...
end

results = @distributed (+) for _ = 1:K  # we don't want the individual results, just their sum
    hard_function(x)
end
```
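
For concreteness, here is a minimal self-contained toy version of this pattern; `hard_function`, the 1000×1000 size, and `K = 8` are just stand-ins for my real code:

```
using Distributed
addprocs(4)

# Toy stand-in: allocate a fresh big matrix on every call,
# do some work with it, and return an array.
@everywhere function hard_function(x)
    A = fill(x, 1000, 1000)
    return vec(sum(A .* A, dims=2))
end

K = 8
results = @distributed (+) for _ = 1:K
    hard_function(2.0)
end
```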

This was my initial code, which was a bit slow, so I suspected that allocating big matrices (e.g. 100 × 10^6 entries) for every one of the K trials was the problem. So I decided to try something else:

```
@everywhere matrices = preallocate()  # set up the big matrices once per worker

# Same function as before, but writing into the preallocated matrices.
@everywhere function hard_function!(matrices, x)
    # ...
end

results = @distributed (+) for _ = 1:K
    hard_function!(matrices, x)
end
```

My reasoning was that this way each worker would allocate the matrices only once, instead of once per trial.
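
For reference, the preallocated toy version I tried looks like this (again with stand-in names and sizes; the `buf` global mirrors my `matrices`):

```
using Distributed
addprocs(4)

# One scratch matrix per worker, set up once (a non-const global, as in my code).
@everywhere buf = zeros(1000, 1000)

# Same toy computation, but writing into the preallocated matrix in place.
@everywhere function hard_function!(A, x)
    A .= x .* A .+ x
    return vec(sum(A, dims=2))
end

K = 8
results = @distributed (+) for _ = 1:K
    hard_function!(buf, 2.0)
end
```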

It turns out this made things slower and allocated *more* memory (according to `@time`).

I tried looking for other explanations for this discrepancy, but couldn't find one so far, so I thought I'd ask here and check whether I'm doing something stupid or whether my original idea was wrong from the start.

Thanks a lot!

Lucas