Hi everyone,

I am trying to scale up a piece of my code (numerical simulations for a scientific research project).

It essentially involves running a single computationally and memory-heavy function for several trials and then reducing across trials. A pseudocode example:

```
using Distributed
addprocs()

# allocates big matrices, does stuff with them, and returns an array
@everywhere function hard_function(x)
    # ... heavy computation ...
end

results = @distributed (+) for _ in 1:K  # we don't want the individual results, just their sum
    hard_function(x)
end
```

This was my initial code, which was a bit slow, so I suspected that allocating big matrices (e.g. 100 × 10^6 entries) in every one of the K trials was the problem. So I decided to try something else:

```
@everywhere matrices = preallocate()  # allocate the big matrices once, on every worker

# same computation as before, but reusing the preallocated matrices
@everywhere function hard_function!(matrices, x)
    # ... heavy computation, writing into `matrices` ...
end

results = @distributed (+) for _ in 1:K
    hard_function!(matrices, x)
end
```

My reasoning was that this way I would allocate everything only once per worker.

It turns out this made things slower and allocated more memory (according to `@time`).
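In case a runnable example helps, here is a self-contained toy version of the second pattern. `hard_function!` here is just a stand-in for the real computation, and the buffer name `buf`, the sizes, and the worker count are made up for illustration:

```
using Distributed
addprocs(2)

# stand-in for the real heavy computation: fill the buffer, return a reduction
@everywhere function hard_function!(buf, x)
    buf .= x
    return sum(buf)
end

# "preallocate" once on every worker (a plain, non-const global)
@everywhere buf = zeros(10_000)

K = 10
results = @distributed (+) for _ in 1:K
    hard_function!(buf, 3.0)
end
# results == 300000.0 (10 trials × 10_000 entries × 3.0)
```

The real version has much larger buffers and a much heavier `hard_function!`, but the structure is the same.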

I looked for explanations of this discrepancy but couldn't find one so far, so I thought I'd ask here and check whether I'm doing something stupid or whether my original idea was wrong from the start.

Thanks a lot!

Lucas