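I have a function that computes the mean of a vector 100 times, and I apply it to each of 20 large vectors (think of them as the columns of a matrix). However, the parallel version (`pmap`) is no faster than the serial version (`map`) on my machine (3.5 GHz Intel Core i5 with 4 cores); it is actually slightly slower: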

```
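using Distributed              # needed on Julia 0.7+; in 0.6 these functions lived in Base
addprocs(3)                    # 3 worker processes in addition to the master process
@everywhere using Statistics   # `mean` lives in Statistics on Julia 0.7+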
@everywhere function repeated_mean(x)
    for _ in 1:100
        mean(x)
    end
    return x
end

function f(X)
    @time map(repeated_mean, X)
    @time pmap(repeated_mean, X)
    return nothing
end

f([rand(10_000_000) for j in 1:20])
#> 8.432926 seconds (2 allocations: 256 bytes)
#> 9.528860 seconds (377.27 k allocations: 1.524 GiB, 8.65% gc time)
```

What is happening? I naively expected the parallel version to be about 4 times faster. Is there a better way to exploit multiple cores for this kind of computation?