Pmap extremely slow when function returns large object

Here is the best I can get with large tuples:

using BenchmarkTools
using Distributed
using SharedArrays

# Roughly one worker per physical core, assuming two hardware threads per core
addprocs(length(Sys.cpu_info())÷2-1)

@everywhere using Random
@everywhere N = 100
@everywhere function foo(i)
    Random.seed!(i)  # seed per task so each result is reproducible
    NTuple{N*N,Float64}(rand(N,N))  # return a large tuple with N*N elements
end

n = 10

@btime (array = pmap(1:n) do i
    foo(i)
end)
println()

yielding

  12.775 ms (200650 allocations: 8.42 MiB)

and that does not include the considerable compilation time, as you already noted. On that point, see for example "Correct way to dereference large memory?" (post #4 by jakobnissen).
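
To get a rough feel for how much of the cost is compilation, one option is to time the first and second `pmap` calls separately. This is only a sketch under my own assumptions: the names `M` and `bar` are hypothetical, `M` is a `const` and much smaller than the `N = 100` above so that it runs quickly, and two local workers are added just for illustration.

```julia
using Distributed

addprocs(2)  # assumption: two local workers are enough for illustration

@everywhere using Random
@everywhere const M = 20  # hypothetical size, smaller than N = 100 above
@everywhere function bar(i)
    Random.seed!(i)
    NTuple{M*M,Float64}(rand(M, M))  # tuple with M*M elements
end

# The first call pays for compilation on the workers and on the master process
first_call = @elapsed pmap(bar, 1:4)
# The second call reuses the compiled code, so it is closer to the steady-state cost
second_call = @elapsed pmap(bar, 1:4)

println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
```

The gap between the two timings is a crude estimate of the compilation overhead; `@btime` reports the minimum over many runs, so it never shows that first-call cost.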