Here is the best I can get with large tuples:
using BenchmarkTools
using Distributed
using SharedArrays

# Start one worker fewer than half the number of logical CPUs
addprocs(length(Sys.cpu_info())÷2-1)

@everywhere using Random
@everywhere N = 100

# Build an N*N-element tuple from a seeded random matrix on each worker
@everywhere function foo(i)
    Random.seed!(i)
    NTuple{N*N,Float64}(rand(N,N))
end

n = 10
@btime (array = pmap(1:n) do i
    foo(i)
end)
println()
which yields
12.775 ms (200650 allocations: 8.42 MiB)
This does not include the considerable compilation time, which you already noted. On that point, see for example "Correct way to dereference large memory?" (#4 by jakobnissen).
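To see the compilation cost separately from the run time, one option is to time the first and second call of a function that builds the tuple. This is only a minimal sketch, assuming a single process; `M` and `make_tuple` are hypothetical names introduced here for illustration and are not part of the benchmark above.

```julia
# Hypothetical constant for this sketch; same size as N in the benchmark.
const M = 100

# Hypothetical helper: builds an M*M-element tuple from a random matrix,
# mirroring the body of foo without the seeding.
make_tuple() = NTuple{M*M,Float64}(rand(M, M))

# The first call includes compiling code for the 10_000-element tuple type.
t_first = @elapsed make_tuple()

# The second call reuses the compiled code, so it measures run time only.
t_second = @elapsed make_tuple()

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

The difference between the two timings gives a rough idea of how much of the wall time goes to compilation, which `@btime` does not report.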