I updated Julia, and implemented it in parallel. It is fast now, and a bit faster than R. I do not think implementing it in parallel in R would have much gain, because of the initial overhead.
Julia
using Distributions; using LinearAlgebra;
A = zeros(20,20,50000)
function parallel_test(nu)
inv_wish = rand(Wishart(nu, Matrix{Float64}(I,nu,nu)))
end
@time Threads.@threads for i = 1:50000
A[:,:,i] = parallel_test(20);
end;
1.043198 seconds (3.16 M allocations: 968.912 MiB, 8.96% gc time)
Now R:
system.time(replicate(50000, rWishart(1, 20, diag(20)) ))
user system elapsed
1.13 0.16 1.2
I also checked with out storing the matrices in Julia, and here it was often less than 1 second.