Product distribution allocates (a lot)

I’m also a bit surprised that the following function, rand_prod_no_alloc, seems to be slower than rand_prod even though it allocates less:

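using Distributions, Random  # Random provides rand!

function rand_prod_no_alloc(d::Product, N::Int64)
    # Sample into an N × length(d.v) buffer so each component fills one contiguous column.
    N_out = Matrix{Float64}(undef, N, length(d.v))
    for (i, dist) in enumerate(d.v)
        rand!(dist, view(N_out, :, i))
    end
    # permutedims returns a new length(d.v) × N matrix, i.e. a second allocation of the same size.
    return permutedims(N_out)
end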

Benchmarking it gives:

BenchmarkTools.Trial: 473 samples with 1 evaluation per sample.
 Range (min … max):   9.830 ms … 38.517 ms  ┊ GC (min … max): 0.00% … 73.64%
 Time  (median):     10.502 ms              ┊ GC (median):    4.40%
 Time  (mean ± σ):   10.573 ms ±  1.710 ms  ┊ GC (mean ± σ):  4.79% ±  5.47%

     ▁▂ ▃▂▁                █▃  ▁▂▃▆ ▄▃▁▄    ▂                  
  ▆▇███▆████▇▆▆▇▅▄▄▄▆▃▅▆▅▆████▆████▇████▇█▇██▇▇▅▆▅▃▃▄▃▄▃▃▃▁▁▃ ▄
  9.83 ms         Histogram: frequency by time        11.3 ms <

 Memory estimate: 30.52 MiB, allocs estimate: 7.
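
One thing worth checking: permutedims(N_out) materializes a second matrix of the same size as N_out, so part of the memory estimate above comes from that copy. Below is a minimal sketch of a variant that writes straight into the rows of a caller-supplied length(d.v) × N matrix instead. The name rand_prod_rows! and the example distributions are hypothetical and only for illustration. I have not benchmarked it, and since row views are strided rather than contiguous, it may or may not be faster.

```julia
using Distributions, Random

# Hypothetical variant (not part of Distributions.jl): fill a preallocated
# length(d.v) × N matrix row by row, so no transposed copy is created.
function rand_prod_rows!(out::AbstractMatrix{Float64}, d::Product)
    size(out, 1) == length(d.v) || throw(DimensionMismatch("out needs one row per component"))
    for (i, dist) in enumerate(d.v)
        # Row i holds all samples of the i-th component; the view is strided.
        rand!(dist, view(out, i, :))
    end
    return out
end

# Example setup; these distributions are assumptions chosen for illustration.
d = Product([Normal(0.0, 1.0), Normal(1.0, 2.0), Normal(2.0, 0.5)])
out = Matrix{Float64}(undef, length(d.v), 1_000)
rand_prod_rows!(out, d)
```

Reusing out across calls would avoid the output allocation as well, at the cost of the caller having to manage the buffer.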