Hi, I just found out that in the following example, sampling from a `Distributions.Product` distribution is much slower, and allocates much more memory, than implementing the sampler manually. Is there something I'm overlooking here?
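```julia
using Distributions, BenchmarkTools, Random

Random.seed!(42)

# Manual sampler: draw N values from each marginal of the Product,
# one column per component, then transpose to the length(d) × N layout
# that rand(d, N) returns.
function rand_prod(d::Product, N::Int64)
    N_out = Matrix{Float64}(undef, N, length(d.v))
    for (i, dist) in enumerate(d.v)
        N_out[:, i] .= rand(dist, N)
    end
    return permutedims(N_out)
end

function run_tests(N::Int64)
    d = Product([Exponential(1.0), Normal(0.0, 1.0)])
    # Show a few samples from each method to confirm the output shapes match
    display(rand(d, 5))
    display(rand_prod(d, 5))
    # Benchmark the built-in sampler against the manual one
    display(@benchmark rand($d, $N))
    display(@benchmark rand_prod($d, $N))
    return nothing
end

run_tests(1_000_000)
```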
This gives:

```text
2×5 Matrix{Float64}:
  1.45158    0.550663   1.01842    2.90497   2.30997
 -0.879859  -0.733255  -0.0725819  0.631621  -0.35417
2×5 Matrix{Float64}:
 1.37602    0.751383   0.299774  0.118754  0.611785
 0.129139  -0.294774  -0.374268  1.16951   0.256848
BenchmarkTools.Trial: 58 samples with 1 evaluation per sample.
 Range (min … max):  81.368 ms … 122.176 ms  ┊ GC (min … max): 0.38% … 31.60%
 Time  (median):     83.816 ms               ┊ GC (median):    0.98%
 Time  (mean ± σ):   87.151 ms ±   8.514 ms  ┊ GC (mean ± σ):  5.13% ±  7.52%

  81.4 ms        Histogram: frequency by time        107 ms <

 Memory estimate: 76.29 MiB, allocs estimate: 3999491.
BenchmarkTools.Trial: 632 samples with 1 evaluation per sample.
 Range (min … max):  7.255 ms … 35.720 ms  ┊ GC (min … max): 0.00% … 79.11%
 Time  (median):     7.773 ms              ┊ GC (median):    7.24%
 Time  (mean ± σ):   7.907 ms ±  1.473 ms  ┊ GC (mean ± σ):  9.40% ± 4.88%

  7.26 ms        Histogram: frequency by time        8.85 ms <

 Memory estimate: 45.78 MiB, allocs estimate: 16.
```
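The two memory estimates can be roughly reconstructed from the array sizes. The sketch below assumes 8-byte `Float64` values and 16 bytes per small heap allocation (the size of a boxed `Float64` on 64-bit Julia). Under those assumptions, `rand_prod` accounts for three 2 × 1,000,000 matrices' worth of data (`N_out`, the two length-N temporaries from `rand(dist, N)`, and the `permutedims` copy), and the `Product` result is one such matrix plus 3,999,491 × 16 bytes. That works out to about two small allocations per drawn value, which is consistent with (but does not prove) boxing of results from dynamically dispatched calls.

```julia
# Back-of-the-envelope accounting for the reported memory estimates.
# Assumptions: N = 1_000_000 draws, k = 2 components, 8-byte Float64 values,
# and 16 bytes per small heap allocation (a boxed Float64 on 64-bit Julia).
MiB = 2^20

N, k = 1_000_000, 2
matrix_mib = 8 * k * N / MiB          # one k×N (or N×k) Float64 matrix, ≈ 15.26 MiB

# rand_prod: N_out, the two length-N temporaries from rand(dist, N),
# and the permutedims copy add up to three matrices' worth of data.
rand_prod_mib = 3 * matrix_mib        # ≈ 45.78 MiB (reported: 45.78 MiB)

# rand(d, N): one output matrix plus 3_999_491 small allocations.
small_allocs = 3_999_491
rand_product_mib = matrix_mib + 16 * small_allocs / MiB   # ≈ 76.29 MiB (reported: 76.29 MiB)

# Small allocations per drawn value (k * N values in total), ≈ 2.0
allocs_per_value = small_allocs / (k * N)

println(round(matrix_mib; digits=2), " ", round(rand_prod_mib; digits=2), " ",
        round(rand_product_mib; digits=2), " ", round(allocs_per_value; digits=2))
```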
In the interest of full disclosure: in my application I need `permutedims(rand(..., N))` rather than `rand(..., N)`, so the comparison is even more favourable to `rand_prod`: the `Product` route needs an extra `permutedims` copy, while `rand_prod` could return `N_out` directly. I'm just concerned that I might be misusing `Distributions.Product` in a way that prevents more efficient sampling, or whether it really is more efficient to implement the loops myself…
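A plausible reading of the allocation counts (an assumption, not something the benchmark shows directly) is that `[Exponential(1.0), Normal(0.0, 1.0)]` has an abstract element type, so each draw inside `rand(d, N)` may go through a dynamic dispatch, whereas `rand_prod` pays for that only once per component. Building on that, the minimal sketch below fills an N × k matrix column by column with `rand!` on column views. This produces the row-per-sample layout of `permutedims(rand(d, N))` without per-column temporaries or a transpose. `rand_columns!` and `rand_columns` are hypothetical helpers, not part of Distributions.jl, and the `Float64` output type is an assumption suited to these continuous distributions.

```julia
using Distributions, Random

# Hypothetical helper (not part of Distributions.jl): fill `out` in place so that
# column i holds draws from dists[i]. Each row is then one joint sample, i.e. the
# N×k layout of permutedims(rand(d, N)), without the extra copy.
function rand_columns!(rng::AbstractRNG, out::AbstractMatrix{<:Real}, dists::AbstractVector)
    size(out, 2) == length(dists) ||
        throw(DimensionMismatch("out needs one column per distribution"))
    for (i, dist) in enumerate(dists)
        # `dist` is only known abstractly here, so this call is dispatched at run
        # time once per component; the N draws then run inside a rand! method
        # compiled for the concrete distribution type.
        rand!(rng, dist, view(out, :, i))
    end
    return out
end

# Hypothetical allocating wrapper; assumes Float64 output.
rand_columns(rng::AbstractRNG, dists::AbstractVector, N::Integer) =
    rand_columns!(rng, Matrix{Float64}(undef, N, length(dists)), dists)

dists = [Exponential(1.0), Normal(0.0, 1.0)]
# Mixing component types gives an abstract element type (prints `false`).
println(isconcretetype(eltype(dists)))

rng = MersenneTwister(42)
X = rand_columns(rng, dists, 1_000)   # 1000×2 Matrix{Float64}, one sample per row
```

With a homogeneous vector such as `[Normal(0.0, 1.0), Normal(1.0, 2.0)]` the element type is concrete (`Normal{Float64}`), so benchmarking `Product` on such a vector is one way to check whether the gap comes from mixing component types.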