Product distribution allocates (a lot)

Hi, I just found out that in the following example, sampling from a Distributions.Product distribution is much slower and allocates much more than implementing the sampler manually. Is there something that I’m overlooking here?

using Distributions, BenchmarkTools, Random
Random.seed!(42)

function rand_prod(d::Product, N::Int64)
    N_out = Matrix{Float64}(undef, N, length(d.v))
    for (i, dist) in enumerate(d.v)
        N_out[:, i] .= rand(dist, N)
    end
    return permutedims(N_out)
end

function run_tests(N::Int64)
    d = Product([Exponential(1.0), Normal(0.0, 1.0)])
    display(rand(d, 5))
    display(rand_prod(d, 5))
    display(@benchmark rand($d, $N))
    display(@benchmark rand_prod($d, $N))
    return nothing
end

run_tests(1_000_000)

gives:

2Γ—5 Matrix{Float64}:
  1.45158    0.550663   1.01842    2.90497    2.30997
 -0.879859  -0.733255  -0.0725819  0.631621  -0.35417
2Γ—5 Matrix{Float64}:
 1.37602    0.751383   0.299774  0.118754  0.611785
 0.129139  -0.294774  -0.374268  1.16951   0.256848
BenchmarkTools.Trial: 58 samples with 1 evaluation per sample.
 Range (min … max):  81.368 ms … 122.176 ms  β”Š GC (min … max): 0.38% … 31.60%
 Time  (median):     83.816 ms               β”Š GC (median):    0.98%
 Time  (mean Β± Οƒ):   87.151 ms Β±   8.514 ms  β”Š GC (mean Β± Οƒ):  5.13% Β±  7.52%

    β–ƒβ–„  β–ˆ                                                       
  β–†β–†β–ˆβ–ˆβ–ˆβ–‡β–ˆβ–…β–ƒβ–β–β–ƒβ–β–β–β–β–β–β–β–β–β–ƒβ–ƒβ–…β–ƒβ–β–β–ƒβ–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–β–ƒ ▁
  81.4 ms         Histogram: frequency by time          107 ms <

 Memory estimate: 76.29 MiB, allocs estimate: 3999491.
BenchmarkTools.Trial: 632 samples with 1 evaluation per sample.
 Range (min … max):  7.255 ms … 35.720 ms  β”Š GC (min … max): 0.00% … 79.11%
 Time  (median):     7.773 ms              β”Š GC (median):    7.24%
 Time  (mean Β± Οƒ):   7.907 ms Β±  1.473 ms  β”Š GC (mean Β± Οƒ):  9.40% Β±  4.88%

             β–ƒβ–‚ β–β–ƒβ–‚β–‚β–ˆβ–‚β–‚β–…β–                                     
  β–‚β–β–β–β–ƒβ–ƒβ–ƒβ–†β–†β–‡β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–†β–„β–…β–†β–„β–„β–ƒβ–ƒβ–„β–ƒβ–ƒβ–„β–ƒβ–…β–ƒβ–„β–ƒβ–‚β–ƒβ–ƒβ–‚β–ƒβ–‚β–β–‚β–β–‚β–β–β–β–‚β–β–‚ β–„
  7.26 ms        Histogram: frequency by time        8.85 ms <

 Memory estimate: 45.78 MiB, allocs estimate: 16.

In the interest of full disclosure: in my application I need permutedims(rand(..., N)) rather than rand(..., N), so rand_prod compares even more favorably. I’m just wondering whether I’m misusing Distributions.Product in a way that hinders more efficient sampling, or whether it really is more efficient to implement the loops myself…

I’m also a bit surprised that the following function rand_prod_no_alloc seems to be slower than rand_prod, despite allocating less:

function rand_prod_no_alloc(d::Product, N::Int64)
    N_out = Matrix{Float64}(undef, N, length(d.v))
    for (i, dist) in enumerate(d.v)
        rand!(dist, view(N_out, :, i))
    end
    return permutedims(N_out)
end

gives

BenchmarkTools.Trial: 473 samples with 1 evaluation per sample.
 Range (min … max):   9.830 ms … 38.517 ms  β”Š GC (min … max): 0.00% … 73.64%
 Time  (median):     10.502 ms              β”Š GC (median):    4.40%
 Time  (mean Β± Οƒ):   10.573 ms Β±  1.710 ms  β”Š GC (mean Β± Οƒ):  4.79% Β±  5.47%

     ▁▂ ▃▂▁                β–ˆβ–ƒ  ▁▂▃▆ ▄▃▁▄    β–‚                  
  β–†β–‡β–ˆβ–ˆβ–ˆβ–†β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–†β–†β–‡β–…β–„β–„β–„β–†β–ƒβ–…β–†β–…β–†β–ˆβ–ˆβ–ˆβ–ˆβ–†β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–‡β–ˆβ–ˆβ–‡β–‡β–…β–†β–…β–ƒβ–ƒβ–„β–ƒβ–„β–ƒβ–ƒβ–ƒβ–β–β–ƒ β–„
  9.83 ms         Histogram: frequency by time        11.3 ms <

 Memory estimate: 30.52 MiB, allocs estimate: 7.

I just read the documentation again: Distributions.Product is apparently deprecated as a constructor, but its recommended replacement (see the product distributions documentation), Distributions.product_distribution, has the same problem:

BenchmarkTools.Trial: 62 samples with 1 evaluation per sample.
 Range (min … max):  74.642 ms … 111.025 ms  β”Š GC (min … max): 0.30% … 31.05%
 Time  (median):     76.733 ms               β”Š GC (median):    0.83%
 Time  (mean Β± Οƒ):   80.919 ms Β±   8.558 ms  β”Š GC (mean Β± Οƒ):  5.17% Β±  7.86%

   β–† β–†β–ˆβ–ƒ                                                        
  β–…β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–„β–β–‡β–β–β–„β–β–…β–β–β–„β–β–„β–…β–„β–„β–„β–„β–β–„β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–…β–ˆβ–„β–„ ▁
  74.6 ms         Histogram: frequency by time         99.8 ms <

 Memory estimate: 76.29 MiB, allocs estimate: 3999491.

I have the same experience; where I need high-performance product distributions, I use this implementation.



The d constructed by the Distributions package (d = Product([Exponential(1.0), Normal(0.0, 1.0)])) is type-unstable, which is what makes rand slow. It also slows down your handwritten rand_prod, because the type of the dist loop variable is not known at compile time. If possible, I suggest you switch to a different package with the same functionality, or implement this function entirely on your own; you’ll definitely get a huge speedup.
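To see the instability concretely, compare the element type of the vector with a tuple of the same distributions (a sketch; the exact printed aliases depend on your Distributions version):

```julia
using Distributions

# A Vector promotes both distributions to their common abstract supertype,
# so every access to d.v[i] must dispatch at runtime:
v = [Exponential(1.0), Normal(0.0, 1.0)]
eltype(v)   # Distribution{Univariate, Continuous} -- abstract

# A Tuple keeps each element's concrete type, which lets the compiler
# specialize code that iterates over it:
t = (Exponential(1.0), Normal(0.0, 1.0))
typeof(t)   # Tuple{Exponential{Float64}, Normal{Float64}} -- concrete
```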


The problem here is that you’ve got a Product of two different types of distributions. If you look at the type of the vector being passed into the constructor for Product, you’ll see the following:

julia> typeof([Exponential(1.0), Normal(0.0, 1.0)])
Vector{Distribution{Univariate, Continuous}} (alias for Array{Distribution{ArrayLikeVariate{0}, Continuous}, 1})

Replacing [Exponential(1.0), Normal(0.0, 1.0)] with [Normal(2.0, 1.0), Normal(0.0, 1.0)] in your example above yields the following results on my machine when I run your code:

BenchmarkTools.Trial: 616 samples with 1 evaluation per sample.
 Range (min … max):  7.258 ms …  10.340 ms  β”Š GC (min … max): 0.00% … 10.45%
 Time  (median):     8.138 ms               β”Š GC (median):    5.06%
 Time  (mean Β± Οƒ):   8.123 ms Β± 395.103 ΞΌs  β”Š GC (mean Β± Οƒ):  4.04% Β±  3.37%

            β–†β–‡β–„        β–ƒβ–ˆβ–‚ β–„β–‚β–ƒ                                 
  β–‚β–β–β–β–β–β–β–β–β–„β–ˆβ–ˆβ–ˆβ–†β–…β–ƒβ–„β–‚β–ƒβ–ƒβ–„β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–„β–ƒβ–ƒβ–β–ƒβ–β–‚β–‚β–‚β–ƒβ–β–β–‚β–‚β–‚β–β–‚β–β–ƒβ–‚β–β–‚β–ƒβ–‚β–‚β–‚β–β–‚β–ƒ β–ƒ
  7.26 ms         Histogram: frequency by time        9.54 ms <

 Memory estimate: 15.26 MiB, allocs estimate: 3.
BenchmarkTools.Trial: 622 samples with 1 evaluation per sample.
 Range (min … max):  6.856 ms …  10.599 ms  β”Š GC (min … max):  0.00% … 32.92%
 Time  (median):     7.948 ms               β”Š GC (median):    11.17%
 Time  (mean Β± Οƒ):   8.035 ms Β± 529.012 ΞΌs  β”Š GC (mean Β± Οƒ):  12.23% Β±  4.55%

            ▁▇▃▃▁▄▄ β–‚β–β–…β–…β–ˆβ–†β–„β–…β–ƒβ–„β–β–‚   ▁ ▁ ▁  ▁ ▁                  
  β–‚β–‚β–ƒβ–‚β–‚β–ƒβ–‚β–β–β–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–†β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–…β–ˆβ–‡β–ˆβ–„β–ˆβ–†β–ƒβ–ƒβ–ƒβ–‚β–„β–ƒβ–β–β–β–β–‚β–‚β–‚ β–…
  6.86 ms         Histogram: frequency by time        9.51 ms <

 Memory estimate: 45.78 MiB, allocs estimate: 12.

i.e. rand(d, N) still takes very slightly longer than rand_prod, but is almost as fast.

Now, clearly, this isn’t the use-case you have in mind (I’m assuming that you actually really do want to sample from this pair of distributions). I don’t know exactly what your use-case is, but you could sample the same set of random numbers doing something like

julia> N = 1_000_000;

julia> @benchmark vcat(rand(Exponential(1.0), $N)', rand(Normal(0.0, 1.0), $N)')
BenchmarkTools.Trial: 729 samples with 1 evaluation per sample.
 Range (min … max):  5.989 ms …   9.828 ms  β”Š GC (min … max):  0.00% … 34.27%
 Time  (median):     6.642 ms               β”Š GC (median):    10.00%
 Time  (mean Β± Οƒ):   6.855 ms Β± 450.783 ΞΌs  β”Š GC (mean Β± Οƒ):  12.89% Β±  5.28%

                 β–‡β–‡β–ˆβ–…β–ƒ                                         
  β–‚β–ƒβ–ƒβ–‚β–ƒβ–β–‚β–β–‚β–β–‚β–β–β–‚β–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–…β–„β–„β–ƒβ–ƒβ–ƒβ–ƒβ–„β–ƒβ–‚β–ƒβ–ƒβ–‚β–‚β–ƒβ–…β–†β–…β–…β–„β–ƒβ–‚β–‚β–ƒβ–„β–ƒβ–…β–„β–„β–‚β–ƒβ–ƒβ–‚β–‚β–‚β–β–β–‚ β–ƒ
  5.99 ms         Histogram: frequency by time           8 ms <

 Memory estimate: 30.52 MiB, allocs estimate: 9.

which seems to give reasonable performance.

I do basically agree with what others have said, though: if you really do need a product of distributions of differing types, product_distribution probably isn’t the tool for you.

edit: having said all of this, your implementation of rand for Product distributions might be a better choice in general than the one currently in Distributions.jl. I’m sure the maintainers would appreciate you filing an issue highlighting this performance problem and pointing out that your rand_prod yields quite substantial improvements.
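For reference, here is a minimal sketch of a type-stable sampler over a tuple of distributions (the name rand_tuple and the hard-coded Float64 output type are assumptions, not anything from Distributions.jl):

```julia
using Distributions, Random

# Draw N samples from each distribution in a Tuple, one row per distribution.
# Because the tuple's element types are concrete, Julia can union-split or
# specialize the loop body instead of dispatching dynamically each iteration.
function rand_tuple(dists::Tuple, N::Int)
    out = Matrix{Float64}(undef, length(dists), N)
    for (i, d) in enumerate(dists)
        rand!(d, view(out, i, :))  # fill row i in place, no temporary array
    end
    return out
end

x = rand_tuple((Exponential(1.0), Normal(0.0, 1.0)), 1_000)
size(x)  # (2, 1000)
```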


Thank you all for your insights, that helps a lot! My use-case indeed involves a product of various (different) distributions, so I’ll probably just iterate over them manually!


I’ll file an issue later today, thank you for pointing this out! (Getting rid of the type instability would also be nice…)


This is now Sampling from Distributions.product_distribution allocates (a lot) Β· Issue #1954 Β· JuliaStats/Distributions.jl Β· GitHub


There is also this old issue.
