Product distribution allocates (a lot)

Hi, I just found out that in the following example, sampling from a Distributions.Product distribution is much slower and allocates much more than implementing the sampler manually. Is there something that I’m overlooking here?

using Distributions, BenchmarkTools, Random
Random.seed!(42)

function rand_prod(d::Product, N::Int64)
    N_out = Matrix{Float64}(undef, N, length(d.v))
    for (i, dist) in enumerate(d.v)
        N_out[:, i] .= rand(dist, N)
    end
    return permutedims(N_out)
end

function run_tests(N::Int64)
    d = Product([Exponential(1.0), Normal(0.0, 1.0)])
    display(rand(d, 5))
    display(rand_prod(d, 5))
    display(@benchmark rand($d, $N))
    display(@benchmark rand_prod($d, $N))
    return nothing
end

run_tests(1_000_000)

gives:

2Γ—5 Matrix{Float64}:
  1.45158    0.550663   1.01842    2.90497    2.30997
 -0.879859  -0.733255  -0.0725819  0.631621  -0.35417
2Γ—5 Matrix{Float64}:
 1.37602    0.751383   0.299774  0.118754  0.611785
 0.129139  -0.294774  -0.374268  1.16951   0.256848
BenchmarkTools.Trial: 58 samples with 1 evaluation per sample.
 Range (min … max):  81.368 ms … 122.176 ms  β”Š GC (min … max): 0.38% … 31.60%
 Time  (median):     83.816 ms               β”Š GC (median):    0.98%
 Time  (mean Β± Οƒ):   87.151 ms Β±   8.514 ms  β”Š GC (mean Β± Οƒ):  5.13% Β±  7.52%

    β–ƒβ–„  β–ˆ                                                       
  β–†β–†β–ˆβ–ˆβ–ˆβ–‡β–ˆβ–…β–ƒβ–β–β–ƒβ–β–β–β–β–β–β–β–β–β–ƒβ–ƒβ–…β–ƒβ–β–β–ƒβ–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–β–ƒ ▁
  81.4 ms         Histogram: frequency by time          107 ms <

 Memory estimate: 76.29 MiB, allocs estimate: 3999491.
BenchmarkTools.Trial: 632 samples with 1 evaluation per sample.
 Range (min … max):  7.255 ms … 35.720 ms  β”Š GC (min … max): 0.00% … 79.11%
 Time  (median):     7.773 ms              β”Š GC (median):    7.24%
 Time  (mean Β± Οƒ):   7.907 ms Β±  1.473 ms  β”Š GC (mean Β± Οƒ):  9.40% Β±  4.88%

             β–ƒβ–‚ β–β–ƒβ–‚β–‚β–ˆβ–‚β–‚β–…β–                                     
  β–‚β–β–β–β–ƒβ–ƒβ–ƒβ–†β–†β–‡β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–†β–„β–…β–†β–„β–„β–ƒβ–ƒβ–„β–ƒβ–ƒβ–„β–ƒβ–…β–ƒβ–„β–ƒβ–‚β–ƒβ–ƒβ–‚β–ƒβ–‚β–β–‚β–β–‚β–β–β–β–‚β–β–‚ β–„
  7.26 ms        Histogram: frequency by time        8.85 ms <

 Memory estimate: 45.78 MiB, allocs estimate: 16.

In the interest of full disclosure: in my application I need permutedims(rand(..., N)) rather than rand(..., N), so rand_prod compares even more favorably. I’m just wondering whether I’m misusing Distributions.Product in a way that hinders more efficient sampling, or whether it really is more efficient to implement the loops myself…

I’m also a bit surprised that the following function rand_prod_no_alloc seems to be slower than rand_prod, despite allocating less:

function rand_prod_no_alloc(d::Product, N::Int64)
    N_out = Matrix{Float64}(undef, N, length(d.v))
    for (i, dist) in enumerate(d.v)
        rand!(dist, view(N_out, :, i))
    end
    return permutedims(N_out)
end

gives

BenchmarkTools.Trial: 473 samples with 1 evaluation per sample.
 Range (min … max):   9.830 ms … 38.517 ms  β”Š GC (min … max): 0.00% … 73.64%
 Time  (median):     10.502 ms              β”Š GC (median):    4.40%
 Time  (mean Β± Οƒ):   10.573 ms Β±  1.710 ms  β”Š GC (mean Β± Οƒ):  4.79% Β±  5.47%

     ▁▂ ▃▂▁                β–ˆβ–ƒ  ▁▂▃▆ ▄▃▁▄    β–‚                  
  β–†β–‡β–ˆβ–ˆβ–ˆβ–†β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–†β–†β–‡β–…β–„β–„β–„β–†β–ƒβ–…β–†β–…β–†β–ˆβ–ˆβ–ˆβ–ˆβ–†β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–‡β–ˆβ–ˆβ–‡β–‡β–…β–†β–…β–ƒβ–ƒβ–„β–ƒβ–„β–ƒβ–ƒβ–ƒβ–β–β–ƒ β–„
  9.83 ms         Histogram: frequency by time        11.3 ms <

 Memory estimate: 30.52 MiB, allocs estimate: 7.

I just read the documentation again: Distributions.Product is apparently deprecated as a constructor, but its recommended replacement (see the product distributions documentation), Distributions.product_distribution, has the same problem:

BenchmarkTools.Trial: 62 samples with 1 evaluation per sample.
 Range (min … max):  74.642 ms … 111.025 ms  β”Š GC (min … max): 0.30% … 31.05%
 Time  (median):     76.733 ms               β”Š GC (median):    0.83%
 Time  (mean Β± Οƒ):   80.919 ms Β±   8.558 ms  β”Š GC (mean Β± Οƒ):  5.17% Β±  7.86%

   β–† β–†β–ˆβ–ƒ                                                        
  β–…β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–„β–β–‡β–β–β–„β–β–…β–β–β–„β–β–„β–…β–„β–„β–„β–„β–β–„β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–…β–ˆβ–„β–„ ▁
  74.6 ms         Histogram: frequency by time         99.8 ms <

 Memory estimate: 76.29 MiB, allocs estimate: 3999491.

I have the same experience; where I need high-performance product distributions, I use this implementation.



The d constructed by the Distributions package (d = Product([Exponential(1.0), Normal(0.0, 1.0)])) is type-unstable, which is what makes rand slow. It also slows down your handwritten rand_prod, because the type of the dist loop variable is not known at compile time. If possible, I suggest you switch to a different package with the same functionality, or implement this function entirely on your own; you’ll definitely get a huge speedup.
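To see the instability concretely, compare the element type of the vector with a tuple of the same distributions (a sketch; the exact printed aliases depend on your Distributions version):

```julia
using Distributions

# A Vector promotes both distributions to their common abstract supertype,
# so every access to d.v[i] must dispatch at runtime:
v = [Exponential(1.0), Normal(0.0, 1.0)]
eltype(v)   # Distribution{Univariate, Continuous} -- abstract

# A Tuple keeps each element's concrete type, which lets the compiler
# specialize code that iterates over it:
t = (Exponential(1.0), Normal(0.0, 1.0))
typeof(t)   # Tuple{Exponential{Float64}, Normal{Float64}} -- concrete
```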


The problem here is that you’ve got a Product of two different types of distributions. If you look at the type of the vector being passed into the constructor for Product, you’ll see the following:

julia> typeof([Exponential(1.0), Normal(0.0, 1.0)])
Vector{Distribution{Univariate, Continuous}} (alias for Array{Distribution{ArrayLikeVariate{0}, Continuous}, 1})

Replacing [Exponential(1.0), Normal(0.0, 1.0)] with [Normal(2.0, 1.0), Normal(0.0, 1.0)] in your example above yields the following results on my machine when I run your code:

BenchmarkTools.Trial: 616 samples with 1 evaluation per sample.
 Range (min … max):  7.258 ms …  10.340 ms  β”Š GC (min … max): 0.00% … 10.45%
 Time  (median):     8.138 ms               β”Š GC (median):    5.06%
 Time  (mean Β± Οƒ):   8.123 ms Β± 395.103 ΞΌs  β”Š GC (mean Β± Οƒ):  4.04% Β±  3.37%

            β–†β–‡β–„        β–ƒβ–ˆβ–‚ β–„β–‚β–ƒ                                 
  β–‚β–β–β–β–β–β–β–β–β–„β–ˆβ–ˆβ–ˆβ–†β–…β–ƒβ–„β–‚β–ƒβ–ƒβ–„β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–„β–ƒβ–ƒβ–β–ƒβ–β–‚β–‚β–‚β–ƒβ–β–β–‚β–‚β–‚β–β–‚β–β–ƒβ–‚β–β–‚β–ƒβ–‚β–‚β–‚β–β–‚β–ƒ β–ƒ
  7.26 ms         Histogram: frequency by time        9.54 ms <

 Memory estimate: 15.26 MiB, allocs estimate: 3.
BenchmarkTools.Trial: 622 samples with 1 evaluation per sample.
 Range (min … max):  6.856 ms …  10.599 ms  β”Š GC (min … max):  0.00% … 32.92%
 Time  (median):     7.948 ms               β”Š GC (median):    11.17%
 Time  (mean Β± Οƒ):   8.035 ms Β± 529.012 ΞΌs  β”Š GC (mean Β± Οƒ):  12.23% Β±  4.55%

            ▁▇▃▃▁▄▄ β–‚β–β–…β–…β–ˆβ–†β–„β–…β–ƒβ–„β–β–‚   ▁ ▁ ▁  ▁ ▁                  
  β–‚β–‚β–ƒβ–‚β–‚β–ƒβ–‚β–β–β–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–†β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–…β–ˆβ–‡β–ˆβ–„β–ˆβ–†β–ƒβ–ƒβ–ƒβ–‚β–„β–ƒβ–β–β–β–β–‚β–‚β–‚ β–…
  6.86 ms         Histogram: frequency by time        9.51 ms <

 Memory estimate: 45.78 MiB, allocs estimate: 12.

i.e. rand(d, N) still takes very slightly longer than rand_prod, but is almost as fast.

Now, clearly, this isn’t the use-case you have in mind (I’m assuming that you actually really do want to sample from this pair of distributions). I don’t know exactly what your use-case is, but you could sample the same set of random numbers doing something like

julia> N = 1_000_000;

julia> @benchmark vcat(rand(Exponential(1.0), $N)', rand(Normal(0.0, 1.0), $N)')
BenchmarkTools.Trial: 729 samples with 1 evaluation per sample.
 Range (min … max):  5.989 ms …   9.828 ms  β”Š GC (min … max):  0.00% … 34.27%
 Time  (median):     6.642 ms               β”Š GC (median):    10.00%
 Time  (mean Β± Οƒ):   6.855 ms Β± 450.783 ΞΌs  β”Š GC (mean Β± Οƒ):  12.89% Β±  5.28%

                 β–‡β–‡β–ˆβ–…β–ƒ                                         
  β–‚β–ƒβ–ƒβ–‚β–ƒβ–β–‚β–β–‚β–β–‚β–β–β–‚β–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–…β–„β–„β–ƒβ–ƒβ–ƒβ–ƒβ–„β–ƒβ–‚β–ƒβ–ƒβ–‚β–‚β–ƒβ–…β–†β–…β–…β–„β–ƒβ–‚β–‚β–ƒβ–„β–ƒβ–…β–„β–„β–‚β–ƒβ–ƒβ–‚β–‚β–‚β–β–β–‚ β–ƒ
  5.99 ms         Histogram: frequency by time           8 ms <

 Memory estimate: 30.52 MiB, allocs estimate: 9.

which seems to give reasonable performance.

I do basically agree with what others have said, though: if you really do need a product of distributions of differing types, product_distribution probably isn’t the tool for you.

edit: having said all of this, your implementation of rand for Product distributions might be a better choice in general than the one currently in Distributions.jl. I’m sure the maintainers would appreciate you filing an issue highlighting this performance problem and pointing out that your rand_prod yields quite substantial improvements.
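For reference, here is a minimal sketch of a type-stable sampler over a tuple of distributions (the name rand_tuple and the hard-coded Float64 output type are assumptions, not anything from Distributions.jl):

```julia
using Distributions, Random

# Draw N samples from each distribution in a Tuple, one row per distribution.
# Because the tuple's element types are concrete, Julia can union-split or
# specialize the loop body instead of dispatching dynamically each iteration.
function rand_tuple(dists::Tuple, N::Int)
    out = Matrix{Float64}(undef, length(dists), N)
    for (i, d) in enumerate(dists)
        rand!(d, view(out, i, :))  # fill row i in place, no temporary array
    end
    return out
end

x = rand_tuple((Exponential(1.0), Normal(0.0, 1.0)), 1_000)
size(x)  # (2, 1000)
```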


Thank you all for your insights, that helps a lot! My use-case indeed involves a product of various (different) distributions, so I’ll probably just iterate over them manually!


I’ll file an issue later today, thank you for pointing this out! (Getting rid of the type instability would also be nice…)


This is now Sampling from Distributions.product_distribution allocates (a lot) Β· Issue #1954 Β· JuliaStats/Distributions.jl Β· GitHub


There is also this old issue.
