Improving Distributions.jl type consistency

cscherrer · October 19, 2019, 2:24pm

The rand(dist, n::Int) method is really handy, but there’s a lot of irregularity in the type that’s returned. Ideally we’d have some type A with

typeof(rand(dist, n::Int)) == A{typeof(rand(dist)), n}

Having an identity like this at the type level can allow us to build very general code, rather than relying no special cases. The obvious choice here is A == Array, but this loses some efficiency; for example, a regular array-of-vectors is more efficiently stored as a Matrix.

Currently, the relation is not a type constructor, but a function, something like

typeof(rand(dist, n::Int)) == f(typeof(rand(dist)), n)

f(Array{T, n}) = Array{T, n+1}
f(T) = Vector{T}

RecursiveArrayTools.jl gives a way to treat, for example, a vector-of-vectors as a matrix. It seems in the above case we need something to go the other direction, allowing internal representation as a regular array, but indexing as nested arrays.

Or maybe this is entirely wrong, and we need a different approach. Any ideas?

johnczito · October 21, 2019, 5:50am

there’s a lot of irregularity in the type that’s returned

I can think of the fact that rand(d::UnivariateDistribution, n::Int) returns a vector and rand(d::MultivariateDistribution, n::Int) returns a matrix, but rand(d::MatrixDistribution, n::Int) returns a vector of matrices. What else do you have in mind?

(This came up here, at least.)

cscherrer · October 21, 2019, 1:49pm

Hi @johnczito, welcome!

The problem is the lack of a consistent relationship between typeof(rand(dist)) and typeof(rand(dist, n::Int)). Starting with dist = Normal(), we have

julia> rand(Normal())
1.0200255085620293

julia> rand(Normal()) |> typeof
Float64

julia> rand(Normal(),2)
2-element Array{Float64,1}:
 -0.5708464683263881
 -0.5539728783723908

From this, one might expect that an extra Int argument to rand always changes the type from T to Array{T,1}. But that’s not it at all:

julia> rand(MvNormal(zeros(3)))
3-element Array{Float64,1}:
 -0.0
  0.0
  0.0

julia> rand(MvNormal(zeros(3)),2)
3×2 Array{Float64,2}:
 -0.0  0.0
 -0.0  0.0
  0.0  0.0

This might not seem like a big deal. After all, it’s easy enough to remember to treat these cases separately. The pain point comes when trying to write code that deals with an arbitrary distribution. As it is, generic functions on distributions behave strangely:

julia> samplemean(dist,n) = mean(rand(dist,n))
samplemean (generic function with 1 method)

julia> samplemean(Normal(),100)
0.13491200393928363

julia> samplemean(MvNormal(zeros(3)),100)
0.0

Topic		Replies	Views
Required methods to sample a multivariate distribution Statistics distributions	1	298	October 26, 2023
Type instability of discrete mixture distribution General Usage distributions , type-stability	15	248	June 5, 2024
Rand and Sampler interface for multiple samples General Usage distributions , random	2	552	August 21, 2021
Overriding Rand.rand for vectorization General Usage	5	155	August 30, 2023
Sampling Multiple samples from NormalInverseGamma distribution Statistics	2	675	April 27, 2018

Improving Distributions.jl type consistency

Related topics