# Improving Distributions.jl type consistency

The `rand(dist, n::Int)` method is really handy, but there’s a lot of irregularity in the type that’s returned. Ideally we’d have some type `A` with

``````typeof(rand(dist, n::Int)) == A{typeof(rand(dist)), n}
``````

Having an identity like this at the type level can allow us to build very general code, rather than relying no special cases. The obvious choice here is `A == Array`, but this loses some efficiency; for example, a regular array-of-vectors is more efficiently stored as a Matrix.

Currently, the relation is not a type constructor, but a function, something like

``````typeof(rand(dist, n::Int)) == f(typeof(rand(dist)), n)

f(Array{T, n}) = Array{T, n+1}
f(T) = Vector{T}
``````

RecursiveArrayTools.jl gives a way to treat, for example, a vector-of-vectors as a matrix. It seems in the above case we need something to go the other direction, allowing internal representation as a regular array, but indexing as nested arrays.

Or maybe this is entirely wrong, and we need a different approach. Any ideas?

1 Like

there’s a lot of irregularity in the type that’s returned

I can think of the fact that `rand(d::UnivariateDistribution, n::Int)` returns a vector and `rand(d::MultivariateDistribution, n::Int)` returns a matrix, but `rand(d::MatrixDistribution, n::Int)` returns a vector of matrices. What else do you have in mind?

(This came up here, at least.)

Hi @johnczito, welcome!

The problem is the lack of a consistent relationship between `typeof(rand(dist))` and `typeof(rand(dist, n::Int))`. Starting with `dist = Normal()`, we have

``````julia> rand(Normal())
1.0200255085620293

julia> rand(Normal()) |> typeof
Float64

julia> rand(Normal(),2)
2-element Array{Float64,1}:
-0.5708464683263881
-0.5539728783723908
``````

From this, one might expect that an extra `Int` argument to `rand` always changes the type from `T` to `Array{T,1}`. But that’s not it at all:

``````julia> rand(MvNormal(zeros(3)))
3-element Array{Float64,1}:
-0.0
0.0
0.0

julia> rand(MvNormal(zeros(3)),2)
3×2 Array{Float64,2}:
-0.0  0.0
-0.0  0.0
0.0  0.0
``````

This might not seem like a big deal. After all, it’s easy enough to remember to treat these cases separately. The pain point comes when trying to write code that deals with an arbitrary distribution. As it is, generic functions on distributions behave strangely:

``````julia> samplemean(dist,n) = mean(rand(dist,n))
samplemean (generic function with 1 method)

julia> samplemean(Normal(),100)
0.13491200393928363

julia> samplemean(MvNormal(zeros(3)),100)
0.0
``````