Simplifying Distributions type hierarchy

In probabilistic programming, we use a lot of distributions. In some settings, it’s important that new distributions be very easy to define and to reason about.

The current type of a Distribution or Sampleable is

help?> Sampleable
search: Sampleable

  Sampleable{F<:VariateForm,S<:ValueSupport}


  Sampleable is any type able to produce random values. Parametrized by a
  VariateForm defining the dimension of samples and a ValueSupport defining the
  domain of possibly sampled values. Any Sampleable implements the Base.rand
  method.

First we have to say whether a returned value is Univariate, Multivariate, or Matrixvariate. Then we have to say whether it’s Discrete or Continuous.

But all of this information is available from the type of the result. Why not just Sampleable{T}? Then…

  • sample(::Sampleable{T}) returns a value of type T
  • Sampleable{Int} is clearly univariate discrete
  • Sampleable{Vector{Float64}} is multivariate continuous

In the current setup, say I want a distribution over trees. Then I have to define a new Treevariate type, and decide whether my trees will be continuous or discrete, or what that even means. Sampleable{Tree{MoreParams}} would do this just fine.
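A minimal sketch of what that could look like. Everything here is hypothetical: `SampleableOf`, `sampletype`, `Leaf`, `Node`, `Tree`, and `RandomTree` are illustration names, not part of Distributions.jl, and the tree type is only a toy.

```julia
using Random

# Hypothetical abstract type, named to avoid clashing with Distributions.Sampleable.
abstract type SampleableOf{T} end

# The sample type is read straight off the type parameter.
sampletype(::Type{<:SampleableOf{T}}) where {T} = T
sampletype(d::SampleableOf) = sampletype(typeof(d))

# A toy binary tree with values of type V at the leaves.
struct Leaf{V}
    value::V
end
struct Node{V}
    left::Union{Leaf{V},Node{V}}
    right::Union{Leaf{V},Node{V}}
end
const Tree{V} = Union{Leaf{V},Node{V}}

# A distribution over trees: no "Treevariate" and no discrete/continuous decision needed.
struct RandomTree <: SampleableOf{Tree{Float64}}
    p_split::Float64   # probability that a node splits into two children
    max_depth::Int     # hard limit on the depth of the tree
end

# Recursive helper: either split into two subtrees or stop at a leaf.
function grow(rng::AbstractRNG, d::RandomTree, depth::Int)
    if depth < d.max_depth && rand(rng) < d.p_split
        return Node{Float64}(grow(rng, d, depth + 1), grow(rng, d, depth + 1))
    end
    return Leaf(randn(rng))
end

Base.rand(rng::AbstractRNG, d::RandomTree) = grow(rng, d, 0)

d = RandomTree(0.5, 3)
t = rand(MersenneTwister(1), d)
t isa sampletype(d)   # the draw has the type the distribution advertises
```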

In practice, the heavy weight of implementing all of this often leads me to sidestep Distributions entirely. For example, in Soss I have an EqualMix combinator that gives a mixture model with equal-weight components. I define it as

struct EqualMix{T}
    components::Vector{T}
end

and add some methods, and it works just fine. If I could tell it T should be a Distribution, that would be better. But the current setup doesn’t seem to me to map well to the domain.
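For illustration, here is roughly what "add some methods" might mean for sampling. This is a sketch and not the actual Soss code: it assumes only that each component supports `rand(rng, component)`, and it uses integer ranges as stand-in components so that it runs without any packages.

```julia
using Random

struct EqualMix{T}
    components::Vector{T}
end

# Draw from the mixture: pick one component uniformly at random, then sample from it.
function Base.rand(rng::AbstractRNG, m::EqualMix)
    component = rand(rng, m.components)
    return rand(rng, component)
end

# Stand-in components: anything that supports rand(rng, x) works here.
m = EqualMix([1:3, 10:12])
x = rand(MersenneTwister(0), m)
```

Nothing in this sketch stops someone from building an `EqualMix` of things that cannot be sampled, which is exactly the gap a constraint on `T` would close.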

Am I missing something? What’s the big benefit of the current approach?


I was also very unclear on this. I can only imagine this allows some kind of fallback functionality for various distribution types.


Isn’t your Sampleable{T} in a trivial relationship with the current Sampleable{F<:VariateForm,S<:ValueSupport} if you have functions on T that specify (a) whether T is univariate or multivariate and (b) whether T is discrete or continuous? Looking through Distributions.jl, there are lots of methods defined on univariate types, which is what led to the original design IIRC.

I think your proposed hierarchy is more general and could be made to work. But is it better enough to go through a substantial refactoring of the code and potentially break everything out there?

Can something be done with traits, which can then be defined on the old hierarchy so that existing code isn’t broken?
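One possible shape for this, as a sketch only. The `Old*` types below are stand-ins that mimic the existing parameterized hierarchy and are not the real Distributions.jl types; `variate`, `IsUnivariate`, `IsMultivariate`, `describe`, and `Coin` are hypothetical names.

```julia
# Stand-ins for the existing parameterized hierarchy (not the real Distributions.jl types).
abstract type OldVariateForm end
struct OldUnivariate <: OldVariateForm end
struct OldMultivariate <: OldVariateForm end
abstract type OldSampleable{F<:OldVariateForm} end

# Trait values.
struct IsUnivariate end
struct IsMultivariate end

# Types in the old hierarchy get their trait for free from the type parameter.
variate(::Type{<:OldSampleable{OldUnivariate}}) = IsUnivariate()
variate(::Type{<:OldSampleable{OldMultivariate}}) = IsMultivariate()
# Instances forward to their type; the type must have a variate method defined.
variate(x) = variate(typeof(x))

# Generic code dispatches on the trait rather than on the supertype.
describe(d) = describe(variate(d), d)
describe(::IsUnivariate, d) = "scalar draws"
describe(::IsMultivariate, d) = "vector draws"

# An existing-style type keeps working unchanged.
struct OldNormal <: OldSampleable{OldUnivariate} end

# A new type outside the hierarchy opts in with a single method.
struct Coin end
variate(::Type{Coin}) = IsUnivariate()

describe(OldNormal()), describe(Coin())
```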


Here’s an example from Soss:

julia> m = @model x begin
           α ~ Normal()
           β ~ Normal()
           σ ~ HalfNormal()
           yhat = α .+ β .* x
           n = length(x)
           y ~ For(n) do j
               Normal(yhat[j], σ)
           end
       end;

julia> rand(m(x=rand(3)))
(x = [0.4742281389391303, 0.8795517084789848, 0.48770790662969965], yhat = [0.7656025986035866, 0.69947236269799, 0.7634033181167845], n = 3, σ = 0.9023491416822307, β = -0.16315418316450547, α = 0.842974903245824, y = [-1.1024328623804904, 1.2723146474947935, -0.7741104398758925])

There should be a way to make a model built this way an instance of Sampleable. In a simpler approach the type would be really clear. How would you do this with the current definitions?

  • It’s a huge amount of work for very little benefit. In the current setup it’s not worth the trouble
  • It could be much easier

There are other irregularities, like how check_args is sometimes (but not always) available as a keyword argument. It’s a great library for what it was designed for, but PPLs need more. All of this just comes down to PPLs being much more demanding than classical statistical methods.


Perhaps it’s worth writing Distributions2.jl to address the design concerns above and see whether it provides enough benefits over the current design that making it the default is worth it.


I think that the current approach comes from the early days of Julia when users were still exploring the language — most importantly, before traits were widely used. I have said this before in a previous discussion we had:

I agree with @mohamed82008 that a rewrite would be best to demo a simpler setup. I would recommend much less inheritance and parameterization at the API level, and suggest traits instead (inheritance and type parameters could still be used to implement a lot of functionality where they make sense). This could start as a PR, and depending on the reception it could either be merged or become Distributions2.


The most important question (I think there is no question that the way distributions are parametrised is too restrictive) is whether the sample type T should appear in the abstract type, as in Distribution{T}, or not, leaving a plain Distribution. It does not have to: one could either hook into the eltype trait or build parallel infrastructure such as sampletype.

One question is whether we think of the distributions of rand(Float64) and rand(Float32) as the same distribution with different precision, or as different distributions. To experiment (and to solve some issues with the type hierarchy I was experiencing) I created

to show that a single struct Gaussian can represent both univariate and multivariate normal distributions with various element types.
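To make the idea concrete, here is a rough sketch of how one struct could cover both cases. It is written for illustration and is not the code of the package referred to above; the helpers `sqrtcov` and `noise` are hypothetical names.

```julia
using LinearAlgebra, Random

# One struct; the field types decide whether it is univariate or multivariate.
struct Gaussian{T,S}
    μ::T   # mean: a number or a vector
    Σ::S   # variance (number) or covariance (matrix)
end

# "Square root" of the (co)variance: scalar sqrt or lower Cholesky factor.
sqrtcov(σ²::Real) = sqrt(σ²)
sqrtcov(Σ::AbstractMatrix) = cholesky(Symmetric(Σ)).L

# Standard normal noise with the same shape and float type as the mean.
noise(rng::AbstractRNG, μ::Real) = randn(rng, typeof(float(μ)))
noise(rng::AbstractRNG, μ::AbstractVector) = randn(rng, float(eltype(μ)), length(μ))

# A single sampling method serves every combination of field types above.
Base.rand(rng::AbstractRNG, g::Gaussian) = g.μ + sqrtcov(g.Σ) * noise(rng, g.μ)

rng = MersenneTwister(1)
x = rand(rng, Gaussian(0.0f0, 1.0f0))                      # a Float32 scalar
y = rand(rng, Gaussian([0.0, 0.0], [1.0 0.0; 0.0 1.0]))    # a 2-element Vector{Float64}
```

In this sketch the Float32 and Float64 cases are the same struct with different type parameters, which is one way to answer the precision question above.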


Just a minor thing, showing the trickiness: the samples of rand(Bool) + 1/2 are Float64 valued and have a discrete distribution.
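This is easy to check directly, using nothing beyond Base:

```julia
# Bool promotes to Float64 when added to 1/2 (itself a Float64).
xs = [rand(Bool) + 1/2 for _ in 1:1_000]

eltype(xs) == Float64       # the sample type is a floating-point type
issubset(xs, (0.5, 1.5))    # yet only these two values can ever occur
```

So the element type alone does not tell you whether the distribution is discrete or continuous.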


As @mschauer mentioned, some edge cases are very tricky; we just hit a few of them in some PRs.

The current PR tackling the type system issue is https://github.com/JuliaStats/Distributions.jl/pull/951, just fixing some things on ValueSupport.

The trait discussion already happened a few months ago in another Discourse thread, and it is more complex than it first appears. One idea I would submit is to finish #951 and a few other things, release a last minor 0.22 (an equivalent of Julia 0.7), and then release a 1.0. We can then experiment with a trait-based system for 2.0.


I don’t understand the design space of traits well enough. Can we be confident that they’ll solve some problems and create others? What’s the right way to think about the tradeoffs, and arrive at the right combination of parameterized types and traits?

Parameterized types could get us a lot further than where we are now if types were used in a more generic way. Currently we have three choices for VariateForm and two for ValueSupport, so there are really only six types of Distributions, and extending this is very slow. Parameterizing by the return type would get us a lot of mileage.

It could be helpful to set out some goals. Mine would include

  • The return type should be easy to determine statically from the type (no eltype required)
  • Extensibility should be easy and very lightweight (distributions over trees, named tuples, etc)
  • “Combinators”, aka “higher-order” distributions, should be easy. Here I’m thinking of things like MixtureModel in Distributions, or iid or For in Soss.
  • There should be a very flexible and extensible way of representing the support of a distribution, and this should represent the exact support, or indicate otherwise
  • There should be a way to represent different parameterizations of a distribution easily and with no performance penalty
  • Argument checking should be optional, with a uniform interface across distributions
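As a small illustration of the last goal, here is one way optional checking with a uniform keyword could look. This is a sketch: `MyNormal` is a hypothetical type, and the `check_args` keyword name is borrowed from the discussion above rather than taken from any particular constructor's documented signature.

```julia
struct MyNormal{T<:Real}
    μ::T
    σ::T
    # Every constructor takes the same keyword, so callers can skip checks uniformly.
    function MyNormal(μ::T, σ::T; check_args::Bool=true) where {T<:Real}
        if check_args && σ <= 0
            throw(ArgumentError("σ must be positive, got $σ"))
        end
        return new{T}(μ, σ)
    end
end

ok = MyNormal(0.0, 1.0)                              # checked by default
unchecked = MyNormal(0.0, -1.0; check_args=false)    # caller opts out of the check
```

A PPL that has already validated its parameters (or is passing through values where the check is not meaningful) can then opt out in exactly the same way for every distribution.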

Is the trickiness that one might want to build a distribution to represent values like this? It’s pretty common to work with distributions that are functions of, or inverse functions of, some known distribution. In this case what you describe has a bijection to Bernoulli(0.5). Is it hard to create a type to represent this?
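For what it's worth, a type like that can be sketched in a few lines. The names here are hypothetical: `FairCoin` is a stand-in for Bernoulli(0.5) so that the example runs without any packages, and `Pushforward` and `support` are illustration names.

```julia
using Random

# Stand-in for Bernoulli(0.5): a fair coin over Bool.
struct FairCoin end
Base.rand(rng::AbstractRNG, ::FairCoin) = rand(rng, Bool)
support(::FairCoin) = (false, true)

# The distribution of f(X) where X is drawn from `base`.
struct Pushforward{F,D}
    f::F
    base::D
end

# Sampling: draw from the base, then apply the function.
Base.rand(rng::AbstractRNG, d::Pushforward) = d.f(rand(rng, d.base))

# For a finite base support, the exact support is just the image under f.
support(d::Pushforward) = map(d.f, support(d.base))

shifted = Pushforward(b -> b + 1/2, FairCoin())
support(shifted)                     # the two values 0.5 and 1.5
rand(MersenneTwister(0), shifted)    # one of those two values, as a Float64
```

Here the discreteness is carried by the base distribution and its support, not by the Float64 sample type.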

Is there evidence that traits are the right way to go, or what’s the “right way” to use them for this use case?


We encountered the problems discussed here in Manifolds.jl, and the way we’re handling it right now is with our own VariateForm and ValueSupport (e.g. see https://github.com/JuliaNLSolvers/Manifolds.jl/blob/master/src/DistributionsBase.jl). The support is typed on the manifold for dispatch. In some cases, we have a type parameter for the variables in the distribution similar to what you’re proposing (e.g. https://github.com/JuliaNLSolvers/Manifolds.jl/blob/master/src/ProjectedDistribution.jl#L9-L13).

In practice, we combine manifolds into a product manifold and their distributions into a ProductPointDistribution. At some point when we have support for maps, we’d like to have pushforward distributions. The result is that if we define a manifold and a type for a point on that manifold, we can also define a distribution for that point. I doubt they can just be used in any PPL without some tweaking though.

I’m interested in a traits-based approach and would love it if it would make defining these kinds of distributions more straightforward, though I don’t know what that would look like.

Edit: @mateuszbaran may have ideas; he implemented most of the distributions code in Manifolds.jl


The type hierarchy in Distributions.jl isn’t particularly well suited for manifold-valued data. For example, VariateForm doesn’t really fit here. Representations of a single point or vector are very diverse in Manifolds.jl, and I’ve basically opted out of the VariateForm stuff by defining MPointvariate and FVectorvariate.

I like lightweight and trait-based solutions. The abstract type Manifold in Manifolds.jl doesn’t have any type parameters and it doesn’t cause any issues. What would be the benefits of enforcing some type parameters like ValueSupport? Anyone can put the return type in the type signatures of derived types like I did in, for example, ProjectedFVectorDistribution (@sethaxen described it quite well). Another point is that continuous distributions, for example, should generally put the return type in the type signature.


Can I just add to the discussion that the name Distributions2.jl feels strange? If you guys decide on a better design, can we just migrate it to the name Distributions.jl and make it a major release instead?

I dislike this convention that numbers are appended to package names, it doesn’t feel elegant.


I think everyone would prefer this; the challenge is just how to get there.

I guess it’s a common problem in software. Early-stage projects can turn on a dime, but slow down as they mature and other projects depend on them. Eventually, a new use case exposes blind spots in the interface. But now change is hard, and there are still open PRs with significant amounts of work put into them that further entrench the existing approach.

So, what’s the right way out of this? Development is most efficient when everyone contributing shares a common vision of the end result. Otherwise the PRs drag on and on, with more time spent negotiating and debating the merits than developing or actually using an implementation.

This was my hope in setting out some goals. If we can agree on these, we can start to move toward how to actually get there. At the other extreme, if PPL needs are different from general statistical use, it would probably have to be a different library altogether, perhaps calling to Distributions.jl for sampling, logpdf, etc., similar to the current Gen approach.
