Reworking Distributions.jl

What if I have a type that is Sampleable, but must participate in a completely different part of the type tree? I’m thinking about Particles <: Real or VAE <: AbstractGenerativeModel, for example. These must be subtypes of their own type trees to work well, but can certainly be sampled from. The same goes for many probabilistic models: they often participate in their own type trees while also acting as probability distributions. From this viewpoint, it would be great if IsSampleable were a trait rather than an abstract type. Sorry for being late to the discussion, I didn’t realize this was potentially important to my application until now :confused:

A type-based workaround would be to implement wrapper types for everything that can also act like a probability distribution
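To make the wrapper idea concrete, here is a minimal sketch. All names (`ToySampleable`, `AsSampleable`, `ToyVAE`, `draw`) are made up for illustration and are not part of Distributions.jl or any other package.

```julia
using Random

# Hypothetical model type that must live in its own type tree.
abstract type AbstractGenerativeModel end
struct ToyVAE <: AbstractGenerativeModel
    scale::Float64
end

# Hypothetical stand-in for a package's "sampleable" supertype.
abstract type ToySampleable end

# The wrapper places any object into the ToySampleable tree
# without changing the wrapped type's own supertype.
struct AsSampleable{T} <: ToySampleable
    inner::T
end

# Sampling is forwarded to whatever the wrapped object provides
# (here a scaled standard normal, purely as a placeholder).
draw(rng::AbstractRNG, s::AsSampleable{ToyVAE}) = s.inner.scale * randn(rng)

wrapped = AsSampleable(ToyVAE(2.0))
x = draw(MersenneTwister(1), wrapped)
```

The cost is that every call site has to wrap and unwrap, which is the overhead a trait would avoid.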

This is great, then one could stick Particles in there, or any other form of prior distribution etc.

I would parameterize even harder to allow different types for mean and variance. Consider, e.g., that one wants to stick other distributions in there, such as Uniform(Normal(...), Gamma(...)).

This suggestion would solve that problem, but may result in T=Any if one of the types is a Distribution whereas the other one is a VariationalAutoencoder, but maybe this is a pie-in-the-sky kind of application for which some dynamic dispatch would be acceptable?
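A small sketch of the difference between one shared type parameter and one parameter per field. The structs below are hypothetical and only illustrate the typing question, not how Distributions.jl defines anything.

```julia
# One shared parameter: both fields must have the same type T.
struct SingleParam{T}
    lo::T
    hi::T
end

# One parameter per field: each field keeps its own concrete type.
struct SplitParam{A,B}
    lo::A
    hi::B
end

struct ToyPrior end   # stands in for a distribution-like object
struct ToyModel end   # stands in for an unrelated model type

# With a single parameter, mixing unrelated types forces T = Any.
shared = SingleParam{Any}(ToyPrior(), ToyModel())

# With split parameters, both field types stay concrete.
separate = SplitParam(ToyPrior(), ToyModel())

fieldtype(typeof(shared), :lo)    # Any
fieldtype(typeof(separate), :lo)  # ToyPrior
```

Abstractly typed fields are what lead to dynamic dispatch when the fields are used, so the split version is the one that keeps things inferable.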

That absolutely sounds worth considering


In total

abstract type Sampleable{X} end

should be more than enough. Edit: even that is too much.

What if I have a type that is Sampleable, but must participate in a completely different part of the type tree?

You are absolutely right! I made these comments in situations where even more structure such as dimension, type of samples, type of probability, discreteness, uni- or multivariate was supposed to be baked into the abstract type. Having no abstract type, but a trait is very much preferable (or even no trait at all, just a set of functions with agreed behaviour, e.g. like we do it for iterators).
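As a sketch of the "no trait at all" option: the only contract is that a method with an agreed name and behaviour exists, much like `iterate` for iterators. The function name `draw` and the type `ToyParticles` are assumptions made up for this example.

```julia
using Random

# Hypothetical type that must be a Real, yet can also be sampled from.
struct ToyParticles <: Real
    values::Vector{Float64}
end

# The agreed behaviour: draw(rng, x) returns one sample from x.
draw(rng::AbstractRNG, p::ToyParticles) = rand(rng, p.values)

# Generic code never mentions a supertype or trait; it only calls draw.
sample_mean(rng, x, n) = sum(draw(rng, x) for _ in 1:n) / n

p = ToyParticles([1.0, 2.0, 3.0])
m = sample_mean(MersenneTwister(0), p, 100)
```

Anything that implements `draw` works with `sample_mean`, regardless of where it sits in the type tree.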


The current implementation is trait-based:

@trait IsMeasure{M,X} >: HasRand{M,X} where {X = eltype(M)} begin
    rand :: [M] => eltype(M)
end

I’ve actually started to wonder if we might need this anyway. Signed measures form a vector space over the reals, where addition is superposition. But if Distributions are also Measures, we’re getting into type piracy. So maybe we need a wrapper for working with external structures that happen to also be Measures?
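A rough sketch of the wrapper route, under the assumption that the wrapped object is a probability distribution with total mass 1. Every name here (`ToyMeasure`, `AsMeasure`, `Scaled`, `Superposition`, `totalmass`, `ExternalDist`) is hypothetical. Because `+` and `*` are only defined for types owned by this sketch, nothing is pirated.

```julia
struct ExternalDist end   # pretend another package owns this type

abstract type ToyMeasure end

# Wrapper that lets an external object act as a measure.
struct AsMeasure{D} <: ToyMeasure
    dist::D
end

# Scalar multiple of a measure; a negative weight gives a signed measure.
struct Scaled{M<:ToyMeasure} <: ToyMeasure
    weight::Float64
    base::M
end

# Addition of measures is superposition.
struct Superposition{A<:ToyMeasure,B<:ToyMeasure} <: ToyMeasure
    left::A
    right::B
end

Base.:+(a::ToyMeasure, b::ToyMeasure) = Superposition(a, b)
Base.:*(w::Real, m::ToyMeasure) = Scaled(Float64(w), m)

# Assumption: a wrapped distribution has total mass 1.
totalmass(::AsMeasure) = 1.0
totalmass(s::Scaled) = s.weight * totalmass(s.base)
totalmass(s::Superposition) = totalmass(s.left) + totalmass(s.right)

mu = 2.0 * AsMeasure(ExternalDist()) + (-0.5) * AsMeasure(ExternalDist())
totalmass(mu)  # 1.5
```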

I see the appeal of this, but going in this direction can cause a lot of problems. It’s important to distinguish a type from a measure on that type. On the other hand, some measures like a Dirichlet process really do take another measure as a parameter.

But you could build something like you describe as a mixture of uniforms over a product measure.

We definitely want to avoid dynamic dispatch if possible, ideally without preventing users from extending in this direction.

Right, I was looking at this last option a bit last night. Overall, we have

  • Hierarchy of abstract types, with wrappers for external interface
  • CanonicalTraits
  • Holy traits
  • Duck typing

I expect we’ll end up with some mixture of maybe two of these, still not entirely clear
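For reference, the Holy traits option from the list above could look roughly like this. The trait names (`SampleStyle`, `CanSample`, `CannotSample`), the function `draw`, and the type `ToyCloud` are invented for the sketch; the pattern mirrors how `Base.IteratorSize` is used.

```julia
using Random

# Trait values are plain singleton types.
abstract type SampleStyle end
struct CanSample <: SampleStyle end
struct CannotSample <: SampleStyle end

# Default: a type is not sampleable unless it opts in.
SampleStyle(::Type) = CannotSample()

# A type from an unrelated tree opts in without changing its supertype.
struct ToyCloud <: Real
    values::Vector{Float64}
end
SampleStyle(::Type{ToyCloud}) = CanSample()

# Entry point dispatches on the trait value, resolved at compile time.
draw(rng::AbstractRNG, x) = draw(SampleStyle(typeof(x)), rng, x)
draw(::CanSample, rng::AbstractRNG, x::ToyCloud) = rand(rng, x.values)
draw(::CannotSample, rng::AbstractRNG, x) =
    throw(ArgumentError("$(typeof(x)) does not support sampling"))

y = draw(MersenneTwister(0), ToyCloud([1.0, 2.0]))
```

Since the trait function returns a constant for each type, the extra dispatch step should not cost anything at runtime.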


Yeah, for the sampling part, a variation of the iterator protocol would be very natural. Querying new iterators from an iterator and new samples from a sampler is almost the same; the main difference is that there are two ingredients, the sampler and the random number generator. @rfourquet has some thoughts on this as well.
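The Random standard library already follows this two-ingredient shape: a sampler object is built once, then `rand(rng, sampler)` is called repeatedly. A minimal sketch using the documented `Random.SamplerTrivial` hook, where the `Die` type is a toy example:

```julia
using Random

# Toy object to sample from.
struct Die
    nsides::Int
end

# Tell Random what kind of value a draw produces.
Base.eltype(::Type{Die}) = Int

# Hook into the Random API: d[] retrieves the wrapped Die.
Random.rand(rng::AbstractRNG, d::Random.SamplerTrivial{Die}) =
    rand(rng, 1:d[].nsides)

rng = MersenneTwister(42)

# Ingredient one is the rng, ingredient two is the sampler.
sp = Random.Sampler(rng, Die(6))
one = rand(rng, sp)

# The usual convenience forms then work as well.
many = rand(rng, Die(6), 5)
```

The explicit `Random.Sampler` step is where any precomputation for repeated draws would go, similar to the state carried by an iterator.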


Thanks for the ping @mschauer. I haven’t followed this discussion as closely as I would like, in large part because I’m not familiar with the Distributions package, nor do I need to work with advanced distributions mathematically.

I wish I had time to better understand the needs of Distributions2, but one thing which would be great (IMHO) is a design that allows putting a basic Distribution type in Base and defining simple distributions like Normal or Uniform there, which would also be useful for a package with more advanced distributions. I tried to include such a type at one point, but one piece of feedback was that it can’t really be done independently of the Distributions package.

adding constraints?

At the moment Distributions.jl has reached the point in its lifecycle that some people have issues with the interface/implementation, and are experimenting with various approaches and interface conventions (eg see this topic, and many others). It is unclear what the common ground will be, and whether there will be a consensus that leads to a refactoring of Distributions.jl, a fork, or an alternative package. There is no consensus about what a Distribution is, how to define the interface to be extensible (eg what if I want a distribution over arbitrary objects? is it beneficial to consider something I can only sample from using MCMC a distribution for the purposes of this type?), about implementation choices (type hierarchies? traits?).

Because of this, I disagree with putting anything to do with distributions into Base, or even Random. I think it is better to decouple all development from Julia releases whenever possible, so that interfaces can evolve freely.


Distributions 2: Redistributed

Would be a good movie title.


see Home · Kaleido.jl


Sorry for the naive question: is it critical to have a two-level hierarchy with Distribution <: Sampleable? Is the definition of Sampleable roughly “has rand” and Distribution “also has a probability distribution”, like in Distributions.jl? Again, I have almost never used this package, but is it necessary to dispatch on these two types, or would just erroring when a method is not defined for a Sampleable be enough?
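To illustrate the "just error" option: without any hierarchy, calling a density function on something that only supports sampling raises a `MethodError`, and `applicable` can be used to check up front. The names `draw`, `logdensity`, `OnlySampler` and `ToyStdNormal` are hypothetical.

```julia
using Random

struct OnlySampler end    # can be sampled, has no density
struct ToyStdNormal end   # can be sampled and has a density

draw(rng::AbstractRNG, ::OnlySampler) = rand(rng)
draw(rng::AbstractRNG, ::ToyStdNormal) = randn(rng)

# Log density of a standard normal at x.
logdensity(::ToyStdNormal, x::Real) = -0.5 * x^2 - 0.5 * log(2π)

# Optional up-front check instead of dispatching on a supertype.
has_density(d) = applicable(logdensity, d, 0.0)

# Calling the missing method fails with a MethodError.
err = try
    logdensity(OnlySampler(), 0.0)
catch e
    e
end
```

What this does not give you is the ability to write separate methods for "sample only" and "has density" objects, which is the case where a type or trait would still be needed.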


FWIW @cscherrer’s API design looks good to my mind (nice work!). But I’m not clear if this means that we are gathering momentum to develop his Measures.jl, or a new revision of Distributions.jl, or something else again. It seems, if I may summarise, that we have consensus that Distributions.jl is approaching end of life for the current API at least, but have not yet converged on how to replace it…? Is simply submitting pull requests the way forward, and if so, where?

Thanks @danmackinlay! I haven’t looked at this in a while, and you’ve inspired me to have another look over it :slight_smile:

I haven’t been involved lately with discussions about the future of Distributions.jl. But in general, libraries become harder to change as they get wider adoption. I think the best path forward is to focus on wrapping all of Distributions.jl. Then there’s no debate needed, people can use whichever library suits them.

I think I’m going to simplify this quite a bit. I had been using @thautwarm’s CanonicalTraits.jl library, which really has some cool ideas. But I think it doesn’t lean too much on traits anyway, and a more standard route will make it easier for others to contribute.
