Reworking Distributions.jl

Hello all,

I’ve been thinking some more about a refactoring of Distributions.jl, and I’d like to ask for feedback on a proof of concept.

First, the abstract types:

abstract type Sampleable{P,X} end

abstract type Distribution{P,X} <: Sampleable{P,X} end 

Here P is the parameter space, and X is the type of the sample space. Critically, there are no constraints on either of these at the type level. If you want X to be something other than <: Real or <: Array, go for it. No problem.

P can be anything, but NamedTuples are especially convenient, because they make it easy to use different parameterizations. For example, a Gamma could look like this:

struct Gamma{P,X} <: Distribution{P,X}
    par :: P
    Gamma(nt::NamedTuple) = Gamma{typeof(nt), typeof(nt[1])}(nt)
end

Again, the types are very flexible. A Gamma over symbolic values, for example, is easy.

Passing a NamedTuple is a bit awkward, but that’s easily fixed with

Gamma(;kwargs...) = Gamma((;kwargs...))

We may also want a default parameterization without requiring names:

Gamma(k :: Real, θ :: Real) = Gamma(k=k, θ=θ)

logpdf then looks like this:

function logpdf(d::Gamma{P,X}, x::X) where {P <: NamedTuple{(:k,:θ)}, X <: Real}
    k = d.par.k; θ = d.par.θ
    (k - 1) * log(x) - x/θ - k * log(θ) - loggamma(k)
end

This is for a shape k and scale θ. Maybe we’d rather parameterize by a shape and rate? Ok:

function logpdf(d::Gamma{P,X}, x::X) where {P <: NamedTuple{(:α,:β)}, X <: Real}
    α = d.par.α; β = d.par.β
    (α - 1) * log(x) - β*x + α * log(β) - loggamma(α)
end

Or by a shape and a mean?

function logpdf(d::Gamma{P,X}, x::X) where {P <: NamedTuple{(:k,:μ)}, X <: Real}
    μ = d.par.μ; k = d.par.k
    α = k; β = k/μ
    (α - 1) * log(x) - β*x + α * log(β) - loggamma(α)
end

This is already much faster than Distributions (at least in part due to lack of inlining in the latter):

julia> @btime logpdf(Gamma(2.0,1.0),4.0)
  13.964 ns (0 allocations: 0 bytes)
-2.613705638880109

vs

julia> using Distributions

julia> @btime logpdf(Gamma(2.0,1.0),4.0)
  36.097 ns (0 allocations: 0 bytes)
-2.613705638880109

Some things to consider:

  • Lots of packages depend on Distributions.jl, and these breaking changes could take a while to be fully supported
  • Defining separate methods for each parameterization is a bit verbose, but there’s always the option to fall back on one canonical parameterization
  • Some aspects of this would probably need to be implemented using traits. But let’s not rush to that until it’s clearly the best approach
  • This approach specifies a parameterization in terms of the names for a NamedTuple. But there’s likely to be some contention exactly what those names should be, for example should we replace k and θ with shape and scale?
  • Fields in NamedTuples are ordered, which is a little troubling for this use case. There may be a clever way to dispatch according to the set of names, rather than the tuple itself
15 Likes

I like the feeling of this – it’s very unencumbered and I think it could be handy.

True. I think probably forking Distributions.jl into Distributions2.jl or whatever is better for the moment.

I think that’s fine – if you have some weird typing, I think it’s your responsibility to extend the methods as needed, and then the core package should just support the main methods.

I can’t really contribute to this point, unfortunately, since I’m kind of a trait yokel.

I’m fine with using the actual parameter names, as long as they are non-unicode and the docstring tells us what they are.

Why do we need to dispatch on parameter names and not the distribution type?

1 Like

That’s funny, “non-unicode” would be a little annoying to me :slight_smile:
μ and σ seem sensible to me. Easy enough to type, and very little screen clutter. But I don’t see any benefit to mu and sigma over mean and sd.

EDIT: Then again, with multiple parameterizations you’d get to use whatever you like! The problem comes when there are opportunities for inconsistency

It would be both; the names are part of the type. The problem is situations like

julia> logpdf(Gamma(k=2.0,θ=1.0),4.0)
-2.613705638880109

julia> logpdf(Gamma(θ=1.0,k=2.0),4.0)
ERROR: MethodError: no method matching logpdf(::Gamma{NamedTuple{(:θ, :k),Tuple{Float64,Float64}},Float64}, ::Float64)

Without multiple methods or some additional trick, you need to have the parameters in a prescribed order.

1 Like

That’s why I wrote KeywordDispatch.jl, but unfortunately I never got the time to finish off WIP: Add keyword constructors by simonbyrne · Pull Request #823 · JuliaStats/Distributions.jl · GitHub

4 Likes

It’s really more of an accessibility thing. Not all text editors come with the ability to quickly enter unicode characters. As much as I like writing unicode characters (and lots of my code is littered with them) I really don’t think that central packages should expose interfaces that require unicode – it’s costless for most of us who already use latex characters, but it is a nontrivial problem if you are brand new and have no idea how to type weird squiggly characters that aren’t on your keyboard. Plus, it makes describing code on GitHub issues slightly more complex than it needs to be, since the tooling you have to type unicode on GitHub may not be the same as in your editor.

Mostly a matter of opinion, but I think unicode characters have a nasty habit of forcing people into particular editors and tooling which is almost never a good idea.

Ah, I see. @simonbyrne what’s the performance like for dispatching on keywords?

6 Likes

I didn’t see any when I tested it, but it would require some more thorough checking to be sure.

can you differenciate the gamma function wrt the parameters using AD?

Probably, but if not you tell most ADs how to. It’s important to consider this sort of thing when you actually build the implementation, but the high-level design isn’t tied to this. I don’t see how any result of this question could affect judgment of the design.

For specifying “parameter space” like P, I believe lens is the best API, as I mentioned before in DifferentialEquations, Control Systems, and Linearization - #7 by tkf. Unlike namedtuple, it can describe nested fields quite easily (it actually can describe arbitrary locations). Maybe it’s useful for implementing mixture models. It also is easy to do change of parameter and adding constraints.

6 Likes

Nice! I’ve seen lenses in other languages, and I agree they can work very well. For some cases here we won’t really need the nesting, but there’s also really no need to have the same parameter type for all distributions.

For example, we might have a little helper macro like

macro distribution(d,getX)
    D = esc(d);

    quote
        struct $D{P,X} <: Distribution{P,X}
            par :: P
            $D(p::P) where {P} = new{typeof(p), $getX(P)}(p)
        end
    end
end

For Gamma, I don’t see any benefits of using a lens over a named tuple (are there any?). But the API I’m proposing would allow both. Part of the point is to make distribution combinators much easier to build and reason about than they are today. I agree lens could be a great fit for lots of cases.

I like the general idea here, but I have a few concerns. What happens when we need to cache some values when constructing the distribution for computational purposes? For example for MvNormal, it’s more efficient to compute the Cholesky of the covariance once at the beginning and reuse it later. This may not be ideal for symbolic distributions unless a symbolic Cholesky is defined. So it’s not always parameters -> sample, sometimes it’s parameters -> intermediate -> samples for computational efficiency. A cache field in the distribution is probably all that’s needed to take care of this at the type level, but then this cache can be different for different parameterizations of the distribution. For example, consider an MvNormal with a constant variance for all the variables or even a diagonal variance. In this case, the cache doesn’t need to store anything at all. Also, only the names of the parameters may not tell us all we need to know about the distribution. For example, an MvNormal wth a constant mean for all the variables and MvNormal with a mean vector should both probably have \mu refer to their means, but the 2 \mu's will have different types. How can this case be handled for symbolic distributions?

I think instead of starting with Gamma, start with MvNormal. If you can make the latter work in every case, I am willing to bet that you can make any other distribution work with a fraction of the effort! And at that point, more people would probably be convinced that this is the right way to go. Good luck!

2 Likes

We can have MvNormal{P,X} for lots of P and X types. I’d guess a very common need will be

logpdf(d::MvNormal{P,X}, x::X) where 
    { P <: NamedTuple{(:μ,:Σ),Tuple{AbstractArray,AbstractPDMat}}
    , X <: AbstractArray}

but we could also have

logpdf(d::MvNormal{P,X}, x::X) where 
    { P <: NamedTuple{(:μ,:L),Tuple{AbstractArray,Cholesky}}
    , X <: AbstractArray}

The former includes a .chol field, while the latter is all Cholesky. These are equivalent if L==Σ.chol, and we could have mappings between them.

Right, so we could have different methods for symbolic representations, or those could be in a separate library, to reduce dependencies.

We have more than just the names. We have

  • The parameter names
  • The parameter types
  • The eltype

There’s also no restriction here to using named tuples. We could have a distribution with an ApproxFun as a parameter, or with a Turing model as a parameter. Anything you want, really. Compared to the current implementation, this is both more strictly flexible and strictly more informative to the compiler.

I hope it’s clear by now this is just another method that can be added after the fact

Thanks! Right now I think the hardest and most awful part is deconstructing the current library to get all of the VariateForm and ValueSupport stuff out of there. It’s pretty tightly integrated :frowning:

Sounds feasible if we leave out the symbolic stuff for now. May still be clunky though, so we should think of a way to make it less so. For example, dispatching on NamedTuple{(:μ, :Σ), Tuple{AbstractVector, AbstractPDMat}} looks a bit scary to me. I am not sure if traits can help here, will need to think more about it.

What’s scary about it?

The parameter space it completely open, we can write it however is most useful. I do think we should leave open the possibility of adding methods parameterizing by mean and precision, since this is often useful.

@tkf If we use named tuples here, can those be terminals of a SetField.jl lens?

Just the many combinations for which you will have to define methods. There might be a more multi-layered way to organize these methods using types and the Julia compiler to generate the same code that we could have manually written, e.g. using FillArray to change a scalar mean to a vector one. But then this will probably clash with symbolic simplifications which require as much information as possible to be available at parse-time which the above type-based compiler magic obscures. For instance, looking at the AST which uses a FillArray, you can’t tell that these x[i]s are all just a single scalar that you can do some symbolic manipulations to. So there is a bit of a trade-off here between: 1) spelling out the symbolic functions in the method implementations, implementing many of those methods, and 2) code efficiency, making use of types and compiler magic to generate efficient code for numerics, but limiting the information available to symbolic engines.

Edit: I think in Soss, you may be actually calling x[i] and getting the symbol/value back inlining it in the expression when x is a FillArray so my comment above is probably not applicable.

I agree that the named tuple approach is convenient for defining individual distributions.

Right, I missed that you said “P can be anything” in the OP.

Setfield.jl works quite well with namedtuples. I think lenses still can be used for, e.g., specifying which parameters to infer. But I guess that’s orthogonal to the issue of defining pdfs of individual distributions.

Part of my motivation for using Gamma as an example was that there are a few different ways to parameterize it. I think most distributions will have a more standard form.

In that case we could either have some code duplication, or we could just have a function for the scalar mean case that just builds a FillArray and then calls the vector version.

I don’t understand what you mean here by symbolic simplifications. Do you mean things like the SymPy manipulations in Soss? It seems to me this will make them much easier, because the ::Real type constraint will be in the numeric methods and out of the type definition. As long as new methods can be easily built, I don’t see the difficulty. But I may be a little distracted by how the current pain of building new methods, in which case I may be underestimating other concerns.

For the symbolic stuff, my current rule is to only work with Sample (~) statements. For Asssignments (=), I just copy the rhs to the top of the computation, no symbolics involved.

Yeah, I’m pretty excited about that aspect of the design.

Gen has this concept of a ChoiceMap, basically a nested dictionary mapping to all the stochastic choices made in a model. Maybe SossGen.jl should use lenses for this? But I guess that’s another discussion :slight_smile:

1 Like

I apologize for the lack of content, but I just wanted to say that I always get excited to see these kinds of re-structuring posts where a novel and potentially revolutionary idea is proposed and considered. Sadly I have no real input on the matter, but I just wanted to encourage your effort and say I’m looking forward to see where this takes us.

6 Likes

As far as I understand, there are two separate things in the proposal:

Constructors with NamedTuples

For these I would recommend a different surface syntax, possibly something like

# well-documented outer constructor
SomeDistribution(; kwargs...) = _SomeDistribution(kwargs) 

# actual constructors that do the work, not necessarily inner, but not part of the API
_SomeDistribution(param::NamedTuple{...}) = ... 

where for the latter one could include all permutations (possibly with a macro?) to help with the ordering problem, or just sort the kwargs (this could be something the compiler could elide I guess). Nonsensical combinations would just give a MethodError or an informative message.

This is something that could be done within the current package, gently deprecating everything else if it works out. It would be a great addition.

Redesign of the type hierarchy

Here the main question is what the types are used for. It would be interesting to survey existing and requested uses cases (by looking at code and discussions here). Are people using the types in Distributions.jl for dispatch outside the package, or is it merely an implementation detail?

In any case, I would recommend gradually switching to an API using query/accessor functions, potentially with traits (“what’s the return type of this distribution? it is a scalar”, “is it bounded”, etc), and deprecating everything else. This should keep things flexible, and allow a redesign of the internals later on. The latter could be useful in the implementation, but should be deprecated for use outside the package.

5 Likes

That’s very kind, thank you!

Right. I don’t know how much it matters, but I usually think of these in the opposite order than you described them. The design allows new methods to be added after the fact, reparameterizing both the parameter and sample space.

The inner constructor I provide above for Gamma is specifically for named tuples, and this seems to me a sensible common parameterization for the majority of cases. Even better would be a named-tuple-like type that is naturally unordered. I know there was some debate about ordering when named tuples were added, so I expect this must be possible.

This looks sensible to me, I think we just need to get one distribution working in detail to anticipate any issues with the API.

The types I use for dispatch are currently just the function names, because the current implementation is very limiting. This is my primary motivation for wanting a change.

My use case is different than most. Probabilistic programming is more demanding of API regularity than most use cases, and for Soss I need characteristics of the distribution to be determined statically.

Beyond this, it’s important that distributions can be defined easily. If you’d like to see how painful this is in the current implementation, try implementing something like a distribution over binary trees, in terms of an initial distribution and a mapping of a value to pairs of children from which to sample. Sample space of the result should be some binary tree type.

This is a complete mess in the current API.

I can’t imagine what this would look like. The current API is unusable for my needs, so I mostly find ways to go around it.

3 Likes