Reworking Distributions.jl

That’s why I wrote KeywordDispatch.jl, but unfortunately I never got the time to finish off https://github.com/JuliaStats/Distributions.jl/pull/823

4 Likes

It’s really more of an accessibility thing. Not all text editors come with the ability to quickly enter unicode characters. As much as I like writing unicode characters (and lots of my code is littered with them) I really don’t think that central packages should expose interfaces that require unicode – it’s costless for most of us who already use latex characters, but it is a nontrivial problem if you are brand new and have no idea how to type weird squiggly characters that aren’t on your keyboard. Plus, it makes describing code on GitHub issues slightly more complex than it needs to be, since the tooling you have to type unicode on GitHub may not be the same as in your editor.

Mostly a matter of opinion, but I think unicode characters have a nasty habit of forcing people into particular editors and tooling which is almost never a good idea.

Ah, I see. @simonbyrne what’s the performance like for dispatching on keywords?

6 Likes

I didn’t see any when I tested it, but it would require some more thorough checking to be sure.

can you differenciate the gamma function wrt the parameters using AD?

Probably, but if not you tell most ADs how to. It’s important to consider this sort of thing when you actually build the implementation, but the high-level design isn’t tied to this. I don’t see how any result of this question could affect judgment of the design.

For specifying “parameter space” like P, I believe lens is the best API, as I mentioned before in DifferentialEquations, Control Systems, and Linearization. Unlike namedtuple, it can describe nested fields quite easily (it actually can describe arbitrary locations). Maybe it’s useful for implementing mixture models. It also is easy to do change of parameter and adding constraints.

6 Likes

Nice! I’ve seen lenses in other languages, and I agree they can work very well. For some cases here we won’t really need the nesting, but there’s also really no need to have the same parameter type for all distributions.

For example, we might have a little helper macro like

macro distribution(d,getX)
    D = esc(d);

    quote
        struct $D{P,X} <: Distribution{P,X}
            par :: P
            $D(p::P) where {P} = new{typeof(p), $getX(P)}(p)
        end
    end
end

For Gamma, I don’t see any benefits of using a lens over a named tuple (are there any?). But the API I’m proposing would allow both. Part of the point is to make distribution combinators much easier to build and reason about than they are today. I agree lens could be a great fit for lots of cases.

I like the general idea here, but I have a few concerns. What happens when we need to cache some values when constructing the distribution for computational purposes? For example for MvNormal, it’s more efficient to compute the Cholesky of the covariance once at the beginning and reuse it later. This may not be ideal for symbolic distributions unless a symbolic Cholesky is defined. So it’s not always parameters -> sample, sometimes it’s parameters -> intermediate -> samples for computational efficiency. A cache field in the distribution is probably all that’s needed to take care of this at the type level, but then this cache can be different for different parameterizations of the distribution. For example, consider an MvNormal with a constant variance for all the variables or even a diagonal variance. In this case, the cache doesn’t need to store anything at all. Also, only the names of the parameters may not tell us all we need to know about the distribution. For example, an MvNormal wth a constant mean for all the variables and MvNormal with a mean vector should both probably have \mu refer to their means, but the 2 \mu's will have different types. How can this case be handled for symbolic distributions?

I think instead of starting with Gamma, start with MvNormal. If you can make the latter work in every case, I am willing to bet that you can make any other distribution work with a fraction of the effort! And at that point, more people would probably be convinced that this is the right way to go. Good luck!

2 Likes

We can have MvNormal{P,X} for lots of P and X types. I’d guess a very common need will be

logpdf(d::MvNormal{P,X}, x::X) where 
    { P <: NamedTuple{(:μ,:Σ),Tuple{AbstractArray,AbstractPDMat}}
    , X <: AbstractArray}

but we could also have

logpdf(d::MvNormal{P,X}, x::X) where 
    { P <: NamedTuple{(:μ,:L),Tuple{AbstractArray,Cholesky}}
    , X <: AbstractArray}

The former includes a .chol field, while the latter is all Cholesky. These are equivalent if L==Σ.chol, and we could have mappings between them.

Right, so we could have different methods for symbolic representations, or those could be in a separate library, to reduce dependencies.

We have more than just the names. We have

  • The parameter names
  • The parameter types
  • The eltype

There’s also no restriction here to using named tuples. We could have a distribution with an ApproxFun as a parameter, or with a Turing model as a parameter. Anything you want, really. Compared to the current implementation, this is both more strictly flexible and strictly more informative to the compiler.

I hope it’s clear by now this is just another method that can be added after the fact

Thanks! Right now I think the hardest and most awful part is deconstructing the current library to get all of the VariateForm and ValueSupport stuff out of there. It’s pretty tightly integrated :frowning:

Sounds feasible if we leave out the symbolic stuff for now. May still be clunky though, so we should think of a way to make it less so. For example, dispatching on NamedTuple{(:μ, :Σ), Tuple{AbstractVector, AbstractPDMat}} looks a bit scary to me. I am not sure if traits can help here, will need to think more about it.

What’s scary about it?

The parameter space it completely open, we can write it however is most useful. I do think we should leave open the possibility of adding methods parameterizing by mean and precision, since this is often useful.

@tkf If we use named tuples here, can those be terminals of a SetField.jl lens?

Just the many combinations for which you will have to define methods. There might be a more multi-layered way to organize these methods using types and the Julia compiler to generate the same code that we could have manually written, e.g. using FillArray to change a scalar mean to a vector one. But then this will probably clash with symbolic simplifications which require as much information as possible to be available at parse-time which the above type-based compiler magic obscures. For instance, looking at the AST which uses a FillArray, you can’t tell that these x[i]s are all just a single scalar that you can do some symbolic manipulations to. So there is a bit of a trade-off here between: 1) spelling out the symbolic functions in the method implementations, implementing many of those methods, and 2) code efficiency, making use of types and compiler magic to generate efficient code for numerics, but limiting the information available to symbolic engines.

Edit: I think in Soss, you may be actually calling x[i] and getting the symbol/value back inlining it in the expression when x is a FillArray so my comment above is probably not applicable.

I agree that the named tuple approach is convenient for defining individual distributions.

Right, I missed that you said "P can be anything" in the OP.

Setfield.jl works quite well with namedtuples. I think lenses still can be used for, e.g., specifying which parameters to infer. But I guess that’s orthogonal to the issue of defining pdfs of individual distributions.

Part of my motivation for using Gamma as an example was that there are a few different ways to parameterize it. I think most distributions will have a more standard form.

In that case we could either have some code duplication, or we could just have a function for the scalar mean case that just builds a FillArray and then calls the vector version.

I don’t understand what you mean here by symbolic simplifications. Do you mean things like the SymPy manipulations in Soss? It seems to me this will make them much easier, because the ::Real type constraint will be in the numeric methods and out of the type definition. As long as new methods can be easily built, I don’t see the difficulty. But I may be a little distracted by how the current pain of building new methods, in which case I may be underestimating other concerns.

For the symbolic stuff, my current rule is to only work with Sample (~) statements. For Asssignments (=), I just copy the rhs to the top of the computation, no symbolics involved.

Yeah, I’m pretty excited about that aspect of the design.

Gen has this concept of a ChoiceMap, basically a nested dictionary mapping to all the stochastic choices made in a model. Maybe SossGen.jl should use lenses for this? But I guess that’s another discussion :slight_smile:

1 Like

I apologize for the lack of content, but I just wanted to say that I always get excited to see these kinds of re-structuring posts where a novel and potentially revolutionary idea is proposed and considered. Sadly I have no real input on the matter, but I just wanted to encourage your effort and say I’m looking forward to see where this takes us.

6 Likes

As far as I understand, there are two separate things in the proposal:

Constructors with NamedTuples

For these I would recommend a different surface syntax, possibly something like

# well-documented outer constructor
SomeDistribution(; kwargs...) = _SomeDistribution(kwargs) 

# actual constructors that do the work, not necessarily inner, but not part of the API
_SomeDistribution(param::NamedTuple{...}) = ... 

where for the latter one could include all permutations (possibly with a macro?) to help with the ordering problem, or just sort the kwargs (this could be something the compiler could elide I guess). Nonsensical combinations would just give a MethodError or an informative message.

This is something that could be done within the current package, gently deprecating everything else if it works out. It would be a great addition.

Redesign of the type hierarchy

Here the main question is what the types are used for. It would be interesting to survey existing and requested uses cases (by looking at code and discussions here). Are people using the types in Distributions.jl for dispatch outside the package, or is it merely an implementation detail?

In any case, I would recommend gradually switching to an API using query/accessor functions, potentially with traits (“what’s the return type of this distribution? it is a scalar”, “is it bounded”, etc), and deprecating everything else. This should keep things flexible, and allow a redesign of the internals later on. The latter could be useful in the implementation, but should be deprecated for use outside the package.

5 Likes

That’s very kind, thank you!

Right. I don’t know how much it matters, but I usually think of these in the opposite order than you described them. The design allows new methods to be added after the fact, reparameterizing both the parameter and sample space.

The inner constructor I provide above for Gamma is specifically for named tuples, and this seems to me a sensible common parameterization for the majority of cases. Even better would be a named-tuple-like type that is naturally unordered. I know there was some debate about ordering when named tuples were added, so I expect this must be possible.

This looks sensible to me, I think we just need to get one distribution working in detail to anticipate any issues with the API.

The types I use for dispatch are currently just the function names, because the current implementation is very limiting. This is my primary motivation for wanting a change.

My use case is different than most. Probabilistic programming is more demanding of API regularity than most use cases, and for Soss I need characteristics of the distribution to be determined statically.

Beyond this, it’s important that distributions can be defined easily. If you’d like to see how painful this is in the current implementation, try implementing something like a distribution over binary trees, in terms of an initial distribution and a mapping of a value to pairs of children from which to sample. Sample space of the result should be some binary tree type.

This is a complete mess in the current API.

I can’t imagine what this would look like. The current API is unusable for my needs, so I mostly find ways to go around it.

3 Likes

Sure it is. My point above was that you can simply

  1. have a method for all permutations either “manually”, or
  2. with a macro,
  3. or pre-sort NamedTuples in some canonical order.

I think (3) is cleanest, but this should be checked.

And do you actually need all this detail to be available from the API directly, before sampling?

IMO the best approach to complicated types is heuristic: get a value, and suppose that all values are like that (maybe the compiler can infer it, which is nice), otherwise degrade gracefully with a branch. This is pretty much what collect does, for example.

I don’t think that we should go overboard with interfaces that operate on types, even if they are useful occasionally.

Eg have a function value_kind that returns a trait for being a discrete distribution, a real number, a vector, etc, to describe sampled values. When it is a real number, value_bounds would be defined, returning a tuple of bounds, may be infinite.

While I don’t think that the current API is ideal, we should realize that it is the accretion of many years of contributions, a lot of which happened before current patterns of Julia programming evolved. An API redesign is something that usually happens when someone gets sufficiently fed up with the current API to put in all the work :wink:

1 Like

I think the parameter space as type parameter can just be left out. I am not saying that the particular distributions should not be parametrised by the types of their parameters, just that the abstract type does not really need this. There is no point on dispatching on “Distribution with Float64 type parameter”, it is not even clear if the type parameters are values (e.g. a location parameter), whole numbers or probabilities (e.g. in a Binomial).

There are two quantities which perhaps would be useful to know statically on the abstract type level: the type of the samples and the type in which probabilities are measured.

Though right now I cannot really anticipate the use of dispatching a generic distribution on the type of probabilities. In any case, if we leave the probability type out, we can still use a trait with fallback probtype(::Any) = Float64 which maps to what we are doing.

The sample type could be part of the abstract type Sampleable, because if you solicit numbers from a random stream it would be good to have a default.

In total

abstract type Sampleable{X} end

should be more than enough and even

abstract type Sampleable end

would be workable by hooking into the the eltype trait which already exists and makes perfect sense as type trait for Sampleables .

2 Likes

I’m very on-the-fence about this whole unordered parameters thing. It could be handy in some cases, but most distributions have a standard ordering. Anyway, I think this is a detail we don’t need to worry about too much until we have a few more examples.

“Need” is a strong word. I’m getting by without it, but things could be much better. I’m not sure what you mean by “all this detail”, which seems to imply unneeded complexity. But the proposal is much simpler than the current API.

I’ve done this in some cases, and have been disappointed it’s not cleaner. To me it feels heavy-handed and cumbersome to write, and there’s some definite performance overhead. This kind of thing is the whole idea of type systems, so why not take advantage of what we have?

This is an interesting perspective. In your perspective is the proposal “overboard” in this way?

Again interesting. Types give us a way to reason statically about the behavior of a program. It helps us catch bugs, prevent dynamic dispatch, and find opportunities for optimization. A great thing about Julia is the ability to encode whatever you know statically into the type system, without any of it being forced. In my experience the usefulness isn’t “occasional”, but “almost always”. The difficulty with the current API isn’t that is uses types, but that the information we need isn’t there, and what is there doesn’t lend itself to extension.

I don’t understand how this is simpler than just having the return type available as a parameter. How many of these functions would there need to be? Would adding a “tree” type require going back through all existing distributions and adding “not a tree”? Would adding a new distribution require manually specifying which of these traits it satisfies? In that case we’d be back to non-extensibility.

Absolutely. It’s unusual for any version 1.0 API to anticipate all use cases, and Distirbutions is clearly written with classical statistics in mind, rather than probabilistic programming. For classical stats, “here’s a fixed, finite set of distributions you can use” is really common. But for probabilistic programming that’s not enough.

There are certainly benefits to community effort, and building interest in this is one of the goals of this discussion. I’d much rather have something that builds to a widely-used and community-maintained library, than a quick one-off without many distributions or users. There may be some intermediate growing pains, but I think stats in Julia can be much better, and in the long run it’s well worth it.

I disagree. To me the most obvious argument for the proposed API is that it treats distributions as an abstraction of functions, and are typed in a way familiar to anyone who has used functional programming. This is a big benefit when you start thinking about abstraction and composition.

From a more traditional perspective, consider parameterizing a Gaussian by (mean, variance) vs (mean, precision). Or a binomial by probability vs logit(probability). These things can have significant benefits, both in terms of convenience and (statistical and computational) performance.

Do you mean the return type of pdf? You may be right, but it’s not clear to me yet that this is intrinsic to a distribution.

In my experience, the best way to get fast and safe code is to get as much information to the compiler as possible. Usually “the compiler” means Julia, but in the case of probabilistic programming, it may also be something written by the user.

Can you help me understand the point of eliminating the P in Sampleable{P,X}?

1 Like