There seem to be quite a few Julia repositories related to probabilistic programming. Here are the ones I’m aware of that have been updated in 2018 - anything I’m missing?
Any experiences using these or comparing/contrasting the approaches?
No experience at all.
Seems PPL is an area where Julia could really shine. I’m not familiar with the field, only with certain attempts like Anglican, Pyro, and Stan. The field still seems very young?
I’d love to see a good book on PPL using Julia. Nobody happens to be writing one, are they?
I guess that depends how you measure it. I’m not sure when the original BUGS was developed, but there was certainly some significant work in the field in the 90s. Martyn Plummer’s JAGS paper was 2003, and Church was published in 2008. The field recently got a big push from a DARPA program called PPAML (“Probabilistic Programming for Advancing Machine Learning”) that finished up just a couple of years ago.
Me too! Most things I’ve seen use Stan, and there’s some great material with PyMC3 as well. @goedman has mentioned McElreath’s “Statistical Rethinking” book recently on the Julia Slack channel, I think he’s looking into porting some things over from that.
In my experience, this sort of book tends to be more successful when the PPL used is a means to an end (namely, understanding Bayesian stats) as opposed to an end in itself. But my experience could well be a very biased sample.
Church, haha yeah. Some of my long-ago NLP research was based on Church (Noah Goodman went on to Uber and released Pyro). I also remember checking the DARPA projects often for a period.
I guess my history goes back further than I realize… so perhaps “young” is wrong; maybe more like not as popular as deep learning.
But I’ve never done any significant work in this space, nor have I worked in it consistently… and I guess a lot of that is due to the difficulty of finding good intro-level material. Ugh, like many things, it seems the path is to dive into intro material with Python and then transfer to Julia.
Last I checked, Turing.jl was pretty light on examples/docs (I did see the JuliaCon 2018 talk!). Also, I hadn’t heard of Soss.jl (nor many others in the list you compiled) until you mentioned it on another Discourse thread yesterday. I’m going to look at Soss.jl and again at Turing.jl and see if I can’t get something fun out of it.
Cool! Yeah I think there’s huge potential in Pyro’s approach - planning on doing the model/guide thing in Soss at some point.
Haha yeah but I think it’s getting there. And I love that there’s a lot of work involving both at once.
I had looked at Turing a bit in late 2017 and not again until recently. They’ve really been making some great progress! Besides, I think the big thing is to understand some of the concepts. There’s tons of material around for PyMC3. It’s pretty mature, and the interface is light-weight. I’d have a look at some of those materials for the concepts, maybe see about translating them into Soss or Turing.
I have used Stan.jl, then Mamba.jl briefly, and found them nice for a PPL.
Then I ran into a difficult model for which, among other things, I
1. had to use some non-standard transformations (and adjust the log likelihood with the Jacobian determinant accordingly),
2. could save a lot of time using sufficient statistics,
3. could improve ESS a lot by conditioning the transformations/model structure on the data (think, eg, centered/non-centered parametrizations in a hierarchical model depending on group size, different for each group).
For 1 and 3, things got so complicated that I preferred to unit test the pieces, which I found difficult in a PPL. Also, benchmarking and profiling the model in small pieces helped a lot in finding bottlenecks (think of likelihood evaluations that cost 1–2 s with a gradient even when coded manually; this adds up quickly).
This experience made me very skeptical of PPLs. Coding up the model as essentially a loglikelihood calculation is not the major pain for me when doing Bayesian inference, and PPLs make it a bit simpler, at the cost of making a lot of other things complicated.
For me, the quality of the backend doing the inference matters much more for nontrivial ($\ge 5000$ parameters) models than the surface syntax coding the model, which is essentially a fancy way of writing the likelihood. I spend much more time figuring out why I am getting bad mixing or slow sampling than thinking about likelihoods per se.
The ideal interface I am striving for these days is an API composed of Julia code that facilitates coding models. When I run into problems, this allows me to deal with them the same way as other Julia programs, using the extended set of tools kind people have made available (eg ProfileView.jl and Traceur.jl, when I need to go beyond @code_warntype).
This is very subjective, but I consider PPLs, as DSLs, a way of bringing back the two-language problem. I think this had a rationale in languages less powerful than Julia, eg Stan is great because R can’t cut it in speed and no one wants to program in C++ if they can avoid it. But Julia’s powerful low-cost abstractions motivate me to explore doing without PPLs.
(1) can be improved with reparameterized or statically-optimized distributions. The simplest example of this is a log-normal distribution. In principle we could work in terms of the pdf of this directly, but it happens that we can get the same effect by exponentiating a normal. There’s no reason this couldn’t be done more generally, for example working in terms of the logit of a beta distribution instead of the raw value. Perhaps a distribution could be expressed as the composition of “pre” and “post” functions, yielding opportunities for fusing these in cases where we really need the raw value.
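To make that concrete, here’s a rough sketch (just Distributions.jl plus a hypothetical `Transformed` wrapper, not any library’s actual API) of a distribution expressed as a base distribution plus a “post” transform, with the log-normal as the motivating case:

```julia
using Distributions

# Hypothetical sketch: a distribution as a base distribution plus a "post"
# transform, its inverse, and the log-Jacobian of the inverse.
struct Transformed{D<:UnivariateDistribution,F,G,J}
    base::D       # e.g. Normal(μ, σ)
    post::F       # forward transform, e.g. exp
    invpost::G    # inverse transform, e.g. log
    logjac::J     # x -> log |d invpost/dx|, the density correction
end

# Sampling: draw from the base, push through the transform.
Base.rand(d::Transformed) = d.post(rand(d.base))

# Density of the pushed-forward value x = post(z).
function Distributions.logpdf(d::Transformed, x)
    logpdf(d.base, d.invpost(x)) + d.logjac(x)
end

# LogNormal(μ, σ) as exp of a Normal: d(log x)/dx = 1/x, so logjac(x) = -log(x).
mylognormal(μ, σ) = Transformed(Normal(μ, σ), exp, log, x -> -log(x))

# Sanity check against Distributions.jl:
# logpdf(mylognormal(0.0, 1.0), 2.5) ≈ logpdf(LogNormal(0.0, 1.0), 2.5)
```

Fusing the “pre”/“post” pieces with downstream code would then be an optimization the PPL could apply whenever only the transformed value is actually needed.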
(2) becomes a lot better if we work in terms of exponential families. This is exactly the case where sufficient statistics have a fixed dimensionality, independent of sample size. They also have a very convenient form for the logpdf. A PPL should, for example, be able to statically refactor iid observations on a Bernoulli into a single observation on a Binomial.
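Done by hand with Distributions.jl, that Bernoulli → Binomial rewrite looks like this (the two log-likelihoods differ only by a constant that doesn’t involve p, so it’s harmless for inference):

```julia
using Distributions

xs   = [1, 0, 1, 1, 0, 1, 0, 1]       # iid Bernoulli(p) observations
p    = 0.6
n, k = length(xs), sum(xs)

# Naive per-observation log-likelihood: one logpdf call per data point.
ll_bernoulli = sum(logpdf(Bernoulli(p), x) for x in xs)

# Same information via the sufficient statistic k = sum(xs): a single Binomial
# logpdf. The difference is log(binomial(n, k)), which is constant in p.
ll_binomial = logpdf(Binomial(n, p), k)

ll_binomial - ll_bernoulli ≈ log(binomial(n, k))   # true
```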
(3) is already in the plans for Soss, at least the center/noncenter thing. I think Rao-Blackwellization should be doable statically as well.
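For anyone following along, here’s roughly what the centered/non-centered choice looks like written out by hand for a toy hierarchical normal model (hyperparameters fixed just to keep the sketch short):

```julia
using Distributions

# Centered: θ[j] ~ Normal(μ, τ), y[j] ~ Normal(θ[j], σ)
function logpost_centered(θ, y; μ = 0.0, τ = 1.0, σ = 1.0)
    sum(logpdf(Normal(μ, τ), θⱼ) for θⱼ in θ) +
        sum(logpdf(Normal(θⱼ, σ), yⱼ) for (θⱼ, yⱼ) in zip(θ, y))
end

# Non-centered: z[j] ~ Normal(0, 1), θ[j] = μ + τ * z[j]. Same model, but the
# posterior geometry is often much friendlier for HMC when a group has little
# data or τ is small, which is exactly the per-group choice described above.
function logpost_noncentered(z, y; μ = 0.0, τ = 1.0, σ = 1.0)
    θ = μ .+ τ .* z
    sum(logpdf(Normal(), zⱼ) for zⱼ in z) +
        sum(logpdf(Normal(θⱼ, σ), yⱼ) for (θⱼ, yⱼ) in zip(θ, y))
end
```

Picking between these two per group, based on the data, is the kind of static rewrite a PPL could do for you.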
Me too, in many cases. Stan is great but actually leads to a three-language problem (Stan + Python/R/Julia + C++). Plus, Stan is focused around cases where a “model” is a function (the log-posterior) and all inference is done in those terms. This is a perfect fit for a huge number of modeling problems, but there are certainly lots of things it can’t (and doesn’t intend to) do.
My subjective view is subjectively close to yours. The model, data manipulation, and the PPL implementation should all be in the same language. And it should be a monotonic improvement over writing it yourself - a single case of “this would be easier by hand” is one too many.
Regarding (1): eg the model I am currently working on is an economic model, where the structure of each agent’s problem selects a parametric subset $A(\alpha, \beta, \kappa) \subset \mathbb{R}^7$ conditional on observed outcomes. I won’t go into details here, but even mapping it to $\mathbb{R}^7$ was quite tricky. The benefit is that working on the unconstrained $\mathbb{R}^n$ really helps HMC; the downside is of course calculating the Jacobian.
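The one-parameter version of that bookkeeping, just to show where the Jacobian term enters (a made-up Gamma target for illustration):

```julia
using Distributions

# A parameter x > 0, sampled on the unconstrained scale via y = log(x).
logtarget_x(x) = logpdf(Gamma(2.0, 3.0), x)   # density on the constrained scale

function logtarget_y(y)
    x = exp(y)                # back-transform to the constrained scale
    logtarget_x(x) + y        # + log |dx/dy| = log(exp(y)) = y
end
```

In $\mathbb{R}^7$ the same correction is the log-determinant of the transformation’s Jacobian, which is where the cost I mentioned comes from.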