Probabilistic programming repositories

cscherrer · December 30, 2018, 7:01pm

There seem to be quite a few Julia repositories related to probabilistic programming. Here are the ones I’m aware of that have been updated in 2018 - anything I’m missing?

Any experiences using these or comparing/contrasting the approaches?

Joshua_Bowles · December 31, 2018, 12:52am

No experience at all.
Seems PPL is an area where Julia could really shine. I'm not familiar with the field, only with certain attempts like Anglican, Pyro, and Stan. The field still seems very young?
I'd love to see a good book on PPL using Julia. Nobody happens to be writing one, are they?

cscherrer · December 31, 2018, 2:12am

I guess that depends how you measure it. I’m not sure when the original BUGS was developed, but there was certainly some significant work in the field in the 90s. Martyn Plummer’s JAGS paper was 2003, and Church was published in 2008. The field recently got a big push from a DARPA program called PPAML (“Probabilistic Programming for Advancing Machine Learning”) that finished up just a couple of years ago.

Me too! Most things I've seen use Stan, and there's some great material with PyMC3 as well. @goedman mentioned McElreath's "Statistical Rethinking" book recently on the Julia Slack channel; I think he's looking into porting some things over from that.

In my experience, this sort of book tends to be more successful when the PPL used is a means to an end (namely, understanding Bayesian stats) as opposed to an end in itself. But my experience could well be a very biased sample.

Joshua_Bowles · December 31, 2018, 2:53am

Church, haha yeah. Some of my long-ago NLP research was based on Church (Noah Goodman went on to Uber and released Pyro). I also remember checking the DARPA projects often for a period.
I guess my history goes back more than I realize… I guess perhaps "young" is wrong; maybe more like not as popular as deep learning.

But I've never done any significant work in this space, nor have I worked in it consistently… and I guess a lot of that is due to the difficulty of finding good intro-level material. Ugh, like many things, it seems best to dive into intro material in Python and then transfer to Julia.

Last I checked, Turing.jl was pretty light on examples/docs (I did see the JuliaCon 2018 talk!). Also, I hadn't heard of Soss.jl (nor many others in the list you compiled) until you mentioned it on another Discourse thread yesterday. I'm going to look at Soss.jl and again at Turing.jl and see if I can't get something fun out of it.

cscherrer · December 31, 2018, 3:10am

Cool! Yeah I think there’s huge potential in Pyro’s approach - planning on doing the model/guide thing in Soss at some point.

Haha yeah but I think it’s getting there. And I love that there’s a lot of work involving both at once.

I had looked at Turing a bit in late 2017 and not again until recently. They’ve really been making some great progress! Besides, I think the big thing is to understand some of the concepts. There’s tons of material around for PyMC3. It’s pretty mature, and the interface is light-weight. I’d have a look at some of those materials for the concepts, maybe see about translating them into Soss or Turing.

Tamas_Papp · December 31, 2018, 9:57am

I have used Stan.jl, then Mamba.jl briefly, and found them nice for a PPL.

Then I ran into a difficult model, for which, among other things, I

1. had to use some non-standard transformations (and adjust the log likelihood with the Jacobian determinant accordingly),
2. could save a lot of time using sufficient statistics,
3. could improve ESS a lot by conditioning the transformations/model structure on the data (think, eg, centered/non-centered parametrizations in a hierarchical model depending on group size, different for each group).

For 1 and 3, things got so complicated that I preferred to unit test the pieces, which I found difficult in a PPL. Also, benchmarking and profiling the model piece by piece helped a lot in finding bottlenecks (think of likelihood evaluations that cost 1–2 s with a gradient even when coded manually; this adds up a lot).
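To make point 1 and the unit-testing idea concrete, here is a minimal sketch in Python (used only as a neutral illustration language; nothing in it is Julia- or PPL-specific). It assumes a hypothetical positive parameter sigma with an Exponential(1) prior, sampled on the unconstrained scale u = log(sigma). The transformation, its log-Jacobian, and the resulting log density are ordinary functions, so each can be tested on its own: the density on u integrates to 1 only when the Jacobian term is included, and a hand-coded gradient can be checked against finite differences.

```python
import math

def exp_transform(u):
    """Map unconstrained u to sigma = exp(u); return (sigma, log|d sigma/du|).

    For sigma = exp(u) the derivative is exp(u), so the log-Jacobian is simply u.
    """
    return math.exp(u), u

def log_prior_sigma(sigma):
    """Hypothetical prior for illustration: sigma ~ Exponential(1), log density -sigma."""
    return -sigma

def log_density_u(u):
    """Log density on the unconstrained scale: prior on sigma plus the log-Jacobian."""
    sigma, log_jac = exp_transform(u)
    return log_prior_sigma(sigma) + log_jac

def grad_log_density_u(u):
    """Hand-coded gradient: d/du [-exp(u) + u] = 1 - exp(u)."""
    return 1.0 - math.exp(u)

def trapezoid(f, lo, hi, n=100_000):
    """Plain trapezoidal rule; accurate enough for a unit test of a smooth density."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

def test_density_integrates_to_one():
    # Tails outside [-30, 5] carry negligible mass for this density.
    mass = trapezoid(lambda u: math.exp(log_density_u(u)), -30.0, 5.0)
    assert abs(mass - 1.0) < 1e-6

def test_gradient_matches_finite_differences():
    eps = 1e-6
    for u in (-2.0, 0.0, 1.5):
        fd = (log_density_u(u + eps) - log_density_u(u - eps)) / (2 * eps)
        assert abs(fd - grad_log_density_u(u)) < 1e-6

test_density_integrates_to_one()
test_gradient_matches_finite_differences()
```

Each test exercises one piece in isolation, which is the property that gets harder to reach once the same computation is buried inside a model block.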

This experience made me very skeptical of PPLs. Coding up the model as essentially a loglikelihood calculation is not the major pain for me when doing Bayesian inference, and PPLs make it a bit simpler, at the cost of making a lot of other things complicated.

For me, the quality of the backend doing the inference matters much more for nontrivial ($\ge 5000$ parameters) models than the surface syntax coding the model, which is essentially a fancy way of writing the likelihood. I spend much more time figuring out why I am getting bad mixing or slow sampling than thinking about likelihoods per se.

The ideal interface I am striving for these days is an API composed of Julia code that facilitates coding models. When I run into problems, this allows me to deal with them the same way as other Julia programs, using the extended set of tools kind people made available (eg ProfileView.jl and Traceur.jl, when I need to go beyond @code_warntype).
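As a rough illustration of "the model as plain code", here is a sketch in Python, with the standard-library timeit module standing in for the Julia tooling mentioned above. The regression model, its priors, and the toy data are all hypothetical. The point is only that the log posterior is an ordinary function built from smaller ordinary functions, so each piece can be tested, benchmarked, or profiled with the host language's usual tools.

```python
import math
import timeit

LOG_SQRT_2PI = 0.5 * math.log(2 * math.pi)

def log_prior(beta, sigma):
    """Hypothetical priors: each beta[i] ~ Normal(0, 10), sigma ~ Exponential(1)."""
    if sigma <= 0:
        return -math.inf
    lp = -sigma
    for b in beta:
        lp += -0.5 * (b / 10.0) ** 2 - math.log(10.0) - LOG_SQRT_2PI
    return lp

def log_likelihood(beta, sigma, xs, ys):
    """Linear regression: y ~ Normal(beta[0] + beta[1] * x, sigma)."""
    ll = 0.0
    for x, y in zip(xs, ys):
        r = (y - (beta[0] + beta[1] * x)) / sigma
        ll += -0.5 * r * r - math.log(sigma) - LOG_SQRT_2PI
    return ll

def log_posterior(theta, xs, ys):
    """theta = [intercept, slope, sigma]; unnormalized log posterior."""
    beta, sigma = theta[:2], theta[2]
    return log_prior(beta, sigma) + log_likelihood(beta, sigma, xs, ys)

# Toy, noise-free data, for illustration only.
xs = [i / 100 for i in range(1000)]
ys = [1.0 + 2.0 * x for x in xs]
theta = [1.0, 2.0, 0.5]

# Each piece is ordinary code, so it can be benchmarked (or profiled) on its own.
t_prior = timeit.timeit(lambda: log_prior(theta[:2], theta[2]), number=1000)
t_lik = timeit.timeit(lambda: log_likelihood(theta[:2], theta[2], xs, ys), number=100)
print(f"prior: {t_prior / 1000:.2e} s/call, likelihood: {t_lik / 100:.2e} s/call")
```

Timing the likelihood separately from the prior is the kind of piece-by-piece benchmarking that points at the actual bottleneck; a profiler such as cProfile could be run on the same functions unchanged.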

This is very subjective, but I consider PPLs implemented as DSLs a way of bringing back the two-language problem. I think that this had a rationale in languages less powerful than Julia, eg Stan is great because R can't cut it in speed and no one wants to program C++ if they can avoid it. But Julia's powerful low-cost abstractions motivate me to explore doing without PPLs.

cscherrer · December 31, 2018, 5:49pm

Tamas_Papp:

Then I ran into a difficult model, for which, among other things, I

1. had to use some non-standard transformations (and adjust the log likelihood with the Jacobian determinant accordingly),
2. could save a lot of time using sufficient statistics,
3. could improve ESS a lot by conditioning the transformations/model structure on the data (think, eg, centered/non-centered parametrizations in a hierarchical model depending on group size, different for each group).

For 1 and 3, things got so complicated that I preferred to unit test the pieces, which I found difficult in a PPL. Also, benchmarking and profiling the model piece by piece helped a lot in finding bottlenecks (think of likelihood evaluations that cost 1–2 s with a gradient even when coded manually; this adds up a lot).

(1) can be improved with reparameterized or statically-optimized distributions. The simplest example of this is a log-normal distribution. In principle we could work in terms of the pdf of this directly, but it happens that we can get the same effect by exponentiating a normal. There’s no reason this couldn’t be done more generally, for example working in terms of the logit of a beta distribution instead of the raw value. Perhaps a distribution could be expressed as the composition of “pre” and “post” functions, yielding opportunities for fusing these in cases where we really need the raw value.
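A minimal sketch of the "pre/post function" idea, in Python for illustration: a hypothetical `pushforward_logpdf` helper builds the log density of Y = f(X) from the log density of X, the inverse of f, and the log of the absolute derivative of that inverse (the change-of-variables formula). With f = exp it reproduces the closed-form log-normal density; with f = logit it gives the density of the logit of a Beta variable. The helper name and the parameter values are illustrative, not taken from any existing package.

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2 * math.pi)

def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - LOG_SQRT_2PI

def beta_logpdf(p, a, b):
    return ((a - 1) * math.log(p) + (b - 1) * math.log1p(-p)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def pushforward_logpdf(base_logpdf, inverse, log_abs_dinverse):
    """Log density of Y = f(X) by change of variables:
    log p_Y(y) = log p_X(f^{-1}(y)) + log|d f^{-1}(y) / dy|.
    """
    def logpdf(y):
        return base_logpdf(inverse(y)) + log_abs_dinverse(y)
    return logpdf

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# LogNormal(mu, sigma) as exp of Normal(mu, sigma): inverse is log, d(log y)/dy = 1/y.
mu, sigma = 0.3, 0.8
lognormal_logpdf = pushforward_logpdf(
    lambda x: normal_logpdf(x, mu, sigma), math.log, lambda y: -math.log(y))

def lognormal_logpdf_direct(y):
    """Closed-form LogNormal(mu, sigma) log density, for comparison."""
    return -math.log(y * sigma) - LOG_SQRT_2PI - (math.log(y) - mu) ** 2 / (2 * sigma ** 2)

# Logit of a Beta(a, b) variable: the inverse is the sigmoid, whose derivative is
# sigmoid(z) * (1 - sigmoid(z)) = sigmoid(z) * sigmoid(-z).
a, b = 2.0, 5.0
logit_beta_logpdf = pushforward_logpdf(
    lambda p: beta_logpdf(p, a, b),
    sigmoid,
    lambda z: math.log(sigmoid(z)) + math.log(sigmoid(-z)))

for y in (0.1, 1.0, 4.2):
    assert abs(lognormal_logpdf(y) - lognormal_logpdf_direct(y)) < 1e-12
```

Because the composition is explicit, a system could in principle fuse or simplify the inverse and Jacobian terms (here log and 1/y) instead of evaluating them separately.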

(2) becomes a lot better if we work in terms of exponential families. This is exactly the case where sufficient statistics have a fixed dimensionality, independent of sample size. They also have a very convenient form for the logpdf. A PPL should, for example, be able to statically refactor iid observations on a Bernoulli into a single observation on a Binomial.
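The identity behind the Bernoulli-to-Binomial refactoring can be checked numerically. The sketch below (Python, for illustration; the observations are made up) compares a per-observation Bernoulli log-likelihood with a Binomial log-likelihood computed from the sufficient statistics (n, k). They differ only by log C(n, k), which does not depend on p, so both give the same posterior over p, while the Binomial version costs O(1) per evaluation instead of O(n). It demonstrates the equivalence only; it does not implement a static program transformation.

```python
import math

def bernoulli_iid_loglik(p, ys):
    """One term per observation: O(n) work on every evaluation."""
    return sum(math.log(p) if y else math.log1p(-p) for y in ys)

def sufficient_stats(ys):
    """Computed once, before inference: (number of trials, number of successes)."""
    return len(ys), sum(ys)

def log_choose(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def binomial_loglik(p, n, k):
    """O(1) work per evaluation, independent of the sample size."""
    return log_choose(n, k) + k * math.log(p) + (n - k) * math.log1p(-p)

ys = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]  # made-up observations
n, k = sufficient_stats(ys)

for p in (0.2, 0.5, 0.9):
    diff = binomial_loglik(p, n, k) - bernoulli_iid_loglik(p, ys)
    # The gap is the constant log C(n, k); it cancels in any posterior over p.
    assert abs(diff - log_choose(n, k)) < 1e-9
```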

(3) is already in the plans for Soss, at least the center/noncenter thing. I think Rao-Blackwellization should be doable statically as well.
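For the centered/non-centered choice, a minimal sketch (Python, illustrative) of the two parametrizations of one group effect, theta ~ Normal(mu, tau). In the non-centered version the sampler works on z ~ Normal(0, 1) and theta = mu + tau * z is derived; the two log densities agree up to the log-Jacobian log(tau), which the assertion checks. The per-group rule (centered for data-rich groups, non-centered for sparse ones) follows the usual heuristic, but the threshold of 20 observations and the group sizes are hypothetical placeholders.

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2 * math.pi)

def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - LOG_SQRT_2PI

def centered_logp(theta, mu, tau):
    """Sampler works on theta directly: theta ~ Normal(mu, tau)."""
    return normal_logpdf(theta, mu, tau)

def noncentered_logp(z):
    """Sampler works on z ~ Normal(0, 1); theta is a deterministic function of z."""
    return normal_logpdf(z, 0.0, 1.0)

def theta_from_z(z, mu, tau):
    return mu + tau * z

def choose_parametrization(n_obs, threshold=20):
    """Hypothetical per-group rule: many observations -> centered, few -> non-centered."""
    return "centered" if n_obs >= threshold else "noncentered"

mu, tau = 1.5, 0.4
for z in (-1.0, 0.0, 2.3):
    theta = theta_from_z(z, mu, tau)
    # Same distribution: the log densities differ exactly by log|d theta/dz| = log(tau).
    assert abs(centered_logp(theta, mu, tau) + math.log(tau) - noncentered_logp(z)) < 1e-12

group_sizes = {"a": 3, "b": 150, "c": 12}  # made-up group sizes
plan = {g: choose_parametrization(n) for g, n in group_sizes.items()}
```

The decision depends only on the data (group sizes), so it could be made once before sampling, which is what makes a static treatment plausible.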

Me too, in many cases. Stan is great but actually leads to a three-language problem (Stan + Python/R/Julia + C++). Plus, Stan is focused around cases where a “model” is a function (the log-posterior) and all inference is done in those terms. This is a perfect fit for a huge number of modeling problems, but there are certainly lots of things it can’t (and doesn’t intend to) do.

My subjective view is subjectively close to yours. The model, data manipulation, and the PPL implementation should all be in the same language. And it should be a monotonic improvement over writing it yourself - a single case of “this would be easier by hand” is one too many.

Tamas_Papp · December 31, 2018, 6:32pm

Regarding (1): eg the model I am currently working on is an economic model, where the structure of each agent's problem selects a parametric subset $A(\alpha, \beta, \kappa) \subset \mathbb{R}^7$ conditional on observed outcomes. I won't go into details here, but even mapping it to $\mathbb{R}^7$ was quite tricky. The benefit is that working on the unconstrained $\mathbb{R}^n$ really helps HMC; the downside is, of course, calculating the Jacobian.
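The set in the economic model above is far more involved, but the general move (map an unconstrained vector in $\mathbb{R}^n$ onto a constrained set that depends on the data, then add the log-Jacobian to the log density) can be sketched on a toy case. Here, as a hypothetical stand-in, the constraint is an increasing vector bounded below by a data-dependent value `lower`. The triangular structure makes the log-Jacobian cheap (just sum(u)), and the sketch checks it against a finite-difference Jacobian, which is also a handy unit test when a hand-derived Jacobian is less obvious. Python is used only for illustration.

```python
import math

def to_ordered_above(u, lower):
    """Map u in R^n to x with lower < x[0] < x[1] < ... < x[n-1].

    x[k] = x[k-1] + exp(u[k]), with x[-1] taken to be `lower`. The Jacobian
    dx/du is lower-triangular with diagonal exp(u[k]), so log|det J| = sum(u).
    Returns (x, log|det J|).
    """
    x, prev = [], lower
    for uk in u:
        prev = prev + math.exp(uk)
        x.append(prev)
    return x, sum(u)

def numeric_jacobian(fn, u, eps=1e-6):
    """Central finite differences: J[i][j] = d fn(u)[i] / d u[j]."""
    n = len(u)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, dn = list(u), list(u)
        up[j] += eps
        dn[j] -= eps
        fp, fm = fn(up), fn(dn)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

def log_abs_det(m):
    """log|det m| via Gaussian elimination with partial pivoting (small matrices)."""
    a = [row[:] for row in m]
    n = len(a)
    total = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]  # row swaps only flip the sign of det
        total += math.log(abs(a[c][c]))
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return total

lower = 1.25          # hypothetical bound implied by the observed outcomes
u = [0.3, -1.0, 0.5]  # arbitrary point in R^3
x, log_jac = to_ordered_above(u, lower)
J = numeric_jacobian(lambda v: to_ordered_above(v, lower)[0], u)
assert all(left < right for left, right in zip([lower] + x, x))
assert abs(log_abs_det(J) - log_jac) < 1e-6
```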

cscherrer · December 31, 2018, 6:47pm

Wow, that does sound tricky; distributions with data-dependent supports can be a real mess.
