Finally, would you be interested in joining forces?

There’s some good discussion here, but it drifted a bit from anything specific to that library, so it seems worth continuing here instead. This may also give it more visibility to other Julia users interested in PPLs.

Thanks for starting this discussion. I think “joining forces” is a great goal to pursue, but at the end of the day everyone has slightly different priorities, preferred approaches, and time constraints, so the goal may be impossible to achieve in its idealistic form. For instance, reading a new PPL package and learning its ins and outs can take days if not weeks, and even then most of what you learn may not be directly useful. That time and effort may instead be directed at the immediate development goals of each package, with guaranteed benefits.

That said, collaboration can take the form of learning from each other's approaches, and re-using parts of each other's frameworks to achieve our different goals without necessarily getting behind a single steering wheel. I think this can come from simply elaborating on our approaches in more specific contexts. For example: how do you trace random variables during sampling? How do you lower the ~ notation? How do you handle missing data? How do you perform static analysis in Gibbs sampling? How do you pre-allocate? Each package may be taking a slightly different approach to these, which is great, because it enables us to learn from each other if we are willing to.
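To make one of those questions concrete: lowering the ~ notation typically means macro-expanding each ~ statement into an explicit call that records the variable in a trace and accumulates the log joint. A minimal sketch of the idea (the names `Trace`, `assume!`, and `observe!` are hypothetical, not any particular package's API):

```julia
using Distributions  # for rand and logpdf

# Hypothetical trace object a PPL macro might thread through the model body.
struct Trace
    values::Dict{Symbol,Any}        # sampled values, keyed by variable name
    logp::Base.RefValue{Float64}    # accumulated log joint
end
Trace() = Trace(Dict{Symbol,Any}(), Ref(0.0))

# For latent variables: draw a value and record its log density.
function assume!(t::Trace, name::Symbol, dist)
    v = rand(dist)
    t.values[name] = v
    t.logp[] += logpdf(dist, v)
    return v
end

# For observed data: only accumulate the log likelihood.
function observe!(t::Trace, dist, value)
    t.logp[] += logpdf(dist, value)
    return value
end
```

A macro would then rewrite `s ~ InverseGamma(2, 3)` into `s = assume!(trace, :s, InverseGamma(2, 3))`, and a ~ with observed data on the left-hand side into an `observe!` call.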

I am not quite sure how this thread will evolve, but it’s nice to see PPL people trying to collaborate. We will probably need to be more specific in our discussions, though, to actually take this somewhere useful. If and when that happens, GitHub may be a better place for the technical details. Looking forward to the others’ responses!

I might be off topic here but to me Gen.jl shows the biggest promise as of now. Could that be the platform to start from? Or which libraries are otherwise in play? I’m building a Bayesian deep learning package in my spare time and would love for it to stand on top of a proper UPPL. I would also happily contribute to this. I was previously very much into Turing.jl but it seems like it’s quite slow so far compared to Stan and Pyro. Either way, I’m positive and want to help if I can.

FWIW, Turing is now orders of magnitude faster than it was one year ago. Even in the following simple example, the speed difference is very noticeable:

using Turing

@model gdemo(x) = begin
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s))
    end
end

sample(gdemo(rand(100)), NUTS(10000, 0.65))

# v0.6.0: 175.1 s
# master:   3.3 s

We also started working on ideas on how to utilise the GPU in Turing. Speed and robustness improvements are pretty much the highest priority in Turing at the moment.

Besides this, it would be nice if people working on PPLs could exchange ideas and learn from each other more. We try to break Turing into individual components (packages) so that other people can easily reuse parts, similar to DynamicHMC but slightly more general. I think this would be a good direction. This way everyone can implement their DSL interpreter of choice but can reuse samplers and other inference algorithms. On the other hand, other people, including myself, can implement inference algorithms without having to care too much about the DSL or how to efficiently carry data around.

I know it’s a lot to ask for, but tutorials for Turing.jl à la PyMC3 would be pretty awesome for anyone wanting to start contributing, and mainly as a way to learn the package.

First, there are a few organized package collections like JuliaOpt or JuliaDiff. Should there be a JuliaPPL?

Second, how about a sort of “PPL Rosetta Stone” repository? We could maybe have a collection of models implemented in various PPLs and inference methods. This could make it easy to compare code styles, recognize benefits and limitations across systems, etc.

An additional option for this last possibility is to implement some automated benchmarking. This could be especially helpful for situations where two PPLs use the same back-end, since it could make it easy to identify opportunities for optimization.

We could even include the trivial PPL (writing it by hand), which could be really helpful for understanding the expressiveness/performance tradeoffs.
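As a sketch of what the "trivial PPL" entry might look like: for the simple Normal model that appears elsewhere in this thread, the hand-written version is just a log-joint function, with no tracing machinery at all:

```julia
using Distributions

# Hand-coded log joint of the gdemo model from earlier in the thread:
#   s ~ InverseGamma(2, 3); m ~ Normal(0, sqrt(s)); x[i] ~ Normal(m, sqrt(s))
function gdemo_logjoint(s, m, x)
    s > 0 || return -Inf                          # support constraint on s
    lp = logpdf(InverseGamma(2, 3), s)            # prior on s
    lp += logpdf(Normal(0, sqrt(s)), m)           # prior on m
    lp += sum(logpdf.(Normal(m, sqrt(s)), x))     # likelihood
    return lp
end
```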

The two are conceptually quite different. Gen.jl focuses on programmable inference, while Turing.jl focuses on compositional inference and universal probabilistic programming. They also provide different sets of inference algorithms: Turing provides a range of robust, maintained MCMC samplers as well as variational inference, while Gen allows multivariate proposal distributions, e.g. a generative NN, to be used in SMC-based algorithms, and is more low-level in its use. Because Turing focuses on universal probabilistic programming, you can have stochastic control flow, a varying number of parameters, discrete random measures, and model composition. I think Gen supports a small subset of these features but allows static models to be compiled for faster inference.

Depending on the use case, one might be more applicable than the other.

Could you also explain the difference between programmable inference, compositional inference, and universal probabilistic programming, please?
I have just used Stan for a while and have read about INLA. I’m new to probabilistic programming.
What things could be done with some of them and not with the others?
Which one is supposed to be faster to run? Which one is easier?

I would argue that a PPL based on programmable inference allows you to more easily implement problem specific inference algorithms. It’s usually more low level, like Gen, and requires more knowledge about Bayesian inference algorithms and PPLs in general.

Compositional inference aims to combine inference algorithms, such that you can use, e.g., HMC for some set of variables and another inference algorithm for the rest. This is useful for more complex models where you don’t want to write tailored inference algorithms but need the flexibility to combine different algorithms for efficient inference.
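In Turing this is what the Gibbs combinator does. A sketch using the gdemo model from earlier in the thread (the exact constructor signatures have changed across Turing versions, so check the docs for your release):

```julia
using Turing

@model gdemo(x) = begin
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s))
    end
end

# Alternate between an HMC update for m and a particle-Gibbs update for s.
chain = sample(gdemo(rand(100)), Gibbs(HMC(0.2, 4, :m), PG(20, :s)), 1000)
```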

Universal probabilistic programming refers to inference in probabilistic programs that contain stochastic control flow, a varying number of parameters, and so on. Many of the more efficient inference algorithms place strong restrictions on the models they can be used for and are not universal; e.g. HMC and VI cannot handle stochastic control flow. For the same reason, you cannot represent every model you can write in Turing as a static graph (and variable-name handling needs special care), which makes it impossible to use certain graph-based optimisations or inference algorithms. Turing solves this by providing inference algorithms, core routines and data structures that can handle those difficult models.
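For instance, here is a model whose number of parameters is itself random, so it cannot be represented as a static graph. This is a sketch; depending on the sampler, Turing may require its own array types (tzeros/TArray) for a vector like `m`:

```julia
using Turing

@model random_dim(y) = begin
    k ~ Geometric(0.5)                   # number of components is random
    m = Vector{Float64}(undef, k + 1)
    for i in 1:(k + 1)
        m[i] ~ Normal(0, 1)              # the set of latent variables varies per run
    end
    y ~ Normal(sum(m), 1.0)
end
```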

Which one is faster? This is difficult to say. If you use the same inference algorithm you likely won’t feel any difference, but there are no reliable benchmarks comparing the two. The HMC inference in Turing is comparable to Stan in terms of speed and effectiveness. I don’t know about Gen’s, but it should be easy to use Turing’s HMC in Gen if necessary. On the other hand, because Gen makes it easier to implement tailored inference algorithms, you might be able to get faster inference for specific models with it.

What is easier to use? For general-purpose modelling, Turing is much easier to use, as this is its target audience. If you want to implement tailored inference algorithms, Gen is easier, as it is meant for that purpose.

Note that because both PPLs are pure Julia implementations, they have the luxury of being able to easily use any Julia library in the model or the inference algorithm, meaning you can use neural networks, GPUs, and so on in both. This is different for Stan, PyMC3 and other PPLs, which cannot easily leverage other libraries.

If you don’t need a PPL and only want to perform gradient-based inference with HMC, you can use AdvancedHMC (the sampler used by Turing) or DynamicHMC (another efficient HMC implementation) directly on your model’s log joint.
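A sketch of the AdvancedHMC route on a hand-written log density (the API below follows a recent AdvancedHMC release and has changed over time, so treat it as illustrative and check the package README for your version):

```julia
using AdvancedHMC, ForwardDiff

# Hand-written log density: a standard Normal in D dimensions,
# standing in for your model's log joint.
D = 2
ℓπ(θ) = -0.5 * sum(abs2, θ)

n_samples, n_adapts = 1_000, 500
θ0 = randn(D)

metric = DiagEuclideanMetric(D)
hamiltonian = Hamiltonian(metric, ℓπ, ForwardDiff)   # gradients via ForwardDiff
integrator = Leapfrog(find_good_stepsize(hamiltonian, θ0))
kernel = HMCKernel(Trajectory{MultinomialTS}(integrator, GeneralisedNoUTurn()))
adaptor = StanHMCAdaptor(MassMatrixAdaptor(metric), StepSizeAdaptor(0.8, integrator))

samples, stats = sample(hamiltonian, kernel, θ0, n_samples, adaptor, n_adapts)
```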