PPL collaboration

@mohamed82008 opened this issue on @Elrod’s ProbabilityModels.jl library, closing with:

Finally, would you be interested in joining forces?

There’s some good discussion there, but it drifted a bit from anything specific to that library, so it seems worth continuing here instead. This may also give it more visibility among other Julia users interested in PPLs.

Tagging previously involved/mentioned @trappmartin @datnamer @cpfiffer @willtebbutt @Kai_Xu @Marco_Cusumano-Towne @yebai

Hi!

Thanks for starting this discussion. I think “joining forces” is a great goal to aim for, but at the end of the day everyone may have slightly different priorities, preferred approaches, and time constraints, so that goal may be impossible to achieve in its idealistic form. For instance, reading a new PPL package and learning its ins and outs can take days if not weeks, and even then most of that knowledge may not be directly useful. That time and effort may instead be directed at the immediate development goals of each package, with guaranteed benefits.

That said, collaboration can take the form of learning from each other’s approaches and re-using parts of each other’s frameworks to achieve our different goals, without necessarily getting behind a single steering wheel. I think this can come from simply elaborating on our approaches in more specific contexts. For example: how do you trace random variables during sampling? How do you lower the ~ notation? How do you handle missing data? How do you perform static analysis in Gibbs sampling? How do you pre-allocate? Each of the packages may be taking a slightly different approach to these questions, which is great because it enables us to learn from each other if we are willing to.
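
To make the first two questions concrete, here is a deliberately minimal sketch of one way a DSL could lower x ~ Normal(0, 1) into an explicit call that records the draw in a trace. The names Trace and tilde! are made up for illustration; this is not any package’s actual API:

using Distributions

struct Trace
    values::Dict{Symbol,Any}
end
Trace() = Trace(Dict{Symbol,Any}())

# x ~ Normal(0, 1) could be rewritten into x = tilde!(trace, :x, Normal(0, 1))
function tilde!(trace::Trace, name::Symbol, dist)
    v = rand(dist)           # draw a value for the random variable
    trace.values[name] = v   # record it under its name
    return v
end

trace = Trace()
x = tilde!(trace, :x, Normal(0, 1))
trace.values   # Dict(:x => ...)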

I am not quite sure how this thread will evolve, but it’s nice to see PPL people trying to collaborate :slight_smile: We will probably need to be more specific in our discussions, though, to actually take this somewhere useful. If and when that happens, GitHub may be a better place for the technical details. Looking forward to the others’ responses!

I might be off topic here, but to me Gen.jl shows the biggest promise as of now. Could that be the platform to start from? Or which libraries are otherwise in play? I’m building a Bayesian deep learning package in my spare time and would love for it to stand on top of a proper UPPL (universal PPL). I would also happily contribute to this. I was previously very much into Turing.jl, but so far it seems quite slow compared to Stan and Pyro. Either way, I’m positive and want to help if I can. :blush:

FWIW, Turing is now orders of magnitude faster than it was one year ago. Even in the following simple example, the speed difference is very noticeable :slight_smile:

using Turing

@model gdemo(x) = begin
    s ~ InverseGamma(2, 3)           # prior on the variance
    m ~ Normal(0, sqrt(s))           # prior on the mean
    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s))    # likelihood of each observation
    end
end
sample(gdemo(rand(100)), NUTS(10000, 0.65))  # 10000 samples, 0.65 target acceptance rate

# v0.6.0
## 175.1 s

# master
## 3.3 s

Wow, that’s impressive! I should perhaps revisit and eat some of my words. :slight_smile:

No worries! It seems quite a few people have that impression about Turing, but we hope it changes soon!

We have also started working on ideas for how to utilise the GPU in Turing. Speed and robustness improvements are pretty much the highest priority in Turing at the moment.

Besides this, it would be nice if people working on PPLs could exchange ideas and learn from each other more. We are trying to break Turing into individual components (packages) so that other people can easily reuse parts, similar to DynamicHMC but slightly more general. I think this would be a good direction: everyone can implement their DSL interpreter of choice but reuse samplers and other inference algorithms, while other people, including myself, can implement inference algorithms without having to care too much about the DSL or how to efficiently carry data around.
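
As a rough illustration of the kind of decoupling I mean (hypothetical code, not Turing’s actual interface): the DSL layer only needs to produce a log-density function, and a sampler only needs to consume one, so the two can live in entirely separate packages.

# What some DSL interpreter might emit for a model: a plain log-density.
logdensity(θ) = -0.5 * sum(abs2, θ)   # standard normal, up to a constant

# A sampler that knows nothing about the DSL: random-walk Metropolis.
function rwmh(logdensity, θ0; n = 1000, step = 0.5)
    θ, lp = copy(θ0), logdensity(θ0)
    samples = [copy(θ)]
    for _ in 1:n
        θ′ = θ .+ step .* randn(length(θ))   # propose a move
        lp′ = logdensity(θ′)
        if log(rand()) < lp′ - lp            # Metropolis accept/reject
            θ, lp = θ′, lp′
        end
        push!(samples, copy(θ))
    end
    return samples
end

chain = rwmh(logdensity, zeros(2))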

I know it’s a lot to ask for, but tutorials for Turing.jl à la PyMC3 would be pretty awesome for anyone wanting to learn the package and eventually start contributing.

Turing actually has a lot of tutorials. Check https://turing.ml/tutorials/.

Thank you! I wasn’t aware of those. My applications are mostly in time series, but I’ll keep a close eye on this in the upcoming months.

I may be demonstrating GARCH or stochastic volatility estimation during my talk at JuliaCon, so stay posted.

Sorry in advance for the double post, but I had some more resources on this:

One of our JSoC students (@Saumya_Shah) implemented MA and AR processes in Turing; you can check those out as well.
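
For anyone curious, here is a rough sketch of what an AR(1) model can look like in the same (era-specific) Turing syntax as the example above. This is just my own illustration, not the JSoC implementation:

using Turing

@model ar1(x) = begin
    α ~ Normal(0, 1)       # persistence coefficient
    σ ~ Exponential(1)     # innovation scale
    for t in 2:length(x)
        x[t] ~ Normal(α * x[t-1], σ)   # each value depends on the previous one
    end
end

# e.g. sample(ar1(randn(100)), NUTS(1000, 0.65))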

I just thought of a couple of possibilities…

First, there are a few organized package collections like JuliaOpt or JuliaDiff. Should there be a JuliaPPL?

Second, how about a sort of “PPL Rosetta Stone” repository? We could maybe have a collection of models implemented in various PPLs and inference methods. This could make it easy to compare code styles, recognize benefits and limitations across systems, etc.

An additional option for this last possibility is to implement some automated benchmarking. This could be especially helpful in situations where two PPLs use the same back-end, since it would make it easy to identify opportunities for optimization.

We could even include the trivial PPL (writing the model by hand), which could be really helpful for understanding the expressiveness/performance tradeoffs.
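
For concreteness, here is what the trivial-PPL entry might look like for the gdemo model posted earlier: the log-joint written out by hand with no DSL at all (an illustration, not a proposed canonical form):

using Distributions

function gdemo_logjoint(s, m, x)
    lp = logpdf(InverseGamma(2, 3), s)         # prior on s
    lp += logpdf(Normal(0, sqrt(s)), m)        # prior on m
    for xi in x
        lp += logpdf(Normal(m, sqrt(s)), xi)   # likelihood of each observation
    end
    return lp
end

gdemo_logjoint(1.0, 0.0, rand(100))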

Good stuff, thanks for the link, and I’m looking forward to your talk at JuliaCon!