Just as a follow up, we now implemented the RAM sampler in a new package: RobustAdaptiveMetropolisSampler.jl. Feedback welcome, registration is pending!
Iām somewhat curious about a bit broader landscape of bayesian inference algorithms than MCMC. We had and have more success for some sorts of problems using nested sampling methods, compared to MCMC. An example of a nested sampler is PolyChord, for which I made a thin wrapper in Julia, and it works pretty smoothly without significant overhead - unlike Python. However, I couldnāt find basically anything regarding Julia implementations of such algorithms yet. Are there any particular reasons why everyone in this community tends to prefer MCMC?
I have yet to do anything with MCMC in Julia, but hereās my guess.
First of all, I donāt think the Julia community is disproportionally attached to MCMCā¦but MCMC (and now HMC) is by far, one of the more common methods in the Bayesian community. Since the Julia community is a bit smaller than, say, R, I would guess Julia doesnāt have as many āfringeā methods being published just yet.
I will also say that HMC is particularly attractive for Julia. The reason for this this is that things like MH allows you to build a sampler that you can generically attach to any model that a user provides, which is pretty powerful. HMC is a great algorithm, but it requires derivatives, which means I canāt just hand you a black-box posterior density function and start taking samples like I can with MH. This where Stan comes in; in many ways, their āmarket advantageā is that they have a way of taking in BUGS models and using C++ AD libraries for extracting efficient derivative calculations. This allows them to take generic BUGS models and attach HMC to it.
But thereās been some work in Julia thatās changing that game. In particular, packages like ForwardDiff allow you to take a blackbox posterior density function written in Julia, analyze the code and create a function that analytically computes the derivatives. To the me, it is fundamentally amazing how much this simplifies the whole work flow for this. Suddenly, HMC is generically available to the masses!
I think that compared to languages like C++ and R, developers working on MCMC and Bayesian inference are probably overrepresented in the Julia community, because it is such a great language for prototyping.
Since the ecosystem is not that mature yet, being a user of MCMC in Julia takes a bit more effort compared to, say, Python or R, but it is definitely feasible at this point.
My question wasnāt about the fraction of developers working on Bayesian analysis vs all developers, but about developers working on Bayesian analysis vs those doing MCMC. It looks like for Julia all Bayesian analysis-related work boils down to MCMC, e.g. the only project on Nested Sampling I found had no updates for 5 years.
I think @Tamas_Papp was making a broader point about the Bayesian ecosystem. @pistacliffcho hit your question on the head:
Iād agree that
- MCMC is by far the most common method for Bayesian analysis, so thatās just what you tend to see in Julia, and
- Nested sampling is a bit āfringeā, as are many other analysis methods.
Some of the probability people in Julia are starting to branch out a bit now ā I know that us Turing folks are moving into variational inference and we have our eye on having an inclusive platform for inference methods. Perhaps one day weāll support nested sampling, though I donāt know what thatād take. If youāre willing to build it we can maybe have a conversation about what infrastructure we can provide.
I see, thanks for the perspective! Personally Iām more of a user for Bayesian methods and donāt develop this kind of algorithms myself. What could be useful even without implementing other methods like nested sampling in julia, is integrating already existing samplers in other languages to allow for easier model specification.
Recently I looked at Turing and other similar Julia packages, but couldnāt find an easy way to integrate e.g. the polychord nested sampler I mention above. It basically takes two functions: prior
- the map from a uniformly distributed value in a hypercube [0..1]^n to a vector of n model parameters, i.e. inverse CDF of the prior; and loglikelihood
, which is just the log-likelihood defined on n-length vectors. So, I didnāt see how to get these functions from a model defined using e.g. Turing.
We are currently working on unifying the API so that you can get this with a function call. For now you can implement an inference algorithm by implementing an assume function (which allows you to get prior draws) and an observe function (which is used to compute the log pdf). You can have a look at the IS we implemented for a reference point.
https://github.com/TuringLang/Turing.jl/blob/master/src/inference/is.jl
I donāt know much about the properties of nested sampling, so my remark is more about the āother methodsā.
There are probably thousands of Bayesian inference algorithms out there, even if we count only variations of Metropolis-Hastings. Some are approximate, some are asymptotic like MCMC, some are for some very specialized contexts. Frequently, the author demonstrates the efficiency of the algorithm on one or two problems, and very few papers explain when the new algorithm should be superior to existing ones, and even fewer can explain why this happens.
Experimenting with new algorithms is of course an interesting research area. But for practical applications, I think it is much better to just have a just a few algorithms that
- are theoretically sound and reasonably well-understood (eg NUTS)
- well-tested (implementing the more complex algorithms can be tricky)
- have relatively known failure modes and diagnostics for these (eg EBFMI for NUTS)
I think a library should go ham. For example, DifferentialEquations.jl implements every algorithm it feasibly can, and adds more every day, though the recommended methods are quite stable. I would like to see Turing.jl have a few thousand methods if thatās what exists in the literature, though have a method for choosing a default and a section recommended how to parse through the list. I think having this would be a nice feature because it would make Turing.jl both a user platform and a research platform which would make it easier to maintain, since it would pull the research community in.
Related source: https://www.sciencedirect.com/science/article/pii/S0965997818310251?dgcid=author
I think itās a good goal, and itās a target for us. The big issue with inference is just what Tamas pointed out:
Building a common interface that generically supports data in ā parameters out can be tough, but I think itās worth doing because every method has some R package that doesnāt talk well to all the other R packages, some of which have incredibly bad design.
DiffEq is a great model of how to do this correctly in a unified way. My understanding is that itās got a fairly good development skeleton, which I think is what Turing needs to be able to write up an inference method. The documentation and learning machine you guys have going is amazing as well.
Having an extensible but unified API is perfect for research. But when it comes to practical applications, a lot of the less robust MCMC methods just give incorrect results for finite samples in ways that are very hard to diagnose (for the level of effort most users are willing to apply).
NUTS is considered a more or less āturnkeyā method, but there are quite few other ones with comparable robustness. Eg ADVI can have a drastic bias, MH variants will cheerfully explore regions pretty much irrelevant for the typical set, etc (and yes, NUTS can fail very nicely, too, ).
I think itās fine to give someone a footgun and then explicitly donāt recommend it.
I think this is not so easy for Bayesian inference methods. Itās good to have an API that is easily extendable as the one @cpfiffer is working on for Turing. But more importantly you want to have a base set of robust and efficient inference algorithms, like we aim for in Turing, that can be maintained by the core developers. Only having a bunch of quickly implemented inference algorithms will not be very useful, in part because of the reasons pointed out by @Tamas_Papp.
But I have to agree that for research on inference algorithms it is good to have a proper API. For Turing we mainly aim to have a usable and robust PPL with well tested inference algorithms.
I have some research ideas on inference that I want to test using Turing. But but for now we need to have a robust core so that it can be used by everyone. I want Turing to be a trustworthy PPL and not only an API for inference algorithms.
Unfortunately, this (assume
and observe
functions with the current signature) seems not general enough to cover at least some (?) nested sampling methods - e.g., as I say in the previous post:
As for advantages compared to MCMC - the main one we generally see is a much better behaviour for multimodal distributions with separated modes, for a reasonably small number of dimensions (like 20). I havenāt tried Julia MCMC implementations yet, but NUTS as implemented in pymc3 sometimes starts going extremely slowly, but nested sampling give reasonable and reproducible results.
You seem to be a big fan of nested sampling. You can always just revive NestedSampling.jl and then improve it further if you want. We can then help you wrap it in Turing.jl or other PPLs. Both Turing.jl and NestedSampling.jl are open source and MIT licensed, so extra hands are always welcome
I also donāt see a problem with doing this (basically, everyone can write what they like), I am just wondering if there is any benefit to the community from doing so.
Niche inference algorithms would either hook into a common API package, or be part of a bigger package. In practice, code either ends up being useful for at least a small number of people who are willing to maintain it, or succumb to bit rot in a few years, so either the API would go out of sync with the package, or the package would have code which is practically understood only by the original author, who has moved on since.
I donāt see why the assume and observe functions should not be sufficient. You should be able to do exactly what you try to do with it. If I understand correctly you need to implement the mapping function and draw from a hyper cube. Then set the parameter values using a VarInfo type and simply set the value inside the assume functions with log(1) as the log prior probability. The observe is as in IS. Similar to what happens for HMC related methods.
But Iāll have to look into the paper to understand the details.
Building a PPL around assume and observe functions is nothing new. This has been successfully done in other frameworks that support various inference algorithms. However, I agree that the API is not particularly intuitive which is why we work on improving it.
We are happy to support you if you are interested in implementing nested sampling as a contributed sampler in Turing.
(Ups, I just realised Mohamed already offered our help.)