MCMC landscape

We have been using Klara.jl for some of our projects, and are looking for alternatives (due to that package no longer being maintained).

My sense is that Mamba.jl, Turing.jl and DynamicHMC.jl are the actively maintained options these days. Did I miss something?

I’m really looking for something where I can just provide a callback for the log-density target function, pass that to a sampler and get a chain back. I am in particular not interested in fancy modeling environments, graphical representations of the problem or anything like that. From that point of view, the API of DynamicHMC.jl looks great. But it seems to support only one sampling algorithm, and in particular one that requires gradients, which we don’t have.

The sampler that we used from Klara.jl in the past was the RAM sampler. So in my ideal world, we would use a package with an API similar to DynamicHMC.jl that allows us to use that sampler. Does that exist?

In general, are there any efforts to provide robust, solid implementations of just the different samplers that exist in the literature? Maybe in such a way that they could be re-used by the higher level modeling packages, but could also be used directly by folks that are only interested in the samplers, not the modeling environments?

10 Likes

This is Turing’s backend MCMC package: https://github.com/TuringLang/AdvancedHMC.jl

Also there’s Gen.jl

1 Like

Yes. Currently, we (Turing) are working on a general sampling interface to various samplers, which we hope can abstract from Turing’s core modeling functionality. See #746 for info on the interface, and #634 for work on separating the samplers from the modeling functionality.

You’ve hit on a key issue we’re trying to address, which is that some people probably really don’t need all of our functionality, just the sampling component, or maybe they want the modeling component and want to provide their own sampler. We recognize this, and we’re spending a lot of time trying to make this much more “plug-and-play” for the whole Julia ecosystem.

9 Likes

@cpfiffer That sounds awesome, essentially exactly what I’m looking for! Is there a timeline? Just a rough idea when the sampler side of this might be available?

Apart from AdvancedHMC.jl, there is also AdvancedSMC (mainly containing SMC and particle Gibbs) which is currently a sub-module of Turing.jl. We will probably pull AdvancedSMC out of Turing.jl and make it an independent package once the interface issues @cpfiffer mentioned are addressed.

Regarding timeline, I guess realistically we will not be able to finish all the work on this before JuliaCon but we are trying to have this done ASAP.

AdvancedHMC.jl can already be used independently. In the future, there will probably also be an AdvancedVI package providing generic variational Bayes implementations. But this is work in progress.

1 Like

Any plans to also provide a simple MH sampler for this interface? And maybe this RAM sampler I linked to above?

I think it is great that there is so much advanced work going on in this space, but at the same time it would be great if it was simple to just write a log likelihood, pass it to a simple MH sampler and get some results.

The MH sampler exists already as part of Turing’s core, and it’ll be incorporated as part of the interface refactoring. RAM probably won’t be implemented by the core team (at least for now), as we’re hoping to get more community involvement in sampler development.

Yeah, I’d agree – I don’t actually really know how you’d implement this using our planned extensions, perhaps @yebai might have a better vision as to how simple likelihood function passing might work. Ultimately Turing’s models are just functions, so I suspect it’s entirely possible to do something like this, but I think I’m the wrong person to say how complex it might be for us to get this working performantly.

Yes, I can confirm this.

If your model is AD’able, you can get gradients using ForwardDiff, ReverseDiff, Flux, or Zygote (ordered roughly from most robust to fastest). There are examples in
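
For instance, a minimal sketch with ForwardDiff (the log density below is a made-up standard normal, just for illustration):

using ForwardDiff

# Made-up example log density: an isotropic standard normal in three dimensions.
logdensity(x) = -0.5 * sum(abs2, x)

x = [1.0, 3.0, 5.0]
g = ForwardDiff.gradient(logdensity, x)   # gradient at x via forward-mode AD

ReverseDiff, Flux’s Tracker and Zygote have analogous gradient calls, with different tradeoffs in robustness and speed.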

Also, I would recommend looking at

where @goedman implemented a lot of models very nicely using CmdStan, DynamicHMC, TuringLang, and Mamba.

Note that MH is trivial to implement, but for nontrivial models, tricky to tune, and then orders of magnitude less efficient than HMC methods. Unless getting gradients is actually impossible after all tricks are exhausted (eg integrating out discrete parameters), I would recommend that you invest in using a NUTS/HMC variant, with any of the frameworks above.
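
To be concrete about the “trivial to implement” part, here is a minimal random-walk Metropolis sketch (made-up names, fixed proposal scale, no adaptation, no diagnostics):

# Minimal random-walk Metropolis: logtarget is the log density, x0 the starting
# point, σ a fixed proposal scale, n the number of iterations.
function rwmh(logtarget, x0; σ = 0.1, n = 10_000)
    x = copy(x0)
    lp = logtarget(x)
    chain = Vector{typeof(x)}(undef, n)
    for i in 1:n
        proposal = x .+ σ .* randn(length(x))
        lp_prop = logtarget(proposal)
        # accept with probability min(1, exp(lp_prop - lp))
        if log(rand()) < lp_prop - lp
            x, lp = proposal, lp_prop
        end
        chain[i] = copy(x)
    end
    return chain
end

The hard part is everything this sketch leaves out: picking σ (or a full proposal covariance), convergence diagnostics, and efficiency.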

1 Like

It is not.

While it is trivial to implement, I might make a mistake (much easier for that to go unnoticed in one-off code), it won’t have tests if I do it quickly myself, I won’t have any diagnostics, and so on. I think having a simple, robust, tested MH implementation available as a package would have many benefits.

Apart from the fact that we can’t use AD in our situation, we know that a sampler like RAM works well for our use-case. We have everything running with Klara.jl on 0.6, but we need to move things over to 1.0, and ideally we don’t want to revisit things like algorithmic choice. Even if things could be done more efficiently, we don’t need that at this point.

3 Likes

There is also AffineInvariantMCMC.jl, which has the log-density-function-to-chain behavior that you seek.

4 Likes

DiffEqBayes.jl is a library that lets people use Bayesian inference on ODEs without knowledge of those libraries.

https://github.com/JuliaDiffEq/DiffEqBayes.jl

What we’ve found so far is that Turing’s NUTS is the most stable, even more stable than the auto-generated Stan kernels. DynamicHMC diverges for us sometimes. Turing is much faster than it used to be. These tests show that it’s the thing to use, at least for now.

3 Likes

Soss.jl aims to provide a clean way to manipulate and combine models. While Turing gives more attention to the flexibility needed for “universal” probabilistic programming, I’m focusing on models with more structure: dimensionality (and supports, at least for now) should be known statically at inference time.

Given this, Soss can do things like

  • Condition on a variable (yielding a different model)
  • Find the prior or posterior predictive distribution, or the likelihood (all models)
  • Evaluate the log density
  • “Forward sample” from a model
  • Sample from the posterior (currently using @Tamas_Papp’s DynamicHMC.jl)
  • “Particle importance sampling” that can be used for variational inference, using @baggepinnen’s MonteCarloMeasurements.jl
  • Progress toward symbolic simplification using SymPy.jl, arriving at a form of the log density that can be evaluated more efficiently
  • Model dependencies, using @scheinerman’s SimplePosets.jl

Generally “inference” in Soss means generating source code specialized for a given model. But it’s all “normal” Julia code. This has some tradeoffs, but inspecting the generated code is very easy. One of my hopes is that this will be good for building code a user can hand-tune (if they want), and that the approach might also be helpful pedagogically.

Benefits:

  • The approach is relatively simple
  • Because a Model is very close to the user-specified code, inference is extremely flexible
  • It can already do some useful things
  • In principle, inference can be very fast
  • Very open to collaboration

Drawbacks:

  • Code generation is currently via @eval and invokelatest. Still hoping to find better alternatives for this.
  • It’s still in early stages, and is currently at the tail end of a complete refactor (the refactor branch)
  • Development has been in fits and starts. Hoping to find ways to improve this and move forward more consistently
  • The recent refactoring pass was nearly a “scrap it and start over”, kind of brutal and certainly uglied-up my git logs

I sure am really selling it here :wink:

4 Likes

Oh, and if you’re after performance, @Elrod’s ProbabilityModels.jl is looking pretty great. Hoping to find ways to connect with his work soon as well :slight_smile:

1 Like

The RAM algorithm looks like a relatively straightforward extension of MH. Perhaps take a look at how MH is implemented in Turing https://github.com/TuringLang/Turing.jl/blob/master/src/inference/mh.jl, and make something similar for RAM? There is a fair bit of boilerplate in there, but hopefully @cpfiffer’s PR will make this much more readable. If you can wait until JuliaCon, this PR may be merged by then. You can also ask questions on the #turing slack channel or ask Cameron directly at JuliaCon :slight_smile:

1 Like

I agree. Here’s the algorithm, from https://arxiv.org/pdf/1011.4381.pdf

All it requires is a density function on \mathbb{R}^n. There’s no reason for this to be specialized to a given library. Making it available as a standalone function would allow anyone to use it, and then @davidanthoff could have a choice of front-ends :slight_smile:
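
For concreteness, here is a rough standalone sketch of how I read Algorithm 1 of that paper (the step-size schedule and the recomputed Cholesky factorization are simplifications; treat this as annotated pseudocode, not a tested implementation):

using LinearAlgebra

# Rough sketch of Robust Adaptive Metropolis (Vihola 2012): adapt a factor S of the
# proposal covariance so the acceptance rate is coerced towards α_target.
function ram(logtarget, x0; n = 10_000, α_target = 0.234, γ = 2/3)
    d = length(x0)
    x = copy(x0)
    lp = logtarget(x)
    S = Matrix{Float64}(I, d, d)          # factor of the proposal covariance
    chain = Matrix{Float64}(undef, d, n)
    for i in 1:n
        u = randn(d)
        proposal = x .+ S * u
        lp_prop = logtarget(proposal)
        α = min(1.0, exp(lp_prop - lp))   # acceptance probability
        if rand() < α
            x, lp = proposal, lp_prop
        end
        chain[:, i] = x
        # rank-one adaptation of S, growing/shrinking along the proposal direction u
        η = min(1.0, d * i^(-γ))
        M = S * (I + η * (α - α_target) * (u * u') / dot(u, u)) * S'
        S = cholesky(Symmetric(M)).L
    end
    return chain
end

Note that the adaptation uses the acceptance probability itself, not just whether the proposal was accepted.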

1 Like

This is all incredibly helpful, thanks everyone!

It seems to me that at one level we need the equivalent of https://github.com/JuliaOpt/MathProgBase.jl for MCMC samplers, right? And then different modeling environments (Turing.jl, Mamba.jl etc.) could all target that. And maybe one additional, new “modeling” package would be one that simply has an API where I can pass a log target callback function.
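
Something like the following is roughly what I have in mind (all names hypothetical, just a sketch):

# Hypothetical sketch of a MathProgBase-style sampler abstraction.
abstract type AbstractSampler end

# Each backend package defines its own sampler type...
struct MH <: AbstractSampler end
struct RAM <: AbstractSampler end

# ...and adds a method to one generic entry point: log target callback in, chain out,
# e.g. sample_chain(::RAM, logtarget, initial_x; kwargs...).
function sample_chain end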

Having said that, a standalone package with just the RAM sampler would probably help us in the short term already. Heck, one could probably extract it somehow from https://github.com/JuliaStats/Klara.jl/blob/master/src/samplers/RAM.jl

2 Likes

All the interface needed for this kind of sampler is a mapping from a location to the log density value. LogDensityProblems adds some frills by adding a way to query the dimension.
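
A minimal version of that (hypothetical names, not the actual LogDensityProblems API) could be as small as this:

# A "problem" is anything that can report its dimension and evaluate its log density.
struct CallbackProblem{F}
    logtarget::F   # callback returning the log density at a vector
    dim::Int       # dimension of the parameter space
end

dimension(p::CallbackProblem) = p.dim
logdensity(p::CallbackProblem, x) = p.logtarget(x)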

IMO one could simply implement RAM to take a function, a starting point, and return a bunch of vectors. Not much value is added by some unified interface.

The two key statistics you want to check about mixing are \hat{R} and effective sample size. I could find neither in the RAM article linked above, so I am not sure it is a big improvement over naively tuned, jittered MH.
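
For reference, the usual (Gelman-Rubin) form of \hat{R}, for M chains of length N with chain means \bar{\theta}_m and chain variances s_m^2, is

B = \frac{N}{M-1}\sum_{m=1}^{M}(\bar{\theta}_m - \bar{\theta})^2, \qquad W = \frac{1}{M}\sum_{m=1}^{M} s_m^2, \qquad \hat{R} = \sqrt{\frac{\tfrac{N-1}{N}\,W + \tfrac{1}{N}\,B}{W}},

with values close to 1 indicating that the chains agree.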

There are a couple of other reference implementations as well. Here’s one in R, and another in MATLAB, from the author of the original paper.

Turing.jl has that and a ton of samplers. You can even use DynamicHMC as the back end for Turing.

2 Likes

Alright, I looked around a bit more, and I really like the Turing.jl approach to samplers etc.!

So I think the one thing we would need to make all of this usable for us is an API like this:

function our_log_target(x)
    # Run whatever we need to run
    return res
end

chain = sample(MH(), our_log_target, initial_x = [1., 3., 5.])

# or

chain = sample(RAM(), our_log_target, initial_x = [1., 3., 5.])

I looked through the Turing.jl code, but it didn’t seem like that could be easily done, or at least I didn’t see how :slight_smile: