MCMC landscape

Thanks for suggesting to help! I’m definitely not an expert in inference methods myself, so not going to implement them in Julia. What I’m possibly able to do is integrating already existing libraries in other languages (e.g. PolyChord I mentioned above) to take advantage of the Turing infrastructure of defining models.
Looking at the importance sampling implementation as the simplest reference I still don’t completely grasp how samplers should work in Turing. For example, the polychord library takes two separate functions as the model specification, and I get confused how to get the first one already. The prior function should take N values - the number of scalar parameters in the model - and return the same number of values converted from the unit cube to the model prior distribution. As I understand Turing interface, I would need to run the Model object once and set up assume method to gather prior distributions for all variables in one array. Then the prior function I give to the library just calls invcdf for all distributions in that array. Does that sound at least generally the right way to go?

So the idea behind the assume and the observe statements is as follows.

Assume is called for every parameter of the model and allows you to specify what happens for model parameters. In the case of importance sampling, we sample the parameter value from the prior distribution.

See:
https://github.com/TuringLang/Turing.jl/blob/abbc8c011f7a51192378d8681208083b649dc027/src/inference/is.jl#L60-L64

In the case of HMC, we retrieve the new parameter value from the internal data structure and evaluate the log pdf of the prior given in the transformed space.
See:
https://github.com/TuringLang/Turing.jl/blob/abbc8c011f7a51192378d8681208083b649dc027/src/inference/hmc.jl#L606-L612

The observe function is used for observations and usually returns the log pdf or log pmf of an observation for the observation model. See:
https://github.com/TuringLang/Turing.jl/blob/abbc8c011f7a51192378d8681208083b649dc027/src/inference/is.jl#L66-L69

As you see in the importance sampling example, this is implemented using multi-dispatch. So to implement an interface to a custom sampler you need to define a new type which sub-types InferenceAlgorithm and a Sampler function which returns a new sampler, see: https://github.com/TuringLang/Turing.jl/blob/abbc8c011f7a51192378d8681208083b649dc027/src/inference/is.jl#L38

Then you can implement the sample function which is called by the user. We will soon have a better API for this.

I’m not 100% sure I understand how to use the PolyChord library (how do you tell the library which prior distributions to use?), but here is an attempt on how you can interface the PolyChord library in your sample function. (Note that we don’t have an API to get the number of parameters as of yet)

vi = VarInfo(model)

# get the parameter values
initial_params = mapreduce(m -> m.vals, vcat, vi.metadata)
# Have a look here: https://github.com/TuringLang/Turing.jl/blob/abbc8c011f7a51192378d8681208083b649dc027/src/core/RandomVariables.jl#L118

# call PolyChord
params = PolyChord.prior(rand(length(initial_params)))

# set the new parameter values
c = 1
for m in vi.metadata
  vn = m.vns # This will likely break cause issues if the user uses the same symbol in different contexts of the same model
  L = length(m.vals)
  vi[vn] = params[c:(c+L-1)]
  c += L
end

Does that make sense to you?
Apologies if I misunderstood you, I should really read the paper first. :sweat_smile:

Well, I guess I just described it in a little bit confusing way. The workflow for polychord library is basically like this:

# `prior` converts from uniform cube to model parameters
# e.g. if we have two parameters, one with Normal(0, 1) prior and the other with Normal(5, 2):
prior(cube::Vector{Float})::Vector{Float} = invcdf.([Normal(0, 1), Normal(5, 2)], cube)

# computes likelihood for given model parameters
loglike(params::Vector{Float})::Float = ...

run_polychord(prior, loglike, options)

It looks like this approach and the one in Turing have “inverted” directions of control flow, in a sense.

I see I think this could be done much easier than in my example. I also had a look at the nested sampling paper and I agree it is a very nice should add this to Turing. Could you open an issue on Turing so that we can discuss the details there?

You can also have a look at https://github.com/TuringLang/Turing.jl/issues/886 and contribute some ideas on how to improve the API from a user perspective.

1 Like

I would like to revive this chat a bit, if possible. I’m coming from Python’s MCMC environment and want to try out Julia for some new projects. I have used the great emcee and dynesty packages for sampling a self-defined loglikelihood function and nested sampling, respectively, and I’m looking for similar options in Julia. So it kind of fits @davidanthoff initial questions and also the later discussion on nested sampling. Are there any news/developements in the meantime?

More specifically, I was wondering how I could use Turing.jl to sample from a user-defined log likelihood function. If there were recent developments in this direction and, in case, if there is a minimal example somewhere?

Any help is really appreciated!

NestedSamplers.jl exists. It’s not yet integrated into Turing, but its relatively well designed in that it could be built into Turing with some effort. I’m also looking into building a Julia/emcee bridge that would allow for the use of emcee in Julia.

I mean, technically all Turing models are “user-defined”. More seriously, if you are asking about how you get a posterior if you have some likelihood function f, you don’t need Turing. You can go upstream to the inference packages behind Turing, like AdvancedHMC or AdvancedMH. You can also use DynamicHMC.

Turing’s main strength is defining the model. The breadth of inference methods and modularity I think are great, but they ultimately pale in comparison to the ability to write super simple models that just work. If you already have a likelihood function and don’t need to write the model specification, you probably don’t need Turing.

3 Likes

To build upon cpfiffer’s response, you can create a custom log likelihood function by extending logpdf and creating a new distribution subtype from Distributions.jl. Here is an example from the Turing documentation.

2 Likes

Thanks to you both and great work! I will take a look at the different options, AdvancedMH might be what I need since my problem is not AD-able (I guess). But I will also try to do it within Turing and get a bit used to this kind of probabilistic programming.

And nice to see that there is also some active development on nested sampling. I saw NestedSampling.jl before, but missed out NestedSamplers.jl so far. Thanks.

Obtaining gradients will have the highest payoff for convergence in a nontrivial application since it allows you to use HMC variant methods, so you may want to explore this first.

1 Like

yes. Maybe you guys can help me. I’m trying to fit mathematical models to data (D), typically stochastic Markov models with discrete states and continuous time (“Gillespie” algorithm). So given a model M and parameters θ, I can simulate correct stochastic realisations from the model/stochastic process (you can think of finance stock market or so). Then I compute a (log)likelihood L = p(D | θ, M) based on the collection of many model realisations and some statistics that apply both to data and model. Then MCMC to get posterior p(θ | D, M). Might automatic differentiation be possible here?

Sounds like you are using indirect inference?

If the simulation results are differentiable in the parameters, there should be no reason you cannot make it work with AD. Just use constant random numbers.

If this is not the case, then frankly, I would transform the model until it is :wink:

1 Like

Might be, right away I cannot say for sure :smiley: My field is quantitative biology, not economics, but concepts might be similar.

Thanks a lot! I will look into this.

Yeah, I recall the many quantitative biology papers I read (as an economist) when learning about this methodology :wink:

You may be interested in

1 Like

All the same it seems, just different data :smiley:
Thanks for the link, I’ll try to process the stuff and see how far I come!

EDIT: There is actually a topic related to my problem (Differentiating through a Jump Problem).

EDIT2: Just for completeness, my current conclusions:

  1. Following the linked topic above, Gillespie simulations with discrete states and discrete random selection of reactions do not seem to be (automatic) differentiable. You could make them continuous as a SDE for AD to work (see linked topic for more info), but that’s not the path I want to take. So I’m using samplers now without the need of a gradient; i.e. AdvancedMH as mentioned by @cpfiffer.
    If there are more non-gradient samplers in Julia, I would be glad to know! On a first glance, the currently implemented MH sampler generally works, but can have low number of effective samples.
  2. Approximate Bayesian (ABC) methods should be applicable for these problems; they do not need the evaluation of a potentially costly likelihood, but are – as the name suggests – only an approximation of the posterior (how good, has to be checked case-specific). I will try them out once I run into runtime limitations. Potential Julia implementations: ApproxBayes, GpABC, ApproximateBayesianComputing, ABC (and maybe more).
2 Likes

I have found that Differential Evolution MCMC works well for non-gradient based models. I implemented a version of DE-MCMC that might go into one of Turing’s library for particle based samplers. There is a folder with examples. It’s not as efficient as NUTS but it’s much better than AdvancedMH (Edit: MH methods in general, not AdvancedMH in particular).

3 Likes

Do you mean MH methods specifically or the implementation of it? I’m always looking to make AdvancedMH better.

Sorry for the confusion. I was referring to MH as a method in general. Your implementation looks good to me. However, if there is something inefficient in your implementation, I probably wouldn’t be the one to identify it :slightly_smiling_face:

Crap, I really needed someone to tell me what’s wrong :wink:.

3 Likes

I just want to add our:

3 Likes

(edit: sorry, this is in reponse to @mlanghinrichs, up above) Some work I have done is a related to this, I think. I estimated a continuous time diffusion model by doing MCMC using the asymptotic Gaussian likelihood of a vector of statistics as the “quasi” likelihood. The mean and the covariance matrix of the Gaussian likelihood at given parameters are estimated by simulations from the model. The dimension of the statistics is reduced using a previously trained neural net. The method uses plain MH MCMC. This works well, contrary to what might be expected, because the theory behind the method leads to a very effective proposal density. At any rate, it is an example of MCMC for a continuous time model, based on statistics rather than the full sample. The example is at https://github.com/mcreel/SNM/tree/master/examples/JD. A working paper referenced at the top directory of the repo gives explanations.

There are also simpler models in the examples. I have found this method to work quite well for all of the examples I have explored. The limitation is the time needed to sample from the model. Because a neural net must be trained, if it’s expensive to sample from the model, then it can be a little time consuming to estimate a model. The jump diffusion model took 2 or 3 days, maybe 4, on an old 32 core machine

1 Like