ANN: DynamicHMC 2.0

Thanks. It would be best if you opened an issue on Github, with version information etc.

BTW, are you sure you are using 2.0.0 or master? There was an issue like this which was fixed by this line. Your line numbers in the stacktrace don’t match up with the current version, but that could be an unrelated thing.

Hey, thanks a lot, I think I was not using the latest version.
I have now updated to Julia 1.2, got the packages from your GitHub repository, and now everything works.
Incidentally, for some reason I still get the same problem if I run the code in a Jupyter notebook instead of in a script…

Thanks. I will read these materials.

To my knowledge, we currently have two high-quality HMC implementations: DynamicHMC.jl and AdvancedHMC.jl. I haven’t used either intensely (but I definitely plan to), so I’m curious as to what the main differences between the two packages are.


This is a good question. AFAIK DynamicHMC.jl predates AdvancedHMC.jl, so it would be best to ask the authors of the latter about what, if anything, they were missing that made them start their own package.

@Kai_Xu, could you provide some information in this regard? I have heard good things about both DynamicHMC.jl and AdvancedHMC.jl, and I guess they probably have their individual strengths. From a user perspective, it would be great to have some pointers on which might fit what kind of use cases better.


It occurred to me that @goedman (hope you don’t mind me pinging) coded up everything in

for all available options, so perhaps he is the best person to ask about a comparison.


Thanks for the ping @oschulz. Here are some of my thoughts.

What’s AdvancedHMC.jl (AHMC) for

AHMC serves two purposes.

  1. Serve as the HMC backend of the Turing probabilistic programming language.

    • It extracts the HMC code written for Turing.jl during my MPhil in 2016 into a standalone package of its own.
  2. Serve as a research platform for developing novel variants of the HMC algorithm.

    • We have put some effort into making it quick to develop new HMC algorithms with AHMC. You might see some novel samplers available in AHMC shortly.

Note that because of these two purposes, the direct use of AHMC requires some basic understanding of HMC algorithms, and might not be suitable for users of all levels. But you can achieve more with it because of its flexibility and GPU support. E.g. you can sample from an energy-based model defined by a neural network using Flux.jl, running on GPUs via CuArrays.jl.

See our poster from StanCon 2019 for more details on AHMC, in which we also statistically compare our no-U-turn sampler (NUTS) implementation to Stan's, using MCMCBenchmarks.jl.

The differences between AHMC and DynamicHMC.jl (DHMC)

The main difference is that AHMC aims to support a wide range of HMC variants, including both static and dynamic HMC algorithms, while DHMC, as the name indicates, implements one specific variant of dynamic HMC: NUTS with multinomial sampling and the generalised no-U-turn criterion; AHMC also supports this variant. Again, our poster illustrates the variants that AHMC currently supports.

The Turing language and the differences in modelling pipeline

For end-users who focus on modelling, I highly recommend using Turing directly. The default HMC samplers are based on AHMC, but you can choose the NUTS implementation from DHMC as well. Plus there are many other inference methods available in Turing.jl, e.g. importance sampling, SMC, particle Gibbs, variational inference, etc. You will need some of those if you are working with discrete data :slight_smile:

PS: A compositional interface that combines different MCMC samplers in a Gibbs algorithm is also available (see our AISTATS paper).

Turing has its own modelling language (quite similar to Stan's), which automatically uses Bijectors.jl (another library extracted from Turing.jl) to deal with constrained variables. The use of Turing.jl and Bijectors.jl (roughly) corresponds to the use of LogDensityProblems.jl and TransformVariables.jl.
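As a quick illustration of the modelling side, here is a toy sketch (not taken from the Turing docs; the syntax follows recent Turing releases, and `NUTS(0.65)` uses AHMC under the hood, so check the exact calls against your installed version):

```julia
using Turing, Distributions

# A coin-flip model: p is constrained to (0, 1); Bijectors.jl transforms it
# behind the scenes so NUTS can sample in unconstrained space.
@model function coinflip(y)
    p ~ Beta(1, 1)
    for i in eachindex(y)
        y[i] ~ Bernoulli(p)
    end
end

# NUTS with a 0.65 target acceptance rate; other samplers can be swapped in here.
chain = sample(coinflip([1, 0, 1, 1, 0, 1]), NUTS(0.65), 1_000)
```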

For analysis, Turing provides MCMCChains.jl, which implements a unified and easy way to compute many MCMC statistics as well as to plot useful figures. This is also the default MCMC summary backend of CmdStan.jl.


The only difference I see is AdvancedHMC supporting the slice sampler and the turning criterion of the original NUTS, but since both have been superseded (by the multinomial sampler and the generalized turning criterion), I don’t think this is relevant for most users, as these are less efficient variants, not something that you would ever want to go back to.

Thanks @Tamas_Papp and @Kai_Xu!

Regarding Turing: we’re planning to integrate DynamicHMC and/or AdvancedHMC into BAT.jl, which is focused on use cases with (often complex) user-written likelihoods. So I’ll need to use the lower-level interfaces, and I’m very happy that we now have two high-quality (as far as I can judge) standalone HMC packages that can be used as backends (without pulling in large frameworks that are geared towards specific use cases).


I started DynamicHMC.jl specifically for supporting user-coded likelihoods, because the packages that were around at the time preferred to expose a higher-level framework with directed acyclic graphs, which are not really suitable for the models I am working with. I find that being able to debug, unit test, benchmark, and optimize my log posterior as a plain vanilla Julia function is really valuable.

If you are using (modern) NUTS, I would say that the sampler should be close to identical between AdvancedHMC.jl and DynamicHMC.jl, so perhaps you should just review the interfaces and use whichever you find more convenient.

For DynamicHMC.jl, the API for defining a log density is documented here in detail:

https://tamaspapp.eu/LogDensityProblems.jl/dev/

You can use this interface to perform AD for you with any of the supported AD packages (and it is easy to change later).
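To make that concrete, here is a minimal toy example (not from the documentation verbatim; double-check the exact names like `ADgradient` and `mcmc_with_warmup` against the versions you have installed):

```julia
using LogDensityProblems, DynamicHMC, Random
import ForwardDiff  # the chosen AD backend has to be loaded

# A plain Julia type representing the problem; here a standard normal in `dim` dimensions.
struct StdNormalProblem
    dim::Int
end

# The trait-based interface: dimension and log density.
LogDensityProblems.dimension(ℓ::StdNormalProblem) = ℓ.dim
LogDensityProblems.logdensity(ℓ::StdNormalProblem, x) = -sum(abs2, x) / 2

# Recent LogDensityProblems versions also want a capabilities declaration
# (omit this method on versions that do not define `capabilities`):
LogDensityProblems.capabilities(::Type{StdNormalProblem}) =
    LogDensityProblems.LogDensityOrder{0}()

# Let an AD backend supply logdensity_and_gradient, then run NUTS with warmup.
∇ℓ = ADgradient(:ForwardDiff, StdNormalProblem(5))
results = mcmc_with_warmup(Random.GLOBAL_RNG, ∇ℓ, 1_000)
```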

If you want to discuss something or just need a new feature, just open an issue or ping me on this forum.

Yes, that’s one of the things that got my attention, I really like that modular approach - I’m also considering to split off the (IMHO fairly good) MH-sampler in BAT.jl as a separate package at some point. (HMC is obviously superior in most use cases, if you have a gradient, but we often have to deal with likelihoods where a gradient is hard or impossible to obtain, and where good old MH will do well enough since the number of parameters is fairly low).

LogDensityProblems.jl […] You can use this interface to perform AD for you with any of the supported AD packages (and it is easy to change later).

Yes, I know about LogDensityProblems.jl. I actually have a similar thing in BAT.jl (minus AD, at the moment), and I’ve been considering adopting the LogDensityProblems API. Do you expect it to be stable from now on (I know you made some changes recently)? We have been asked to release an API-stable version 1.0 of BAT.jl soon.

The only thing that is missing is allowing log density functions to return an extra payload (which is useful e.g. for indirect inference, where it could contain simulation results). I did not do it yet because this was a feature request by someone who did not follow up on it, but I see how it can be useful and I may end up doing it in the long run if there is interest.

In any case, I can refactor the API to allow these kinds of extensions, and then it would be stable.

Note that for widely used packages of mine, I generally coordinate with all dependent packages, eg

so if I ever change the interface, there will be pre-release discussion, a major bump in SemVer, etc.

The only difference I see is AdvancedHMC supporting the slice sampler and the turning criterion of the original NUTS, but since both have been superseded (by the multinomial sampler and the generalized turning criterion), I don’t think this is relevant for most users, as these are less efficient variants, not something that you would ever want to go back to.

I agree that the slice sampler outperforms the multinomial one in practice.

For the generalised U-turn criterion, I believe it is just a numerical approximation of the original U-turn criterion, and they should be mathematically the same in Euclidean space; but Riemannian HMC does need the generalised one.

Also I think static HMC samplers are quite useful.

For AdvancedHMC.jl, the log density API is passed in as two functions:

  1. One for the log density, which takes in the parameters and returns the corresponding log density;
  2. One for the gradient, which takes in the parameters and returns a tuple of the log density and the corresponding gradient. How to compute the gradient is your choice: use some AD library or a hand-written gradient.

I believe the example in the README.md of our repo is clear enough regarding this.
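A condensed sketch in the same spirit (not copied verbatim from the README; constructor and function names follow AdvancedHMC's documentation and may differ slightly between versions):

```julia
using AdvancedHMC

D = 5
ℓπ(θ) = -sum(abs2, θ) / 2   # 1. log density (a standard normal target here)
∂ℓπ∂θ(θ) = (ℓπ(θ), -θ)      # 2. returns (log density, gradient); hand-written, or use an AD library

initial_θ = randn(D)
metric = DiagEuclideanMetric(D)
hamiltonian = Hamiltonian(metric, ℓπ, ∂ℓπ∂θ)

# NUTS with multinomial sampling and the generalised no-U-turn criterion,
# plus Stan-style adaptation of step size and mass matrix.
integrator = Leapfrog(find_good_stepsize(hamiltonian, initial_θ))
proposal = NUTS{MultinomialTS, GeneralisedNoUTurn}(integrator)
adaptor = StanHMCAdaptor(MassMatrixAdaptor(metric), StepSizeAdaptor(0.8, integrator))

samples, stats = sample(hamiltonian, proposal, initial_θ, 1_000, adaptor, 500)
```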

One thing I noticed is that DynamicHMC has more and heavier dependencies: it pulls in 37 packages (directly and indirectly) vs. 23 packages for AdvancedHMC. Package load times reflect this (about 4.0 s for DynamicHMC vs. 2.0 s for AdvancedHMC).

This is not meant to be a criticism, I assume there are probably very good reasons for DynamicHMC’s heavier footprint. But it’s something to consider when deciding whether to integrate a backend unconditionally via Project.toml or only optionally via Requires.jl.

It’s the other way round: the new multinomial sampler with the forward bias is better than the slice sampler.

My understanding is that the generalized criterion is the more fundamental one, and the original NUTS paper criterion approximates it (and yes, it came before; theory in this area usually follows practice). BTW, there was a nice discussion about extensions on the Stan forum recently:

I think I disagree here: except for some simple examples, varying curvature means that static HMC always over- or undershoots.

Yes, that’s basically Optim which pulls in everything but the kitchen sink. I will think about streamlining this, but generally finding a local optimum helps quite a bit for initial adaptation, and it is one of the nice native Julia packages.

So if I understand correctly, the APIs we currently have for log-density sampling problems are:

  • AdvancedHMC: There isn’t really an API or trait; the user code passes a function for the density, and another function for density plus gradient. The dimensionality of the problem is determined from θ_init (is that correct?).

  • DynamicHMC, via LogDensityProblems: There’s a trait-based API with functions `dimension`, `logdensity` and `logdensity_and_gradient`. A density can be any type.

  • BAT (the package I’m involved with): Type-based API; densities must be subtypes of `BAT.AbstractDensity` and implement the functions `nparams` and `density_logval`. So very similar to LogDensityProblems, except for the abstract type. BAT is currently still Metropolis-only, so there’s no function for density-plus-gradient yet, but I always intended for it to be something ending with a `!`, to allow passing the gradient into a preallocated array - but since all current AD frameworks allocate arrays anyhow, maybe that’s a moot point. GC cost has gone down too, in recent Julia versions.

    Originally, my API was much heavier, with functions to negotiate the use of threads and a `density_logval!` that calculates multiple density values at once (e.g. for nested sampling / ensembles). But partr and custom broadcasting have made that obsolete.

    I do have an additional function `param_bounds` in my API, though - parameter bounds are important for gradient-free samplers like MH, but I also want to use the bounds information to choose a good transformation for HMC automatically (see the sketch below for the kind of thing I have in mind). `nparams` is inferred from the parameter bounds, by default.
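To make the bounds-to-transformation idea concrete, here is a tiny sketch using TransformVariables.jl (just my illustration, not existing BAT or DynamicHMC code; the parameter names are made up):

```julia
using TransformVariables

# Bounds determine the transformation: μ is unbounded, σ is positive, p lies in (0, 1);
# as(Real, lo, hi) would handle general finite bounds.
t = as((μ = asℝ, σ = asℝ₊, p = as𝕀))

dimension(t)            # 3 unconstrained dimensions (what nparams would report)
transform(t, zeros(3))  # (μ = 0.0, σ = 1.0, p = 0.5)
```

An HMC sampler can then work in the unconstrained space, while MH can keep using the original bounded parameters.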

I wonder if the time may have come for a more-or-less official Julia density-problems API, defined by a lightweight package that lives in the JuliaStats organization (or some other central place)? What do you guys think?