I was wondering whether to use rand() or StatsBase.sample() as the name for an MCMC sampling function. Some packages (e.g. AdvancedHMC.jl) use sample(); I currently use rand() (in BAT.jl). There was a decision on rand() vs sample() years ago:
@simonbyrne:
After some discussion, the decision was made to leave it as is. rand and sample are indeed subtly different:
rand samples independently (either uniform [0,1], uniform from a discrete set, or from a specific distribution if used with Distributions),
sample allows for non-independent methods (e.g. without replacement), which are only applicable to finite sets.
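The distinction is easy to see in code. A minimal illustration using Base rand and StatsBase.sample on a small finite set:

```julia
using Random, StatsBase

x = collect(1:5)

r = rand(x, 3)                    # independent draws, with replacement (values may repeat)
s = sample(x, 3, replace=false)   # without replacement: only meaningful for a finite set
```

rand can never express "without replacement", which is exactly the non-independent case sample was designed for.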
MCMC typically draws from a Distribution/Density, and with replacement, so that would mean rand(). However, the samples are also not really independent, due to the way MCMC works, so that would mean sample().
I was considering switching from rand() to sample(), but I’d value some input on this from people working on similar issues.
The key thing about MCMC is that it draws a Markov chain, not a single element.
I don’t think it is a good idea to use StatsBase.sample for an MCMC method. It serves a very specific purpose, which has no overlap with MCMC, really.
Obviously. However, both sample() and rand() are commonly used in variants that draw many samples at once. In addition, after MCMC burn-in, one might actually draw MCMC samples one by one, at least in principle (yes, they’re correlated, and yes, one still needs to draw enough of them for convergence).
I don’t think it is a good idea to use StatsBase.sample for an MCMC method. It serves a very specific purpose, which has no overlap with MCMC, really.
Yes, that’s why I’m not so happy with sample(), myself.
It is, however, better if packages that do semantically similar things provide methods for common functions. After all, that’s kind of the basis of multiple dispatch resulting in high composability. We do have quite a few MCMC samplers in the ecosystem now, I think it would be helpful if they wouldn’t all use different function names.
Generally yes, but MCMC methods are so diverse that I am not sure that this can be done in a way that helps much.
Also, I think the major simplification would come from being able to reuse the same problem definition in multiple samplers. Whether the latter are called with a single function name is kind of a detail relative to this.
Generally yes, but MCMC methods are so diverse that I am not sure that this can be done in a way that helps much.
Hm, certainly not easy, but you can make a similar argument for optimization algorithms, and a common Julia API has emerged there.
the major simplification would come from being able to reuse the same problem definition in multiple samplers
Definitely. I really think a package like AbstractDensities would be helpful. I do think there should be common abstract types, though, not just traits, since dispatch would be important. Just my take on it.
Not quite what you are asking for, but MCMCChains.jl was meant as a package serving general abstractions. The generic sample interface developed by @cpfiffer is likely relevant to what you’re thinking about, and will at some point in the near future move from the Turing base to MCMCChains.
MCMCChains.jl was meant as a package serving general abstractions.
Yes, I did look at MCMCChains.jl - unfortunately, it’s not quite abstract enough for what I needed; it’s quite opinionated about certain things. For example, its central struct Chains assumes that all chains generate exactly the same number of samples, and doesn’t allow samples to have a weight.
generic sample interface develop by @cpfiffer is likely relevant to what you think about
That extends StatsBase.sample, right? So that would answer my original question about what function to use, from Turing’s point of view.
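As a rough illustration of how such an interface works via multiple dispatch - MyModel and MySampler here are made-up names, not Turing’s actual types - a package can hook its own method into the shared StatsBase.sample verb:

```julia
using Random, StatsBase

# Made-up placeholder types, purely for illustration.
struct MyModel
    logdensity::Function   # the target log-density (unused in this trivial sketch)
end
struct MySampler end

# Extend the shared verb: a real package would run its MCMC transition here;
# this stand-in "chain" just draws independently, with no accept/reject step.
function StatsBase.sample(rng::AbstractRNG, m::MyModel, ::MySampler, N::Int)
    [randn(rng) for _ in 1:N]
end

chain = sample(Random.default_rng(), MyModel(x -> -x^2 / 2), MySampler(), 100)
```

The point is composability: user code can call sample(rng, model, alg, N) without knowing which package supplied the method.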
Hm, the Chains type handles missing values, to support dynamic models. So you can probably just dump results with different lengths into it by padding them with missing values. But nothing is carved in stone. Just open a feature request or a PR and it’s done.
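The padding idea might look roughly like this in plain Base Julia (illustrative only, not MCMCChains’ actual API):

```julia
# Two chains of unequal length, filled up with `missing` so they fit a
# rectangular (iterations × chains) layout like the one Chains assumes.
chains = [randn(8), randn(10)]
maxlen = maximum(length.(chains))
padded = [vcat(c, fill(missing, maxlen - length(c))) for c in chains]
A = reduce(hcat, padded)   # 10×2 matrix with eltype Union{Missing, Float64}
```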
It’s true, MCMCChains is a little opinionated, but that’s really a function of the fact that people who have different use cases go elsewhere to make their own packages or otherwise do not communicate their needs. I’m happy to make MCMCChains better for everyone. MCMCChains is currently mostly for the Turing-verse, but we have an ultimate goal to make it a lot friendlier to everyone’s needs.
The point of MCMCChains is to implement a set of shared tooling for everyone to use, and if it’s not adapted to your use case, I’d appreciate an issue or feature request.
Turing’s main focus is doing just this. We want to make all the samplers and modelling syntax hot-swappable, and to make it so that developers have a common language for inference design and use. We’ve talked a little with @cscherrer about running his very intuitive modelling syntax for Soss on top of Turing, so you can just use the samplers and not our modelling system. MCMCChains is (soon) going to be a much bigger focus for this, because we’ll be putting our very abstract sampling interface into MCMCChains.
If you have something you want implemented in some kind of interface, please open an issue somewhere! We want to make Turing/MCMCChains/Libtask/Bijectors/AdvancedHMC something that everyone can feel comfortable using without feeling like it doesn’t meet their needs.
I’d second that vote – sample is very intuitive to me. It’s easy to use. You sample from a Markov chain, you don’t rand it. You should also take my opinion here with a grain of salt because I think people tend to get a little too focused on what a word means across packages and in different contexts. Perhaps I am lazy or a bad programmer, but you should name your function whatever makes sense to you. As long as you tell someone what it does.
I’ve struggled with this naming decision as well, and arrived at some implicit criteria for when to use rand. Probably good to make those more explicit. Something like:
rand results should be (reasonably) independent.
rand should be assumed to be fast.
You should only be able to rand things that you already think of as a distribution.
So for example, I’m using rand for sampling a generative model:
julia> m
@model X begin
    β ~ Normal() |> iid(size(X, 2))
    y ~ For(eachrow(X)) do x
        Normal(x' * β, 1)
    end
end
julia> rand(m(X=randn(4,3)))
(X = [-0.24584286110990192 1.3630154650500355 1.0848259998787169
; 0.5509591856320882 1.1425945459696365 -0.6656767860486106
; 0.12576304858078619 0.252801522573155 -0.03197051342847274
; 0.5410519564930669 2.2650098662118183 -0.26346460097826263],
β = [-0.6676654013852129, -0.15539545120571607, -1.729568522460887],
y = [-3.318779673899307, 0.9473075938230047, -0.5045955647304496, -1.1995408059637898])
But sampling the posterior requires MCMC so there’s dependence. And it’s much slower. So that case is definitely sample, not rand.
Thanks for reaching out, @cpfiffer! I wasn’t aware of this, I had assumed this was Turing-internal.
If this is meant to become something shared (and I very much like that idea), maybe at some point it would make sense to move it into the JuliaStats GitHub org, or so? That would make it more “central”.
So MCMCChains is still open for major/breaking changes? In that case, I’ll be happy to open feature requests and I’d very much like to bring things together. For example, we have some convergence checks in BAT.jl (I think they may predate Turing), that are also in MCMCChains now (and MCMCChains offers several more that I may want to use).
Currently, my design for storing chain-output is quite different, though (https://github.com/bat/BAT.jl/blob/master/src/parameters/density_sample.jl#L55). As for storing multiple chains, I currently just append them and distinguish using their ID. ArraysOfArrays.VectorOfVectors can then be used to turn that into a vector of (chain output) vectors in an O(1) operation.
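A rough sketch of that appended layout, using plain Base views as a stand-in for ArraysOfArrays.VectorOfVectors: all chains’ samples live in one flat vector, distinguished by a parallel chain-ID vector.

```julia
# Samples from two chains, appended; chain_id marks which chain each came from.
samples  = randn(10)
chain_id = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]

# Split into per-chain views without copying the underlying data.
chains = [view(samples, findall(==(id), chain_id)) for id in unique(chain_id)]
```

VectorOfVectors achieves the same non-copying split in O(1) by storing element-pointer offsets instead of per-sample IDs.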
We plan to get a v1.0 of BAT.jl out within the next 6 weeks or so, though (kinda have to), so I can’t rip things up too much on my side right now. But I’ll definitely take a look at that sampler interface to see how compatible it is with our current design.
I’m not sure, @yebai is more the strategic leader on that front. I think for the moment we’d prefer to keep it under the Turing umbrella so that we can stay in a more rapid development lane, but that may be a misconception on my part.
Yep. I’m not opposed to breaking changes as long as we put in a little effort to make sure the stakeholders we have are able to keep up, whether by us opening PRs on their codebases or whatever. @goedman might have some thoughts on the matter, as he was very involved with the last round of breaking changes MCMCChains went through.
In general I think breaking changes are fine as long as there’s good versioning support and open communication. If those breaking changes make MCMCChains.jl everyone’s home for storing MCMC samples then I’ll be much happier.
@cpfiffer is correct - we’re happy to consider moving some Turing packages to more generic umbrellas in the future. But at the moment, keeping them in Turing enables faster development.
As for BAT.jl, we have a fairly robust, vanilla Metropolis-Hastings sampler - nothing fancy, but with good auto-tuning. Has recently handled about 300 parameters, which is probably not too shabby for plain MH, even with a fairly benign posterior space. I’ve actually been thinking about factoring that out into a separate package at some point, I’ll have a look in how that could integrate with the Turing/MCMCChain sampler interface. On the other hand, we want to add support for AdvancedHMC and/or DynamicHMC to BAT.
BAT’s target use case is, I think, orthogonal to Turing’s: it’s intended for users who bring their own likelihood (which can involve calling external libraries, sometimes even black-box standalone programs - which is why the MH sampler has always been our mainstay). BAT is supposed to provide a user-friendly environment for Bayesian inference and model comparison for such use cases (we also have an algorithm called AHMI that calculates an integral/evidence purely from posterior samples). Plus plotting recipes, etc. BAT.jl’s predecessor (C++) has been used quite a bit in particle and low-background physics experiments.
Under the hood, I think we could share quite a bit, and I’m all for exploring the possibilities there. One thing is that our samples always have weights (due to frequent sample repetitions with MH, also we have an alternative weighting scheme that keeps rejected samples) - so currently, the data format of chain output is very different from MCMCChains.jl, but maybe we can find some common ground, resp. make it flexible enough to cover these kinds of chains as well.
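For illustration, a minimal weighted-sample container might look like this (a made-up sketch, not BAT.jl’s actual DensitySample type):

```julia
# Each sample carries a weight, e.g. the number of times MH repeated the point
# (or a value from an alternative scheme that keeps rejected samples).
struct WeightedSample{T<:Real}
    params::Vector{T}   # parameter values of this sample
    logdensity::T       # log posterior density at this point
    weight::Int         # repetition count / sample weight
end

s = WeightedSample([0.1, -0.3], -1.7, 4)          # this point was repeated 4 times
effective_total(ws) = sum(w.weight for w in ws)   # total draws represented
```

A flat-layout Chains-style store would need a per-sample weight column (or all-ones weights) to represent the same information.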
BAT is also intended for large-scale parallel evaluation (still working on that), so there are things like automatic partitioning of RNG space (each iteration of each chain gets an independent RNG, all derived from the same seed, so evaluation can be distributed).
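One hypothetical way to sketch such RNG partitioning (not BAT.jl’s actual scheme): derive a deterministic, independent-looking stream for each (chain, iteration) pair from a single master seed.

```julia
using Random

# Every (chain, iteration) pair gets its own deterministic RNG stream, all
# derived from one master seed, so work can be distributed reproducibly.
const MASTER_SEED = UInt64(42)
rng_for(chain::Int, iter::Int) = Xoshiro(hash((MASTER_SEED, chain, iter)))

a = rand(rng_for(1, 1))   # reproducible: re-creating the stream repeats it
b = rand(rng_for(2, 1))   # a different chain gets a different stream
```

Production schemes typically use counter-based generators with guaranteed stream independence rather than hashing, but the reproducibility idea is the same.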
I was wondering from the beginning if this is HEP-related. May I ask what the concrete use case is? Event generation? Detector simulation, or data analysis (think of HiggsCombine, maybe)?
I was wondering from the beginning if this is HEP-related.
Not HEP specific in any way, but AFAIK the C++ BAT is used by most dark-matter analyses at LHC, and also for ATLAS analysis in some groups. It’s also been used a lot in neutrinoless double-beta decay experiments and direct-detection dark matter experiments and for quite a bit of other stuff.
BAT.jl is our designated successor of C++ BAT, though it remains to be seen whether it will enjoy the same success - we chose Julia very deliberately (for reasons probably obvious to all of you), but past use cases were rooted (or should I say ROOTed) firmly in the HEP C++ world. A lot of HEP people use Python too now, but the kinds of likelihoods that are needed are usually hard to write in Python in a performant fashion. We’ll have to see if people are willing to give Julia a try - we’ll try to make it as easy as possible, and also start doing some hands-on training very soon. Which is why we need to release BAT.jl v1.0 with a stable interface real soon (but that shouldn’t prevent us from integrating better with Turing & friends, and there can always be a BAT.jl v2.0).
Get out!! Jokes aside, the BAT work you people are doing looks nice; I’ll check out the slides on Indico later for sure.
I very much hope HEP could use more Julia. At the moment people are all going down the road of pyroot -> uproot (numpy, awkward-array), but recently I heard the author is re-writing awkward-array with a C++ backend… so it’s all starting over again. On the other hand, the bunch of legacy Fortran and C code is just fine as it is, and there doesn’t seem to be much demand for modern techniques (e.g., differentiable programming doesn’t make MC any easier).