I was recently introduced to Turing. In all the example code I see, the model is given a fixed set of observations and sampled for a specific number of iterations. I am wondering: is it possible to use a different subset of the input data for each iteration of sampling?
I think it’s important to separate the abstract sampling algorithm from the particulars of implementation, and how it’s called from a given PPL.
It sounds like you want to do something like a bootstrap, is that right? Those typically use maximum likelihood estimation. Or do you want an MCMC version of this? Do you have any specifics not particular to Turing?
Thank you for your quick response. I want to work on a Bayesian neural network using MCMC. Since the input data is quite large, I would like to split it into minibatches and draw a few (maybe only one) MCMC samples using each minibatch. It would be trivial to implement a naive version of this manually, but I feel the performance could be significantly better if Turing could do this automatically.
Thank you for the suggestion of ZigZagBoomerang.jl; I will also look into it.
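For concreteness, here is what I mean by the naive manual version — a short chain per minibatch, warm-started from the previous chain's last draw. This is my own sketch, not an official Turing pattern: `make_model` is a hypothetical model constructor, and I'm assuming a recent Turing.jl where `sample` accepts an `initial_params` keyword.

```julia
using Turing, Random

# Hypothetical toy model taking a minibatch (xb, yb) as data;
# any Turing model that takes its data as arguments works the same way.
@model function make_model(xb, yb)
    θ ~ Normal(0, 1)
    for i in eachindex(yb)
        yb[i] ~ Normal(θ * xb[i], 1.0)
    end
end

# Naive minibatch loop: a few NUTS steps per minibatch, each chain
# initialized at the last draw of the previous one.
function minibatch_sample(x, y; batchsize = 100, nsteps = 10)
    idx = shuffle(eachindex(y))
    θ0 = nothing            # warm-start parameters, none for the first batch
    chains = []
    for batch in Iterators.partition(idx, batchsize)
        model = make_model(x[batch], y[batch])
        chain = θ0 === nothing ?
            sample(model, NUTS(), nsteps) :
            sample(model, NUTS(), nsteps; initial_params = θ0)
        θ0 = vec(Array(chain)[end, :])   # last draw → init for next batch
        push!(chains, chain)
    end
    return chains
end
```

Note that this is not a valid MCMC scheme for the full posterior (the target distribution changes with each minibatch); it only illustrates the mechanics I'm asking about.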
It’s only a bit tricky: you need to know in advance how many samples there are in total, and then you can estimate your likelihood unbiasedly in various ways, including going through the data by subsets in random order and repeating. Better ping @SebaGraz
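The unbiased estimate here is just inverse-probability weighting: with `N` data points in total and a uniformly random subset `S` of size `m`, the quantity `(N/m) * Σ_{i∈S} log p(y_i | θ)` has the full-data log-likelihood as its expectation. A minimal sketch of that idea (my own notation, nothing Turing-specific; `loglik_i(θ, i)` is a hypothetical per-observation log-likelihood):

```julia
using Random

# Unbiased subsampled estimate of the full-data log-likelihood:
# E[(N/m) * Σ_{i∈S} loglik_i(θ, i)] = Σ_{i=1}^{N} loglik_i(θ, i)
# when S is a uniformly random size-m subset of 1:N.
function subsampled_loglik(loglik_i, θ, N, m; rng = Random.default_rng())
    S = randperm(rng, N)[1:m]          # random subset of indices
    return (N / m) * sum(i -> loglik_i(θ, i), S)
end
```

This is why the total number of samples `N` must be known in advance: it enters the reweighting factor.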
What makes you think that? Typically hand-rolled performance is better, and we use PPLs more for convenience. The only exceptions I know of are in Soss.jl ^* where we can sometimes use symbolic simplifications to transform the log-density into a more efficient form.
^* The only exceptions in Julia; there are examples like Hakaru and Rainier in other languages.
Thanks, @mschauer. It might be worth adding that the subsampling scheme for piecewise deterministic Monte Carlo methods (which are implemented in ZigZagBoomerang.jl) is efficient after preprocessing the data (something like Newton steps) to find a mode of the posterior and centring the sampler there. So, somehow, even when subsampling, you must look at all your data once. P.S. Rethinking what I just said: you might get close to a posterior mode even if you subsample the gradient in Newton’s method, so it may not be strictly necessary to look at all your data.
If the ask is for a performant way of iterating over minibatches, why not use DataLoaders.jl (https://lorenzoh.github.io/DataLoaders.jl/dev), a parallel iterator for large machine-learning datasets that don’t fit into memory, inspired by PyTorch’s `DataLoader` class?
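Roughly like this — a sketch assuming DataLoaders.jl’s `DataLoader(data, batchsize)` constructor, which treats the last array dimension as the observation dimension (check the package docs for the exact API):

```julia
using DataLoaders

x = rand(10, 1_000)   # 1000 observations with 10 features each
y = rand(1_000)

# Iterate (features, targets) in parallel-loaded minibatches of 64.
for (xb, yb) in DataLoader((x, y), 64)
    # xb is 10×64, yb has length 64; run a few MCMC steps on this batch
end
```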