SampleChains

Also I think it’s very useful to be able to solve problems like “I want an estimate of \mathbb{E}[f(X)] with a maximum standard error of \varepsilon.” If samples are expensive enough, we’d want to track the standard error and stop sampling once we’re close enough.

If a chain knows how to get more samples and we have, say, an OnlineStat for the standard error, this seems straightforward.
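Here’s a rough sketch of that stopping rule. The `draw` callable and `estimate_until` name are hypothetical stand-ins for “ask the chain for one more sample”; it uses plain `Statistics` rather than an OnlineStat just to stay self-contained:

```julia
using Statistics, Random

# Hypothetical sketch: keep drawing until the standard error of the running
# estimate of E[f(X)] falls below a tolerance ε, or we hit a sample cap.
function estimate_until(draw, f; ε=0.01, minsamples=10, maxsamples=10^6)
    xs = Float64[]
    while length(xs) < maxsamples
        push!(xs, f(draw()))
        n = length(xs)
        if n ≥ minsamples
            se = std(xs) / sqrt(n)
            se ≤ ε && break
        end
    end
    return mean(xs), std(xs) / sqrt(length(xs))
end

Random.seed!(42)
# Toy case: estimate E[X] for X ~ N(0, 1) to within a standard error of 0.05.
μ̂, se = estimate_until(randn, identity; ε=0.05)
```

With an OnlineStat in place of the `xs` vector, this becomes constant-memory, which matters once the chain is long.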

I’m also just realizing AbstractMCMC probably won’t handle weighted observations, which we’ll need for VI. I think it’s best for AbstractMCMC to only manage the sampling logic, produce an iterator, allow this to be called as needed, and pass one sample at a time to whatever package is managing the chains.
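For weighting, the chain-managing package only needs to accept one (sample, weight) pair at a time; everything else stays in the sampler. A minimal sketch with a made-up `WeightedMean` accumulator (these names are mine, not AbstractMCMC’s):

```julia
# Hypothetical accumulator: consumes weighted samples one at a time,
# as an inference algorithm's iterator might push them.
mutable struct WeightedMean
    wsum::Float64    # running sum of w * x
    total::Float64   # running sum of w
end
WeightedMean() = WeightedMean(0.0, 0.0)

function push_sample!(m::WeightedMean, x, w)
    m.wsum += w * x
    m.total += w
    return m
end

value(m::WeightedMean) = m.wsum / m.total

m = WeightedMean()
for (x, w) in [(1.0, 0.5), (3.0, 1.5)]
    push_sample!(m, x, w)
end
value(m)
</imports>
```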

That state tells you where the sampler is?

Again, this is just a difference in opinion on which thing does what. You want your chain to be more than a data structure and to have knowledge of the outside world – how samples work, how to get them, etc. Our philosophy has been that it is just a chain. It holds the samples and maybe does some analysis. It should do that well and have no idea how to get more samples. It seems really bizarre to me to add that kind of logic to it.

That is just not true – it is exceptionally general. You can do absolutely whatever you want within the framework of AbstractMCMC. If you want to weight, build that into your inference algorithm.

Ok, I thought by “state” you meant the state for an iterator. It sounds like you mean something a lot more general, like almost everything you’d have in an iterator, but you should have to jump through some more hoops to get to an actual iterator. I’m just having a lot of trouble seeing any advantage to this.

If it holds a state that can tell you where to find a sampler, that’s a lot more than you’re describing here.

Again, it doesn’t have that logic. It gets it from other places, and just holds it for the convenience of the end user.

I understand this is a difference of opinion. But there are strong advantages to the approach I’m suggesting. I hope I’ve made them clear; please let me know if I haven’t. It would be really helpful to hear any disadvantages, so we can get beyond our current opinions. I’m willing to rethink this if need be, I just haven’t yet heard a good argument against it.

Thanks, that’s really interesting. I’ll start taking a closer look and try to get a better feel for how things can connect.

I do think it’s critical to keep things very simple for end-users. So getting samples and adding some more should be no harder than this:

```julia
julia> using SampleChainsDynamicHMC, TransformVariables

julia> function ℓ(nt)
           z = nt.x / nt.σ
           return -z^2 - nt.σ - log(nt.σ)
       end
ℓ (generic function with 1 method)

julia> t = as((x=asℝ, σ=asℝ₊))
TransformVariables.TransformTuple{NamedTuple{(:x, :σ), Tuple{TransformVariables.Identity, TransformVariables.ShiftedExp{true, Float64}}}}((x = asℝ, σ = asℝ₊), 2)

julia> chain = initialize!(DynamicHMCChain, ℓ, t)
1-element Chain with schema (x = Float64, σ = Float64)
(x = -0.17±0.0, σ = 0.43±0.0)

julia> drawsamples!(chain, 9)
10-element Chain with schema (x = Float64, σ = Float64)
(x = 0.05±0.24, σ = 0.512±0.3)

julia> drawsamples!(chain, 90)
100-element Chain with schema (x = Float64, σ = Float64)
(x = -0.01±0.32, σ = 0.588±0.37)
```

Or with multiple chains:

```julia
julia> chains = initialize!(4, DynamicHMCChain, ℓ, t)
4-element MultiChain with 4 chains and schema (x = Float64, σ = Float64)
(x = 0.02±0.4, σ = 0.83±0.6)

julia> drawsamples!(chains, 9)
40-element MultiChain with 4 chains and schema (x = Float64, σ = Float64)
(x = -0.04±0.5, σ = 0.82±0.54)

julia> drawsamples!(chains, 90)
400-element MultiChain with 4 chains and schema (x = Float64, σ = Float64)
(x = -0.14±0.62, σ = 0.92±0.75)
```

Perhaps I finally understand what you’re doing. We have exactly the same logic as you:

```julia
chain = sample(model, sampler, 1000)

chain = resume(chain, 1000)
```

But in this case chain just has an empty bucket to store the model and the state. The sampler and the model decide what to put in the empty bucket. It’s equivalent to

```julia
chain = resume(model, sampler, chain, 1000)
```

None of this seems any more complicated for the end-user than what you are doing. More importantly, none of this is precluded by AbstractMCMC or anything – the only thing that happens here is that the model and the sampler take what they need from the chain in order to resume, rather than storing an iterator directly.
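A toy version of that equivalence, with made-up names (the “sampler” here is trivial: its state is just the last value), showing how resume works when the chain carries the state for the sampler to pick back up:

```julia
# Hypothetical "empty bucket" chain: it stores samples plus whatever
# state the sampler handed it, with no sampling logic of its own.
struct Bucket
    samples::Vector{Float64}
    state::Float64   # opaque to the chain; owned by the sampler
end

# Toy sampler transition: next sample is state + 1, and becomes the new state.
step(state) = (state + 1.0, state + 1.0)

function sample(state0, n)
    samples = Float64[]
    st = state0
    for _ in 1:n
        x, st = step(st)
        push!(samples, x)
    end
    return Bucket(samples, st)
end

# `resume` just hands the stored state back to the sampler and appends.
function resume(c::Bucket, n)
    more = sample(c.state, n)
    return Bucket(vcat(c.samples, more.samples), more.state)
end

c = sample(0.0, 3)   # draws 1.0, 2.0, 3.0
c = resume(c, 2)     # appends 4.0, 5.0
```

The chain never calls `step` itself; it only carries `state` so that `resume` doesn’t need the model and sampler passed in again explicitly.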

Ok, yes, I think that’s the same. It’s very confusing to call chain an empty bucket, or to think of it as only holding the samples. The “state” in your case is equivalent to an “iterator” as I’ve been talking about it. They’re exactly the same; otherwise chain = resume(chain, 1000) would be impossible.

Does resume(chain, 1000) allocate new memory and copy everything over?

This is now registered:

My goals here are

  • Make this very easy to use from DynamicHMC directly
  • Have similarly easy-to-use implementations for other back-ends
  • Connect to PPLs like Soss with very little extra code
  • Work toward a connection with AbstractMCMC.jl

I’m still thinking through that last point. The biggest benefit seems to be some nice-looking code for running chains in parallel. Ideally, a one-time effort to interface to AbstractMCMC would mean getting samplers “for free” down the road. I’m skeptical it would really work out this way, but I hope I might be wrong about that.

Anyway, I don’t think this will happen without some help. AbstractMCMC seems designed with MCMCChains.jl in mind. Even if it’s possible in principle to go outside this, the direct approach would seem to involve a lot of extra allocation and copying.

One very minor point: AbstractMCMC has chain_type very far down the list of arguments. In other Julia functions that take a ::Type argument that will become the return type, this is one of the first arguments. The more common approach makes it more prominent, and also makes it easy to dispatch as

```julia
mynewmethod(::Type{WhateverType}, args...) = someotherfunction(args...)
```
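As a concrete (made-up) illustration of the Type-first convention: the eventual return type leads the argument list, as in Base’s `zeros(Float64, 3)` or `rand(Int)`, which makes dispatching on it a one-liner:

```julia
# Hypothetical chain type used only to illustrate Type-first dispatch.
struct TinyChain
    samples::Vector{Float64}
end

# The ::Type argument comes first, so adding a method for a new
# chain type is just one line of dispatch.
build(::Type{TinyChain}, xs...) = TinyChain(collect(Float64, xs))

c = build(TinyChain, 1, 2, 3)
```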