Sampling from power posterior

theogf · February 9, 2021, 1:09pm

Hey,
I was wondering if someone could guide me (or direct me to the relevant code) on how to sample from the power posterior given a Turing model. By power posterior, I mean the posterior of the modified joint: p(x|y; b) \propto p(y|x)^b p(x), where b \in [0,1].

cscherrer · February 9, 2021, 1:42pm

I don’t know about Turing, but it should be relatively straightforward to build a logdensity function like this for Soss.

I like the way the power posterior connects the prior and posterior, but I don’t know any applications for sampling from it. Is this something that comes up a lot in some area?

theogf · February 9, 2021, 2:10pm

It appears in particular in thermodynamic integration, for which I created a small package: theogf/ThermodynamicIntegration.jl on GitHub (Thermodynamic Integration for Turing models and more).
I wanted to create some nice wrapper where the user gives a model and it automatically spits out the evidence. I would be happy to create a wrapper for Soss models as well!
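
For readers unfamiliar with the method, the identity behind thermodynamic integration can be sketched as follows. The symbol Z(b) for the normalizing constant of the power posterior is introduced here for illustration and is not used elsewhere in the thread; Z(0) = 1 assumes a proper prior.

```latex
Z(b) = \int p(y|x)^b \, p(x) \, dx, \qquad Z(0) = 1, \qquad Z(1) = p(y)

\frac{d}{db} \log Z(b) = \mathbb{E}_{p(x|y; b)}\left[ \log p(y|x) \right]

\log p(y) = \log Z(1) - \log Z(0) = \int_0^1 \mathbb{E}_{p(x|y; b)}\left[ \log p(y|x) \right] \, db
```

So the log evidence is obtained by sampling from the power posterior at several values of b, averaging the log-likelihood at each, and integrating those averages numerically over [0,1].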

cscherrer · February 9, 2021, 2:18pm

Nice! In the log-density this just comes in as a multiplicative constant, so I don’t think there will be any measurable overhead (the rest of the computation is so much more). So I may just add a switch for this to logdensity.
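
To make the "multiplicative constant" point concrete, here is a minimal sketch in plain Julia. It is not the Soss or Turing API: the toy model (x ~ Normal(0, 1), y[i] ~ Normal(x, 1)) and the names normal_logpdf, log_prior, log_likelihood and power_logdensity are all hypothetical, chosen only for illustration.

```julia
# Log-density of Normal(μ, σ) at x, written out to avoid package dependencies.
normal_logpdf(x, μ, σ) = -0.5 * log(2π) - log(σ) - 0.5 * ((x - μ) / σ)^2

# Hypothetical toy model: prior x ~ Normal(0, 1), observations y[i] ~ Normal(x, 1).
log_prior(x) = normal_logpdf(x, 0.0, 1.0)
log_likelihood(x, y) = sum(normal_logpdf(yi, x, 1.0) for yi in y)

# Unnormalized log power posterior: only the likelihood term is scaled by β.
power_logdensity(x, y, β) = β * log_likelihood(x, y) + log_prior(x)
```

With β = 0 this reduces to the log prior, and with β = 1 to the usual unnormalized log posterior, so the exponent only costs one extra multiplication per evaluation.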

What’s a good descriptive name for the exponent b as a keyword argument?

theogf · February 9, 2021, 2:21pm

In physics they call it the coupling parameter, cf. Thermodynamic integration (Wikipedia).

cscherrer · February 9, 2021, 2:35pm

I’m seeing it called the temperature in some places:

Is that consistent with other uses of “temperature”?

cscherrer · February 9, 2021, 2:38pm

Also

theogf · February 9, 2021, 2:39pm

I think the issue with temperature is that it could be confused with tempered posteriors, which are quite different: for p(x|y) \propto \exp(-V(x,y)),
the tempered posterior is given by p(x|y;T) \propto \exp(-\frac{V(x,y)}{T}), where T is the temperature.

mohamed82008 · February 9, 2021, 2:39pm

The MiniBatchContext in DynamicPPL and Turing can help you with that. The loglike_scalar in contexts.jl (TuringLang/DynamicPPL.jl on GitHub, master branch) is b.

theogf · February 9, 2021, 2:42pm

Oh, that does sound perfect. It obviously has a completely different purpose, but it should work!

cscherrer · February 9, 2021, 2:44pm

The “Probabilistic Integration” paper calls it the inverse temperature. Both b and 1/T appear as an exponent, so maybe the same?

trappmartin · February 9, 2021, 2:48pm

Yeah, there are various names for the same thing, e.g. power posterior or cold posterior. I have seen the term power posterior mostly used by statisticians.

theogf · February 9, 2021, 2:48pm

Fair enough. I would personally disagree with the use of the term (a temperature is not restricted to [0,1]), and there is still my argument about how easily it can be confused with the tempered posterior. But I think as long as everything is properly described, all terms are fine.

theogf · February 9, 2021, 2:49pm

But these two are different. For the power posterior you only act on the likelihood, while for cold posteriors you act on the whole joint.
Cold posteriors are used in simulated annealing for instance.
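
Written side by side, using the definitions given earlier in the thread (with \exp(-V(x,y)) \propto p(y|x) p(x)), the distinction is:

```latex
\text{power posterior:} \quad p(x|y; b) \propto p(y|x)^{b} \, p(x), \qquad b \in [0,1]

\text{tempered posterior:} \quad p(x|y; T) \propto \exp\left(-\frac{V(x,y)}{T}\right) \propto \left[ p(y|x) \, p(x) \right]^{1/T}
```

The two agree at b = 1/T = 1. For other values they differ by a factor p(x)^{1/T - 1} when b = 1/T, so they coincide only if the prior is flat.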

trappmartin · February 9, 2021, 2:51pm

Hm, I think there is a paper on BNNs which defines a cold posterior just as a power posterior.

trappmartin · February 9, 2021, 2:53pm

But maybe I remember it wrong. It’s been a while since I looked at this paper.

theogf · February 9, 2021, 2:55pm

If you mean this one: How Good is the Bayes Posterior in Deep Neural Networks Really? | Florian Wenzel, they use the same definition that I gave you.

I got pretty confused at first as well, that’s why I am so confident now

Cold posteriors are tempered posteriors where T<1

trappmartin · February 9, 2021, 2:59pm

Ah, yes. See I remembered it wrong.
But at least I remembered correctly that it was the same as some existing concept. Haha.

theogf · February 10, 2021, 3:42pm

So I tried the following :

function power_logjoint(model, β)
    ctx = DynamicPPL.MiniBatchContext(DynamicPPL.DefaultContext(), β)
    spl = DynamicPPL.SampleFromPrior()
    return function f(z)
        vi = DynamicPPL.VarInfo(model)
        varinfo = DynamicPPL.VarInfo(vi, ctx, z)
        model(varinfo, spl, ctx)
        return DynamicPPL.getlogp(varinfo)
    end
end

But I get the error

ERROR: LoadError: MethodError: no method matching getspace(::DynamicPPL.MiniBatchContext{DynamicPPL.DefaultContext,Float64})
Closest candidates are:
  getspace(::Union{DynamicPPL.SampleFromPrior, DynamicPPL.SampleFromUniform}) at /home/theo/.julia/packages/DynamicPPL/wf0dU/src/sampler.jl:9
  getspace(::GibbsConditional{S,C} where C) where S at /home/theo/.julia/packages/Turing/a9ANC/src/inference/gibbs_conditional.jl:60
  getspace(::SMC{space,R} where R) where space at /home/theo/.julia/packages/Turing/a9ANC/src/inference/Inference.jl:425
  ...
Stacktrace:
 [1] DynamicPPL.VarInfo(::DynamicPPL.VarInfo{NamedTuple{(:x,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:x,Tuple{}},Int64},Array{MvNormal{Float64,PDMats.PDiagMat{Float64,Array{Float64,1}},FillArrays.Zeros{Float64,1,Tuple{Base.OneTo{Int64}}}},1},Array{DynamicPPL.VarName{:x,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}, ::DynamicPPL.MiniBatchContext{DynamicPPL.DefaultContext,Float64}, ::Array{ForwardDiff.Dual{ForwardDiff.Tag{ThermodynamicIntegration.var"#f#33"{DynamicPPL.Model{var"#7#8",(:y,),(),(),Tuple{Array{Float64,1}},Tuple{}},DynamicPPL.MiniBatchContext{DynamicPPL.DefaultContext,Float64},DynamicPPL.SampleFromPrior},Float64},Float64,5},1}) at /home/theo/.julia/packages/DynamicPPL/wf0dU/src/varinfo.jl:115

mohamed82008 · February 10, 2021, 3:55pm

Full stacktrace please.
