Propagate posterior samples from model calibration as the prior into an inverse problem

So I have a model y = f(x; θ). Using (x, y) pairs, I get a posterior over θ; this constitutes "model calibration".

Now I want to use this posterior as the prior for an inverse problem, where I observe y and want to infer x: "yesterday's posterior is today's prior".

One way I've been doing this is to fit a multivariate Gaussian to the θ samples from my model calibration phase; then I have an analytical distribution to impose as my prior for the inverse problem.
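(For reference, a minimal sketch of that parametric route, assuming the calibration samples sit column-wise in a DataFrame; the column names and values here are made up:)

```julia
using DataFrames, Distributions

# made-up calibration posterior: 1000 draws of two parameters
train_posterior = DataFrame("c" => 2.0 .+ 0.1 .* randn(1000),
                            "H" => 10.0 .+ 0.5 .* randn(1000))

# fit(MvNormal, X) expects a d × n matrix with one posterior draw per column
θ_samples = collect(Matrix(train_posterior)')
prior_mvn = fit(MvNormal, θ_samples)

mean(prior_mvn), cov(prior_mvn)  # moments of the analytical prior
```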

But I don't want to impose this parametric family on my prior; I want to use my samples directly!

So of course one way to do this: in the "@model" definition of the forward model for my inverse problem, simply sample a random row from my model calibration posterior's data frame each time I need a θ from the prior, and voilà, that works just fine. HOWEVER, the data frame of posterior samples I then get from my inverse problem does not contain these sampled variables, so I lose ALL information about the correlation between the θ's and the inferred x! Is there some way to create an empirical distribution as a prior in Turing.jl? Or to store these sampled rows in the data frame that the MCMC sampler returns?

Thanks!

Appendix

Concretely, I want to know the correlation of e.g. c with h₀, but currently have no way to get it: I want to associate with each sample of h₀ the c value that produced it.

@model function infer_h₀_model(
	mdata::DataFrame, train_posterior::DataFrame, prior_only::Bool=false
)
	#=
	prior distributions
	yesterday's posterior is today's prior
	=#
	# important: sample ROW to capture correlations
	i = rand(1:nrow(train_posterior))

	H = train_posterior[i, "H"]
	α₀ = train_posterior[i, "α₀"]
	α₁ = train_posterior[i, "α₁"]
	α₂ = train_posterior[i, "α₂"]
	P_t = train_posterior[i, "P_t"]
	P_b = train_posterior[i, "P_b"]
	c = train_posterior[i, "c"]
	rₒ = train_posterior[i, "rₒ"]
	σₕ = train_posterior[i, "σₕ"]
	
	# initial liquid level
	h₀ ~ Uniform(0.0, H) # cm
	# initial time (shifts experiment time)
	t₀ ~ Normal(0.0, 10.0) # s

	# for prior, do not show the algo the data :)
	if prior_only
		return nothing
	end
	
	# set up, solve ODE
	sim_data, h_of_t = sim_h(
		h₀,
		P_t, P_b, H,
		c,
		rₒ
	)
	
	# account for observations
	t = mdata[2, "t [s]"] # time for this data point
	ĥ = h_of_t(t - t₀)    # predicted liquid level
	δ = α₀ .+ α₁ * ĥ .+ α₂ * ĥ .^ 2 # discrepancy

	mdata[2, "h [cm]"] ~ Normal(ĥ + δ, σₕ)
	
	return nothing
end

Hello! I think what you might want is to replace a = expr with a := expr. When you sample from that model, the value of a will also be stored in the resulting chain.

using Turing

cs = rand(10)

@model function track_extra_var()
    c := rand(cs)
    x ~ Normal(c)
end
chn = sample(track_extra_var(), MH(), 1000)
chn[:c]

(This should all be documented better, yeah.)

predict is somewhat related (API · DynamicPPL), but it's only a forward prediction: given a set of parameters you sampled already, it generates new predictions for observed variables (data); it doesn't let you get new samples for non-observed variables (parameters). So your approach of constructing a new model and sampling from that makes sense to me.

Personally, I think := is the most direct way of getting what you want. If you wanted to create a distribution from the samples, you could use (for example) a DiscreteUniform distribution (Univariate Distributions · Distributions.jl)

@model function sample_extra_var()
    i ~ DiscreteUniform(1, length(cs))
    c = cs[i]
    x ~ Normal(c)
end
chn = sample(sample_extra_var(), MH(), 1000)

This chain will contain i and x and so you can use it to extract the correlation between c and x. But then it’s kind of ugly because you get a discrete parameter, and that makes automatic differentiation very upset (NUTS will probably work with the first model and not the second). And the logpdf associated with i is a constant anyway so omitting it from the model shouldn’t affect the inference.
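(Once the chain contains i, the correlation from the original question can be read off by mapping each retained index back to the c it selected. A self-contained sketch with plain vectors standing in for the chain columns; in practice they would be Int.(vec(chn[:i])) and vec(chn[:x]):)

```julia
using Statistics

cs = rand(10)                       # the "posterior samples" being reused
is = rand(1:length(cs), 1000)       # stand-in for the chain's i column
xs = [cs[i] + randn() for i in is]  # stand-in for the chain's x column

# correlation between the selected c and the x it produced
cor(cs[is], xs)
```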

(Disclaimer: I’m a software engineer, not a statistician.)


excellent, thank you!

indeed,

i = rand(1:nrow(train_posterior))
H := train_posterior[i, "H"]
c := train_posterior[i, "c"]

is exactly what I was looking for.

Ah, right. If I do this, then the data I use for the likelihood function cannot possibly alter/update the distribution of these parameters. So I have to be sure that, physically, the data cannot possibly provide any information about the parameters I am sampling from my model calibration's posterior. For that reason, the DiscreteUniform might be better, since the data can then update this uniform prior over the model calibration's posterior samples into a posterior over this posterior. However, I do find this approach breaks when using NUTS, since auto-differentiation doesn't make sense in this discrete setting, I presume; that must be why you use MH(): no gradient needed. Right.

Before, when I was converting my model calibration's posterior samples to a multivariate Gaussian, it was kind of sad: I was using the powerful NUTS sampler, but might as well have just used variational inference to get a multivariate Gaussian in the first place!
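(For what it's worth, Turing does ship ADVI for exactly that; a hedged sketch, assuming a model instance m. Note that the default is a mean-field, i.e. diagonal-covariance, Gaussian approximation rather than a full-covariance one.)

```julia
using Turing

# 10 Monte Carlo samples per gradient estimate, 1000 optimization steps
q = vi(m, ADVI(10, 1000))
```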

Ah, yes, that makes sense. It’s mathematically equivalent, but perhaps a bit easier for the sampler if i is included as a parameter since it won’t then jump to a different value if the new sample was rejected. You could still use HMC for the remaining continuous parameters by doing something like this:

@model function sample_extra_var()
    i ~ DiscreteUniform(1, length(cs))
    c := cs[i]
    x ~ Normal(c)
end

sample(sample_extra_var(), Gibbs(:i => MH(), :x => HMC(0.1, 20)), 10000)

This seems to get better ESS per second for x compared to just MH(), at least on the toy model from my previous comment.

And of course you can still track the value of c using :=.


Pretty sick that we can specify different MCMC algorithms for each variable!

WOW, on my 10D problem, it's insane how much higher an effective sample size NUTS gives than MH (which, by default, proposes from the prior). MH is so inefficient compared to NUTS (but so much easier to explain in a paper or presentation).

For my inverse problem, based on your comment, I figured out I could make MH much more efficient by tuning the proposal distribution for each variable. Thanks!

	MH(
		:i => DiscreteUniform(1, nrow(train_posterior)),
		:h₀ => AdvancedMH.RandomWalkProposal(Normal(0, 1.0)),
		:t₀ => AdvancedMH.RandomWalkProposal(Normal(0, 0.1))
	)

Thank you! I will acknowledge you in my paper ("thank you penelopeysm for help with Turing.jl"), if that's OK with you.


That's very kind! You are welcome to use my full name if you prefer (penelopeysm (Penelope Yong) · GitHub), and do also share the paper once it's out; we're very keen to see how Turing is being used out there in the wild!