Uncertainty quantification of inverse problem using hierarchical Bayes: high-level modelling

Hi all,

I’m trying to solve an example problem from a paper [1] and wish to do it with high-level modelling. I’m looking into Turing.jl, but other packages may exist that I do not know of. For didactic purposes I want to keep the formulation as close to the paper as possible. From what I could gather, I need to implement the following hierarchical model and sample from the posterior:
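
As I read the paper, these are the conditional distributions to sample from (same notation as in the code below):

x \mid \lambda, \delta, b \sim N\left((\lambda A^T A + \delta C)^{-1} \lambda A^T b, \; (\lambda A^T A + \delta C)^{-1}\right)
\lambda \mid x, b \sim Gamma\left(n/2 + \alpha_\lambda, \; \tfrac{1}{2}\|Ax - b\|^2 + \beta_\lambda\right)
\delta \mid x \sim Gamma\left(n/2 + \alpha_\delta, \; \tfrac{1}{2} x^T C x + \beta_\delta\right)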

Using Turing.jl I have the following MWE:

using Turing, Distributions, LinearAlgebra
#using Plots
n = 40
h = 1/n
t = h/2:h:1-h/2

x_true = 50*(0.75.*((0.1 .< t) .& (t .< 0.25)) + 0.25*((0.3 .< t) .& (t.<0.32)) + ((0.5 .< t) .& (t .< 1.0)).*sin.(2*pi.*t).^4)
#plot(x_true)

# Construct convolution kernel matrix A:
sig = 0.03 # Kernel width.
kernel(i,j,h,sig) = (h/(sqrt(pi)*sig^2))*exp(-((i-j)h)^2/(2*sig^2))

A = Matrix{Float64}(undef,n,n)
for i in 1:n
    for j in 1:n
        A[i,j] = kernel(i,j,h,sig)
    end
end

Ax = A*x_true # noise-free observations
err_lev = 2
sigma = err_lev/100 * norm(Ax)/sqrt(n)
eta =  sigma * randn(n)
b = Ax + eta # add noise to observations

@model problem(x, A, b) = begin
    n = length(b)
    α_λ = 1
    β_λ = 1e-4
    α_δ = 1
    β_δ = 1e-4
    C = diagm(ones(n))
    λ ~ Gamma(n/2+α_λ,0.5*norm(A*x-b)^2+β_λ)
    δ ~ Gamma(n/2+α_δ,0.5*x'*C*x+β_δ)
    x ~ MvNormal(inv(λ*A'A+δ*C)*λ*A'b,Symmetric(inv(λ*A'A+δ*C)))
end

c = sample(problem(zeros(n), A, b), SMC(1000))

However, it seems I’m missing some pieces. I can get Turing to sample, but there is no x parameter in the output, and with other samplers (HMC, for instance) I get various error messages, so it seems I’m doing something wrong.

I know this problem can be solved faster/better in other ways (that’s what the paper is about) but want to see how close I can get to the mathematical formulation using high-level modelling.

Comments and suggestions will be appreciated,
thanks!

[1] J. M. Bardsley, “An Efficient MCMC Method for Uncertainty Quantification in Inverse Problems,” pp. 1–13, Nov. 2010. http://hs.umt.edu/math/research/technical-reports/documents/2010/28_BayesianUQforInverseProblems.pdf

In your model, x is observed and thus will not be added to the returned Chains object. Are you interested in inferring x?

What are your observations?

– EDIT –

I see, b is observed. You need to implement the likelihood (see Eq. 6 in the paper). If necessary, you can implement a custom distribution for this. See: Advanced Usage
Note that you only need to implement the logpdf function as this is only the observation model, so no need to provide a rand function for it.
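
For reference, the likelihood there is Gaussian in b, i.e. something of the form

p(b \mid x, \lambda) \propto \lambda^{n/2} \exp\left(-\tfrac{\lambda}{2} \|Ax - b\|^2\right)

and the logpdf just has to return the logarithm of that expression.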

Then you need to pass b as you already do but do not pass x as it should be inferred using MCMC.

OK, I see. Thanks! I was hoping that it would be enough to just sample from these 3 equations, but this is a little out of my comfort zone. Will dig into the custom distribution then :slight_smile:

So far you have only defined the priors; you also need to define the likelihood to be able to form the posterior and perform inference. :wink:

1 Like

If this turns out to be too slow using HMC or NUTS, you will soon also be able to use variational inference (ADVI) or stochastic variational inference.

– EDIT –
You will need the master branch to do so. I think stochastic variational inference is not merged yet but should be available soon.
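
From memory, the interface will look roughly like this (check the docs on master, the exact names may still change):

advi = ADVI(10, 1000)             # 10 samples per gradient step, 1000 optimisation steps (illustrative numbers)
q = vi(problem(b, A, C), advi)    # problem being your Turing model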

Good to know, thanks. Will update when I have implemented logpdf.

1 Like

OK, a little progress here but still not quite there. I’ve been through the documentation and various issues on GitHub to better understand how the modelling approach works. Now I have a MyModel with a logpdf defined. But in my @model I’m still confused about how to write the prior on x, and also how to express that I want to infer x, \lambda, and \delta from the observations.

This is my attempt so far:

using Turing, Distributions, LinearAlgebra
import Distributions: pdf, logpdf
using Random
######### PROBLEM ##############

n = 40
h = 1/n
t = h/2:h:1-h/2

x_true = 50*(0.75.*((0.1 .< t) .& (t .< 0.25)) + 0.25*((0.3 .< t) .& (t.<0.32)) + ((0.5 .< t) .& (t .< 1.0)).*sin.(2*pi.*t).^4)
#plot(x_true)

# Construct convolution kernel matrix A:
sig = 0.03 # Kernel width.
kernel(i,j,h,sig) = (h/(sqrt(pi)*sig^2))*exp(-((i-j)h)^2/(2*sig^2))

A = Matrix{Float64}(undef,n,n)
for i in 1:n
    for j in 1:n
        A[i,j] = kernel(i,j,h,sig)
    end
end

Ax = A*x_true # noise-free observations
err_lev = 0
sigma = err_lev/100 * norm(Ax)/sqrt(n)
eta =  sigma * randn(n)
b = Ax + eta # add noise to observations

###############################
mutable struct MyModel{Tλ,Tδ,TA,Tb,TC} <: MultivariateDistribution{Continuous}
    λ::Tλ
    δ::Tδ
    A::TA
    b::Tb
    C::TC
end
Base.length(d::MyModel) = n
Distributions.rand(rng::AbstractRNG, d::MyModel) = rand(rng,n) # https://github.com/TuringLang/Turing.jl/issues/907

logpdf(d::MyModel,x::AbstractVector{<:Real}) = -0.5*d.λ*norm(d.A*x-d.b)^2
@model problem(b,A,C) = begin
    n = length(b)
    α_λ = 1
    β_λ = 1e-4
    α_δ = 1
    β_δ = 1e-4
    λ = 0.001 # Initializing for prior (fails for 0.0)
    δ = 0.001 # Initializing for prior (fails for 0.0)
    x ~ MvNormal(inv(λ*A'A+δ*C)*λ*A'b,Symmetric(inv(λ*A'A+δ*C)))
    λ ~ Gamma(n/2+α_λ,0.5*norm(A*x-b)^2+β_λ)
    δ ~ Gamma(n/2+α_δ,0.5*x'*C*x+β_δ)
    x ~ MyModel(λ,δ,A,b,C)
end

C = diagm(ones(n))
c = sample(problem(b,A,C), SMC(10000))

using Plots
plot(x_true)
scatter!(mean(c.value[1000:10000,3:42],dims=1).data[:])

Small update: Removing the prior on x ~ MvNormal(...) and switching to HMC gives a reasonable result, better than A\b:
[plot: HMC posterior-mean reconstruction plotted against x_true]

But I would still appreciate help to get the prior working.

Best,
Oliver

What do you get if you use HMC and reverse_mode AD?

HMC(1000, 0.01, 10) is pretty good (both forward and reverse mode):

[plot: HMC(1000, 0.01, 10) reconstruction plotted against x_true]

NUTS(1000, 200, 0.65) is not really providing any useful inference.

But the main issue, I suspect, is that my model is not correctly implemented. This is what I used to get the result in the plot above:

mutable struct MyModel{Tλ,Tδ,TA,Tb,TC} <: MultivariateDistribution{Continuous}
    λ::Tλ
    δ::Tδ
    A::TA
    b::Tb
    C::TC
end
Base.length(d::MyModel) = n
Distributions.rand(rng::AbstractRNG, d::MyModel) = rand(rng,n) # https://github.com/TuringLang/Turing.jl/issues/907

logpdf(d::MyModel,x::AbstractVector{<:Real}) = -0.5*d.λ*norm(d.A*x-d.b)^2
@model problem(b,A,C) = begin
    n = length(b)
    α_λ = 1
    β_λ = 1e-4
    α_δ = 1
    β_δ = 1e-4
    λ = 0.001 # Initializing for prior (fails for 0.0)
    δ = 0.001 # Initializing for prior (fails for 0.0)
    x ~ MyModel(λ,δ,A,b,C)
    λ ~ Gamma(n/2+α_λ,0.5*norm(A*x-b)^2+β_λ)
    δ ~ Gamma(n/2+α_δ,0.5*x'*C*x+β_δ)
end
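
The chain for the plot was drawn the same way as before, just with HMC instead of SMC:

c = sample(problem(b,A,C), HMC(1000, 0.01, 10))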

I don’t get how to set the prior on x, i.e. x ~ MvNormal(inv(λ*A'A+δ*C)*λ*A'b,Symmetric(inv(λ*A'A+δ*C))), in the model; it is currently missing. Any ideas?

Apologies for the late reply, I’m busy paper writing atm.

I think you are confusing what x is and what the observations are in your model.
According to the link you attached, b should be the observations and x should be inferred through posterior inference.

So the model should be something like this:

struct MyModel{Tλ,Tx,TA} <: MultivariateDistribution{Continuous}
    λ::Tλ
    x::Tx
    A::TA
end
Base.length(d::MyModel) = length(d.x) # Please check if this is correct
Distributions.rand(rng::AbstractRNG, d::MyModel) = rand(rng,n) # https://github.com/TuringLang/Turing.jl/issues/907

# Please check if this is correct, you might have to adjust the code to run properly.
Distributions.logpdf(d::MyModel,b::AbstractVector{<:Real}) = -0.5*d.λ*norm(d.A*d.x-b)^2

@model problem(b,A,C) = begin
    n = length(b)
    α_λ = 1
    β_λ = 1e-4
    α_δ = 1
    β_δ = 1e-4
    λ = 0.001 # Initializing for prior (fails for 0.0)
    δ = 0.001 # Initializing for prior (fails for 0.0)

    # prior over x
    x ~ MvNormal(inv(λ*A'A+δ*C)*λ*A'b,Symmetric(inv(λ*A'A+δ*C)))

    # prior over lambda
    λ ~ Gamma(n/2+α_λ,0.5*norm(A*x-b)^2+β_λ)

    # prior over delta
    δ ~ Gamma(n/2+α_δ,0.5*x'*C*x+β_δ)

    # likelihood (observations are drawn according to this distribution)
    # Eq. 6 in the paper.
    b ~ MyModel(λ,x,A)
end
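
You would then sample from it as before, without passing x anywhere, e.g.

c = sample(problem(b, A, C), HMC(1000, 0.01, 10))   # x, λ and δ should all show up in the chain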

I hope this helps.

Note that I have not tried the code, so you might have to correct minor errors I made, but it should give you an idea of what you need to do.

Cheers,
Martin

– EDIT –
I’m just realizing that the priors all depend on each other. I think this will be problematic if you use SMC or any other simulation-based inference atm. HMC will somewhat rescue you in this particular situation as it provides values for x, lambda and delta at all times.

We are currently working on extracting a dynamic computation graph from the model definition. Once this is finished, it should be possible to have a Gibbs sampler that draws from this posterior.

No need to apologise, I really appreciate your help! And yes, I am confused. Not so much about the mathematical problem but about the translation to Turing.jl. With your example, it is now clear what that model should look like. The example runs fine, btw!

I’m still experimenting with the example - will post again when I have a solution.

Thanks!

1 Like

Turing follows the common notation of a generative process to define a model. The basic idea is to write down how an observation is generated. Maybe it helps to think in those terms to understand how to write a model in Turing.

For a mixture model this means:

@model gmm(x, K) = begin
   # generate the parameters of each component / cluster, i.e. the location.
   m ~ MvNormal(zeros(K), ones(K))

   # generate the probabilities (weights) of generating an observation from the kth cluster.
   w ~ Dirichlet(K, 1.0)

   # one cluster assignment per observation
   z = tzeros(Int, length(x))

   # for each observation
   for n in 1:length(x)
      # decide from which cluster the observation is generated
      z[n] ~ Categorical(w)

      # generate the observation from the selected cluster.
      x[n] ~ Normal(m[z[n]])
   end
end

In equations you would define the same model as follows:
m_k \sim Normal(0.0, 1.0) \, , \; k = 1, \dots, K
w \sim Dirichlet(1/K, \dots, 1/K)
z_n | w \sim Categorical(w) \, , \; \forall n
x_n | z, m \sim Normal(m_{z_n}, 1.0) \, , \; \forall n

So basically, what that means is that each observation is generated from a Normal distribution given z. If you perform inference, Turing will therefore assume that the likelihood model for the observations is a Normal distribution.

In case of your example, Eq. 6 is the assumed likelihood model for the observations. So to perform inference, all you need to do is to provide a custom distribution with a logpdf function that computes the likelihood for you. In case you have missing data or want to generate data from the model you would also need to provide a rand function.
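
For this observation model, rand would be something along these lines (untested sketch):

Distributions.rand(rng::AbstractRNG, d::MyModel) = d.A*d.x .+ randn(rng, length(d.x)) ./ sqrt(d.λ)   # draw b ~ N(Ax, (1/λ)I)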

I hope that helps. If not let me know what part is not clear.