A simple MCMC code for beginners to learn from

If you want to understand how Metropolis-Hastings MCMC produces a posterior for a parameter, you can read various websites like this.

But unless you understand R very well, it can make your head spin. You can understand it more easily with this simple Julia code. You will, of course, need to understand some basic Bayesian theory beforehand.
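For reference, the Metropolis rule with a symmetric proposal is to accept a proposed value p_new with probability min(1, posterior(p_new)/posterior(p_old)), where posterior = likelihood × prior; this is exactly the q1/q0 ratio used in the code below.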

using Distributions
using StatsPlots

# We perform 9 coin tosses and obtain 6 heads and 3 tails.
# What is the probability density function for obtaining a Head?
heads = 6
tails = 3

# define the number of points used (Not strictly required for MCMC)
numofpoints = 1001

# define probability grid from 0.0 to 1.0 with 1001 points
# (Not strictly required for MCMC)
p_grid = [ k  for k in range(0,stop=1,length=numofpoints) ]
p_grid_lb = p_grid[1]      # lower bound ( required for MCMC )
p_grid_ub = p_grid[end]    # upper bound ( required for MCMC )

# define our prior which is a uniform prior ( required for MCMC )
prior = [ 1.0  for _ in 1:numofpoints ]

# define the binomial distribution
function dbinom(n::Integer,size::Integer,p::Real)
    binomial(size, n) * p^n * (1.0-p)^(size-n)
end

# define the number of samples used in MCMC
# And create array p with that many elements (filled with 0.0)
n_samples = 1_000_000
p = zeros(n_samples)
# Set the starting value to a random number between 0.0 and 1.0
# strictly speaking it should be between p_grid_lb and p_grid_ub
p[1] = rand()
JumpingWidth = 0.1
for i in 2:n_samples
    # p_new is proposed in the vicinity of p[i-1]
    p_new = rand(  Normal( p[i-1] , JumpingWidth )  )
    # make sure p_new is between 0.0 and 1.0
    if p_new < p_grid_lb      # if p_new < 0.0
        p_new = p_grid_lb + abs( p_new - p_grid_lb )   # abs( p_new )
    elseif p_new > p_grid_ub  # if p_new > 1.0
        p_new = p_grid_ub - abs( p_new - p_grid_ub )   # 2.0 - p_new
    end
    # Calc the probability of obtaining Heads and Tails
    # assuming that probability = p[i-1] and probability = p_new
    " q0 is posterior0 = likelihood0 * prior0        "
    " q1 is posterior1 = likelihood1 * prior1        "
    " In our cases, the prior is always 1.0          "
    " or calc prior as prior[1+ceil(p*(numofpoints-1))] "
    " or calc prior as Distributions.pdf(Distributions.Beta(1, 1),p) "
    " where p is the probability "
    " and the likelihood is dbinom( heads , heads+tails , probability ) "
    q0 = dbinom( heads , heads+tails , p[i-1] ) * 1.0
    q1 = dbinom( heads , heads+tails , p_new  ) * 1.0
    # Accept the jump to p_new with probability min(1, q1/q0):
    # if a uniform random number is less than q1/q0 we jump, otherwise we stay
    p[i] = rand() < q1/q0 ? p_new : p[i-1]
end

# Next we need to display the histogram of the array p
# this is the shape of the posterior produced by MCMC
histogram(p,legend=false) |> display
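As a quick sanity check (not in the original post): with a flat prior this problem is conjugate, so the exact posterior is Beta(heads+1, tails+1) = Beta(7, 4). A minimal sketch overlaying it on the MCMC histogram:

# Overlay the exact conjugate posterior on the (normalised) histogram of samples
exact_posterior = Beta(heads + 1, tails + 1)
histogram(p, normalize=:pdf, legend=false)
plot!(x -> pdf(exact_posterior, x), 0, 1, linewidth=3) |> display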

13 Likes

Exactly, I have used R for many years but still find the Julia code in StatisticalRethinking.jl more straightforward to understand

1 Like

Here is the equivalent code in Turing.

# Import libraries.
using Turing
using Distributions, StatsPlots

# 9 coin tosses with 6 heads and 3 tails
data = [1, 0, 1, 0, 1, 1, 0, 1, 1]

# Now we use Turing to find the unknown parameter p
# Declare our Turing model.
@model coinflip(data) = begin
    # Our prior belief about the unknown parameter p which is
    # the probability of heads in a coin.
    p ~ Beta(1, 1)

    # The number of observations.
    N = length(data)
    for n in 1:N
        # Heads or tails of a coin are drawn using
        # a process which has a Bernoulli distribution
        data[n] ~ Bernoulli(p)
    end
end;

# Settings of the Hamiltonian Monte Carlo (HMC) sampler.
# If you have never used HMC before, do not change the values
# of ϵ and τ from 0.05 and 10
samples = 1_000
ϵ = 0.05 # leapfrog step size
τ = 10   # leapfrog step numbers

# Start sampling.
chain = sample(coinflip(data), HMC(ϵ, τ), samples);

# Summarise results
println(  describe(chain)  )

# Plot a summary of the sampling process for the parameter p, i.e. the probability of heads in a coin.
display(  histogram(chain[:p])  )
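As an aside (an assumption about newer Turing releases, not part of the original post), the hand tuning of ϵ and τ can be avoided by letting the No-U-Turn Sampler adapt the step size itself:

# Alternative: NUTS adapts its own step size (0.65 is the target acceptance rate)
chain_nuts = sample(coinflip(data), NUTS(0.65), samples);
display(  histogram(chain_nuts[:p])  )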

6 Likes

And now, may I present the source code for MCMC estimating TWO parameters!

using Distributions
using StatsPlots

# Here are the weights of the students (in kilograms) in a classroom
# Assuming the weights of the students are generated by a biological
# process with a normal distribution with mean μ and standard dev σ
# find the posterior distribution of those two parameters
data = [ 55.8, 64.1, 61.2, 55.2, 58.0, 69.8, 53.2, 70.0, 63.3, 63.5 ]

# define our prior for μ as a normal centered on 60 ( required for MCMC )
prior_μ_distr = Distributions.Normal(60, 10)
prior_μ(p) = Distributions.pdf(prior_μ_distr,p)
μ_lb = 30
μ_ub = 90

# define our prior for σ as an exponential with mean of 20 ( required for MCMC )
prior_σ_distr = Distributions.Exponential(20)
prior_σ(p) = Distributions.pdf(prior_σ_distr,p)
σ_lb = 0
σ_ub = 50

# define the number of samples used in MCMC
# And create the arrays p_μ and p_σ with that many elements (filled with 0.0)
n_samples = 1_000_000
p_μ = zeros(n_samples)
p_σ = zeros(n_samples)
# Set the starting value to a random number between lb and ub
p_μ[1] = rand(prior_μ_distr)
while p_μ[1] < μ_lb || p_μ[1] > μ_ub
    p_μ[1] = rand(prior_μ_distr)
end

p_σ[1] = rand(prior_σ_distr)
while p_σ[1] < σ_lb || p_σ[1] > σ_ub
    p_σ[1] = rand(prior_σ_distr)
end

# The likelihood function which returns the probability,
# given a fixed mu and a fixed sigma, of the sequence of datapoints
function likelihood(data::Vector,mu::Real,sigma::Real)
    numofdatapoints = length(data)
    result=1.0
    for k = 1:numofdatapoints
        result *= pdf(Normal(mu, sigma),data[k])
    end
    return result
end

JumpingWidth = 0.1
for i in 2:n_samples
    # p_μ_new is proposed in the vicinity of p_μ[i-1]
    p_μ_new = rand(  Normal( p_μ[i-1] , JumpingWidth*(μ_ub-μ_lb) )  )
    # make sure p_μ_new is between μ_lb and μ_ub
    if p_μ_new < μ_lb
        p_μ_new = μ_lb + abs( p_μ_new - μ_lb )
    elseif p_μ_new > μ_ub
        p_μ_new = μ_ub - abs( p_μ_new - μ_ub )
    end

    p_σ_new = rand(  Normal( p_σ[i-1] , JumpingWidth*(σ_ub-σ_lb) )  )
    # make sure p_σ_new is between σ_lb  and σ_ub
    if p_σ_new < σ_lb
        p_σ_new = σ_lb + abs( p_σ_new - σ_lb )
    elseif p_σ_new > σ_ub
        p_σ_new = σ_ub - abs( p_σ_new - σ_ub )
    end
    " q0 is posterior0 = likelihood0 * prior0        "
    " q1 is posterior1 = likelihood1 * prior1        "
    priorμ0 = prior_μ(p_μ[i-1])
    priorσ0 = prior_σ(p_σ[i-1])
    priorμ1 = prior_μ(p_μ_new)
    priorσ1 = prior_σ(p_σ_new)

    q0 = likelihood(data, p_μ[i-1], p_σ[i-1]) * priorμ0 * priorσ0
    q1 = likelihood(data, p_μ_new, p_σ_new) * priorμ1 * priorσ1
    # Accept the jump to (p_μ_new, p_σ_new) if a uniform random
    # number is less than q1/q0, otherwise keep the previous values
    if rand() < q1/q0
        p_μ[i] = p_μ_new
        p_σ[i] = p_σ_new
    else
        p_μ[i] = p_μ[i-1]
        p_σ[i] = p_σ[i-1]
    end


end

# Next we display the histograms of the arrays p_μ and p_σ
# these are the shapes of the posteriors produced by MCMC
histogram(p_μ,legend=false,title="Posterior of mu") |> display
histogram(p_σ,legend=false,title="Posterior of sigma") |> display
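The joint posterior can also be inspected with a 2-D histogram (a sketch, not in the original post):

# Joint posterior of μ and σ
histogram2d(p_μ, p_σ, nbins=100, xlabel="mu", ylabel="sigma") |> display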

2 Likes

Because we are multiplying lots and lots of small probabilities together, even Float64 can underflow to zero. Therefore most MCMC programs work with logarithms, which keeps the computation within Float64's range.
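A tiny illustration of the problem (a sketch, not from the original post):

# Multiplying many small likelihood terms underflows Float64 to exactly 0.0,
# but adding their logarithms stays well within range.
vals = fill(1e-10, 40)
prod(vals)         # 0.0, because 1e-400 is smaller than Float64 can represent
sum(log10.(vals))  # -400.0, perfectly representable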

The source code below works in log space.

using Distributions
using StatsPlots

# Here are the weights of the students (in kilograms) in a classroom
# Assuming the weights of the students are generated by a biological
# process with a normal distribution with mean μ and standard dev σ
# find the posterior distribution of those two parameters
data = [ 55.8, 64.1, 61.2, 55.2, 58.0, 69.8, 53.2, 70.0, 63.3, 63.5 ]

# define our prior for μ as a normal centered on 60 ( required for MCMC )
prior_μ_distr = Distributions.Normal(60, 10)
prior_μ(p) = Distributions.pdf(prior_μ_distr,p)
μ_lb = 30
μ_ub = 90

# define our prior for σ as an exponential with mean of 20 ( required for MCMC )
prior_σ_distr = Distributions.Exponential(20)
prior_σ(p) = Distributions.pdf(prior_σ_distr,p)
σ_lb = 0
σ_ub = 50

# define the number of samples used in MCMC
# And create array p with that many elements (filled with 0.0)
n_samples = 1_000_000
p_μ = zeros(n_samples)
p_σ = zeros(n_samples)
# Set the starting value to a random number between lb and ub
p_μ[1] = rand(prior_μ_distr)
while p_μ[1] < μ_lb || p_μ[1] > μ_ub
    p_μ[1] = rand(prior_μ_distr)
end

p_σ[1] = rand(prior_σ_distr)
while p_σ[1] < σ_lb || p_σ[1] > σ_ub
    p_σ[1] = rand(prior_σ_distr)
end

# Base.log10 already computes logarithms to base 10, so no helper definition is needed

# The log-likelihood function which returns the log of the probability,
# given a fixed mu and a fixed sigma, of the sequence of datapoints
function loglikelihood(data::Vector,mu::Real,sigma::Real)
    numofdatapoints = length(data)
    result=0.0
    for k = 1:numofdatapoints
        result += log10(  pdf(Normal(mu, sigma),data[k])  )
    end
    return result
end

JumpingWidth = 0.1
for i in 2:n_samples
    # p_μ_new is proposed in the vicinity of p_μ[i-1]
    p_μ_new = rand(  Normal( p_μ[i-1] , JumpingWidth*(μ_ub-μ_lb) )  )
    # make sure p_μ_new is between μ_lb and μ_ub
    if p_μ_new < μ_lb
        p_μ_new = μ_lb + abs( p_μ_new - μ_lb )
    elseif p_μ_new > μ_ub
        p_μ_new = μ_ub - abs( p_μ_new - μ_ub )
    end

    p_σ_new = rand(  Normal( p_σ[i-1] , JumpingWidth*(σ_ub-σ_lb) )  )
    # make sure p_σ_new is between σ_lb  and σ_ub
    if p_σ_new < σ_lb
        p_σ_new = σ_lb + abs( p_σ_new - σ_lb )
    elseif p_σ_new > σ_ub
        p_σ_new = σ_ub - abs( p_σ_new - σ_ub )
    end
    " q0 is posterior0 = likelihood0 * prior0        "
    " q1 is posterior1 = likelihood1 * prior1        "
    logpriorμ0 = log10(  prior_μ(p_μ[i-1])  )
    logpriorσ0 = log10(  prior_σ(p_σ[i-1])  )
    logpriorμ1 = log10(  prior_μ(p_μ_new)   )
    logpriorσ1 = log10(  prior_σ(p_σ_new)   )

    q0 = loglikelihood(data, p_μ[i-1], p_σ[i-1]) + logpriorμ0 + logpriorσ0
    q1 = loglikelihood(data, p_μ_new, p_σ_new) + logpriorμ1 + logpriorσ1
    # Accept the jump if the log10 of a uniform random number is less
    # than q1 - q0 (equivalent to rand() < posterior1/posterior0)
    if log10(rand()) < q1-q0
        p_μ[i] = p_μ_new
        p_σ[i] = p_σ_new
    else
        p_μ[i] = p_μ[i-1]
        p_σ[i] = p_σ[i-1]
    end


end

# Next we display the histograms of the arrays p_μ and p_σ
# these are the shapes of the posteriors produced by MCMC
histogram(p_μ,legend=false,title="Posterior of mu") |> display
histogram(p_σ,legend=false,title="Posterior of sigma") |> display

2 Likes

Here is the source code for estimating 3 parameters. I am not writing any more source code like this because it is tiring and tedious; I am only doing it as a proof of concept and to help others figure out how the magic of MCMC is performed behind the scenes.

In real life, please use Turing or Stan.

using Distributions
using StatsPlots

# Here are the heights and weights of babies (in metres and kilograms)
# Assuming the weights of the babies are generated by a biological
# process with a normal distribution with mean μ and standard dev σ
# Weight ~ Normal(μ,σ)
#          μ = α + β Height
# find the posterior distribution of those three parameters α , β , σ
# data[rownum,1] is Height (in metres)
# data[rownum,2] is Weight/Mass (in kilograms)
data = [
0.46 2.5;
0.51 3.4;
0.54 4.4;
0.57 5.1;
0.60 5.6;
0.62 6.1;
0.64 6.4;
0.65 6.7;
0.66 7.0;
0.68 7.2]

# rough estimate of α and β
# julia>  [ 1 0.6; 1 0.62 ] \ [5.6; 6.1]
# 2-element Array{Float64,1}:
#  -9.399999999999986
#  24.99999999999998

# define our prior for α as a uniform covering the rough estimate of -9.4 ( required for MCMC )
prior_α_distr = Distributions.Uniform(-20, 0)
prior_α(p) = Distributions.pdf(prior_α_distr,p)
α_lb = -20
α_ub = 0

# define our prior for β as a normal centered on 25 ( required for MCMC )
prior_β_distr = Distributions.Normal(25, 20)
prior_β(p) = Distributions.pdf(prior_β_distr,p)
β_lb = -35
β_ub = 65

# define our prior for σ as an exponential with mean of 2 ( required for MCMC )
prior_σ_distr = Distributions.Exponential(2)
prior_σ(p) = Distributions.pdf(prior_σ_distr,p)
σ_lb = 0
σ_ub = 10

# define the number of samples used in MCMC
# And create the arrays p_α, p_β and p_σ with that many elements (filled with 0.0)
n_samples = 1_000_000
p_α = zeros(n_samples)
p_β = zeros(n_samples)
p_σ = zeros(n_samples)
# Set the starting value to a random number between lb and ub
p_α[1] = rand(prior_α_distr)
while p_α[1] < α_lb || p_α[1] > α_ub
    p_α[1] = rand(prior_α_distr)
end

p_β[1] = rand(prior_β_distr)
while p_β[1] < β_lb || p_β[1] > β_ub
    p_β[1] = rand(prior_β_distr)
end

p_σ[1] = rand(prior_σ_distr)
while p_σ[1] < σ_lb || p_σ[1] > σ_ub
    p_σ[1] = rand(prior_σ_distr)
end

# Base.log10 already computes logarithms to base 10, so no helper definition is needed

# The log-likelihood function which returns the log of the probability,
# given fixed alpha, beta and sigma, of the sequence of datapoints
function loglikelihood(data::Matrix,alpha::Real,beta::Real,sigma::Real)
    numofdatapoints = size(data,1)
    result=0.0
    for k = 1:numofdatapoints
        # μ = α + β * Height
        mu = alpha + beta * data[k,1]
        result += log10(  pdf(Normal(mu, sigma),data[k,2])  )
    end
    return result
end

JumpingWidth = 0.1
for i in 2:n_samples
    # p_α_new is proposed in the vicinity of p_α[i-1]
    p_α_new = rand(  Normal( p_α[i-1] , JumpingWidth*(α_ub-α_lb) )  )
    # make sure p_α_new is between α_lb and α_ub
    if p_α_new < α_lb
        p_α_new = α_lb + abs( p_α_new - α_lb )
    elseif p_α_new > α_ub
        p_α_new = α_ub - abs( p_α_new - α_ub )
    end

    # p_β_new is proposed in the vicinity of p_β[i-1]
    p_β_new = rand(  Normal( p_β[i-1] , JumpingWidth*(β_ub-β_lb) )  )
    # make sure p_β_new is between β_lb and β_ub
    if p_β_new < β_lb
        p_β_new = β_lb + abs( p_β_new - β_lb )
    elseif p_β_new > β_ub
        p_β_new = β_ub - abs( p_β_new - β_ub )
    end

    p_σ_new = rand(  Normal( p_σ[i-1] , JumpingWidth*(σ_ub-σ_lb) )  )
    # make sure p_σ_new is between σ_lb  and σ_ub
    if p_σ_new < σ_lb
        p_σ_new = σ_lb + abs( p_σ_new - σ_lb )
    elseif p_σ_new > σ_ub
        p_σ_new = σ_ub - abs( p_σ_new - σ_ub )
    end
    " q0 is posterior0 = likelihood0 * prior0        "
    " q1 is posterior1 = likelihood1 * prior1        "
    logpriorα0 = log10(  prior_α(p_α[i-1])  )
    logpriorβ0 = log10(  prior_β(p_β[i-1])  )
    logpriorσ0 = log10(  prior_σ(p_σ[i-1])  )
    logpriorα1 = log10(  prior_α(p_α_new)   )
    logpriorβ1 = log10(  prior_β(p_β_new)   )
    logpriorσ1 = log10(  prior_σ(p_σ_new)   )

    q0 = loglikelihood(data, p_α[i-1], p_β[i-1], p_σ[i-1]) + logpriorα0 + logpriorβ0 + logpriorσ0
    q1 = loglikelihood(data, p_α_new, p_β_new, p_σ_new) + logpriorα1 + logpriorβ1 + logpriorσ1
    # Accept the jump if the log10 of a uniform random number is less
    # than q1 - q0 (equivalent to rand() < posterior1/posterior0)
    if log10(rand()) < q1-q0
        p_α[i] = p_α_new
        p_β[i] = p_β_new
        p_σ[i] = p_σ_new
    else
        p_α[i] = p_α[i-1]
        p_β[i] = p_β[i-1]
        p_σ[i] = p_σ[i-1]
    end


end

# Next we display the histograms of the arrays p_α, p_β and p_σ
# these are the shapes of the posteriors produced by MCMC
histogram(p_α,legend=false,title="Posterior of alpha") |> display
histogram(p_β,legend=false,title="Posterior of beta") |> display
histogram(p_σ,legend=false,title="Posterior of sigma") |> display
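As a quick visual check (a sketch, not in the original post), the posterior means of α and β can be drawn as a regression line over the data:

using Statistics  # for mean
# Plot the data and the line Weight = mean(α) + mean(β) * Height
α_hat, β_hat = mean(p_α), mean(p_β)
scatter(data[:,1], data[:,2], legend=false, xlabel="Height (m)", ylabel="Weight (kg)")
plot!(h -> α_hat + β_hat*h, minimum(data[:,1]), maximum(data[:,1])) |> display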

1 Like

Thanks for putting these examples together. In order to achieve better numerical stability, I recommend using logpdf(...) instead of log10(pdf(...)). The formulas for logpdf(...) are usually designed to be more numerically stable than log(pdf(...)).
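A small illustration of the difference (a sketch, not part of the original reply):

using Distributions
x = 40.0                     # a point far out in the tail
pdf(Normal(0, 1), x)         # 0.0  (the density underflows Float64)
log(pdf(Normal(0, 1), x))    # -Inf
logpdf(Normal(0, 1), x)      # ≈ -800.92, computed directly, no underflow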

2 Likes

This is the last post. There are two more improvements:

  1. standardise the raw data by the mean and standard deviation of the raw data
  2. use a burn-in of 1000 samples

using Distributions, Statistics
using StatsPlots

# Here are the heights and weights of babies (in metres and kilograms)
# Assuming the weights of the babies are generated by a biological
# process with a normal distribution with mean μ and standard dev σ
# (Weight - weight_mean)/rel_weight_stddev ~ Normal(μ,σ)
#          μ = α + β (Height - Height_mean)/rel_height_stddev
#
# Weight = weight_mean + rel_weight_stddev * Normal(μ,σ)
#      μ = α + β (Height - Height_mean)/rel_height_stddev
# find the posterior distribution of those three parameters α , β , σ
rawdata = [
0.46 2.5;
0.51 3.4;
0.54 4.4;
0.57 5.1;
0.60 5.6;
0.62 6.1;
0.64 6.4;
0.65 6.7;
0.66 7.0;
0.68 7.2]

height_mean = mean(rawdata[:,1])
weight_mean = mean(rawdata[:,2])
relativedata = rawdata .- [height_mean weight_mean]
rel_height_stddev = std(relativedata[:,1])
rel_weight_stddev = std(relativedata[:,2])
data = relativedata ./ [rel_height_stddev rel_weight_stddev]

# data[rownum,1] is Height (in Z score)
# data[rownum,2] is Weight/Mass (in Z score)
# julia> data
# 10×2 Array{Float64,2}:
#  -1.85612    -1.85925
#  -1.15833    -1.29009
#  -0.739655   -0.657695
#  -0.320982   -0.215016
#   0.0976903   0.101184
#   0.376805    0.417384
#   0.65592     0.607103
#   0.795478    0.796823
#   0.935035    0.986543
#   1.21415     1.11302

# rough estimate of α and β
#julia> [ 1 0.0976; 1 0.376] \ [ 0.10; 0.417]
#2-element Array{Float64,1}:
# -0.011132183908045976
#  1.1386494252873562

# define our prior for α as a uniform centered on 0.0 ( required for MCMC )
prior_α_distr = Distributions.Uniform(-10, 10)
prior_α(p) = Distributions.pdf(prior_α_distr,p)
α_lb = -10
α_ub = 10

# define our prior for β as a normal centered on 1 ( required for MCMC )
prior_β_distr = Distributions.Normal(1, 1)
prior_β(p) = Distributions.pdf(prior_β_distr,p)
β_lb = -2
β_ub = 4

# define our prior for σ as an exponential with mean of 1 ( required for MCMC )
prior_σ_distr = Distributions.Exponential(1)
prior_σ(p) = Distributions.pdf(prior_σ_distr,p)
σ_lb = 0
σ_ub = 5

# define the number of samples used in MCMC
# And create the arrays p_α, p_β and p_σ with that many elements (filled with 0.0)
burnedinsamples = 1000
n_samples = 1_000_000 + burnedinsamples
p_α = zeros(n_samples)
p_β = zeros(n_samples)
p_σ = zeros(n_samples)
# Set the starting value to a random number between lb and ub
p_α[1] = rand(prior_α_distr)
while p_α[1] < α_lb || p_α[1] > α_ub
    p_α[1] = rand(prior_α_distr)
end

p_β[1] = rand(prior_β_distr)
while p_β[1] < β_lb || p_β[1] > β_ub
    p_β[1] = rand(prior_β_distr)
end

p_σ[1] = rand(prior_σ_distr)
while p_σ[1] < σ_lb || p_σ[1] > σ_ub
    p_σ[1] = rand(prior_σ_distr)
end

# Base.log10 already computes logarithms to base 10, so no helper definition is needed

# The log-likelihood function which returns the log of the probability,
# given fixed alpha, beta and sigma, of the sequence of datapoints
function loglikelihood(data::Matrix,alpha::Real,beta::Real,sigma::Real)
    numofdatapoints = size(data,1)
    result=0.0
    for k = 1:numofdatapoints
        # μ = α + β * Height
        mu = alpha + beta * data[k,1]
        result += log10(  pdf(Normal(mu, sigma),data[k,2])  )
    end
    return result
end

JumpingWidth = 0.1
for i in 2:n_samples
    # p_α_new is proposed in the vicinity of p_α[i-1]
    p_α_new = rand(  Normal( p_α[i-1] , JumpingWidth*(α_ub-α_lb) )  )
    # make sure p_α_new is between α_lb and α_ub
    if p_α_new < α_lb
        p_α_new = α_lb + abs( p_α_new - α_lb )
    elseif p_α_new > α_ub
        p_α_new = α_ub - abs( p_α_new - α_ub )
    end

    # p_β_new is proposed in the vicinity of p_β[i-1]
    p_β_new = rand(  Normal( p_β[i-1] , JumpingWidth*(β_ub-β_lb) )  )
    # make sure p_β_new is between β_lb and β_ub
    if p_β_new < β_lb
        p_β_new = β_lb + abs( p_β_new - β_lb )
    elseif p_β_new > β_ub
        p_β_new = β_ub - abs( p_β_new - β_ub )
    end

    p_σ_new = rand(  Normal( p_σ[i-1] , JumpingWidth*(σ_ub-σ_lb) )  )
    # make sure p_σ_new is between σ_lb  and σ_ub
    if p_σ_new < σ_lb
        p_σ_new = σ_lb + abs( p_σ_new - σ_lb )
    elseif p_σ_new > σ_ub
        p_σ_new = σ_ub - abs( p_σ_new - σ_ub )
    end
    " q0 is posterior0 = likelihood0 * prior0        "
    " q1 is posterior1 = likelihood1 * prior1        "
    logpriorα0 = log10(  prior_α(p_α[i-1])  )
    logpriorβ0 = log10(  prior_β(p_β[i-1])  )
    logpriorσ0 = log10(  prior_σ(p_σ[i-1])  )
    logpriorα1 = log10(  prior_α(p_α_new)   )
    logpriorβ1 = log10(  prior_β(p_β_new)   )
    logpriorσ1 = log10(  prior_σ(p_σ_new)   )

    q0 = loglikelihood(data, p_α[i-1], p_β[i-1], p_σ[i-1]) + logpriorα0 + logpriorβ0 + logpriorσ0
    q1 = loglikelihood(data, p_α_new, p_β_new, p_σ_new) + logpriorα1 + logpriorβ1 + logpriorσ1
    # Accept the jump if the log10 of a uniform random number is less
    # than q1 - q0 (equivalent to rand() < posterior1/posterior0)
    if log10(rand()) < q1-q0
        p_α[i] = p_α_new
        p_β[i] = p_β_new
        p_σ[i] = p_σ_new
    else
        p_α[i] = p_α[i-1]
        p_β[i] = p_β[i-1]
        p_σ[i] = p_σ[i-1]
    end


end

# Finally we must not forget to remove the burn-in samples
deleteat!(p_α,1:burnedinsamples)
deleteat!(p_β,1:burnedinsamples)
deleteat!(p_σ,1:burnedinsamples)

# Next we display the histograms of the arrays p_α, p_β and p_σ
# these are the shapes of the posteriors produced by MCMC
histogram(p_α,legend=false,title="Posterior of alpha") |> display
histogram(p_β,legend=false,title="Posterior of beta") |> display
histogram(p_σ,legend=false,title="Posterior of sigma") |> display
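Since the sampler works on standardised data, the draws can be mapped back to the original units afterwards (a sketch, not in the original post). From Weight = weight_mean + rel_weight_stddev * (α + β * (Height - height_mean)/rel_height_stddev + ε), the raw-scale slope and intercept are:

# Convert the standardised α and β back to the original units
slope_raw     = rel_weight_stddev .* p_β ./ rel_height_stddev                        # kg per metre
intercept_raw = weight_mean .+ rel_weight_stddev .* p_α .- slope_raw .* height_mean  # kg
histogram(slope_raw, legend=false, title="Posterior of slope (kg/m)") |> display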

1 Like

I would strongly suggest making functions for each section of the code. This should be both easier to digest and run much faster!

6 Likes

dpsanders,

Thank you for your input. I may rework the code and put it into functions for a more functional style of programming. Since this code is meant for beginners to understand, faster performance is not an objective; the number one objective is that beginners are able to read and understand the code.

In my opinion, having the code in digestible, small chunks inside functions with well chosen names makes any code much more readable and understandable.

3 Likes

In Soss, you would do

julia> using Soss

julia> data = [1, 0, 1, 0, 1, 1, 0, 1, 1];

julia> coinflip = @model N begin
           p ~ Beta(1,1)
           data ~ Bernoulli(p) |> iid(N)
       end;

Then (if you didn’t have data) you could generate some with

julia> rand(coinflip(N=10))
(N = 10, p = 0.5585410994609438, data = Bool[0, 0, 0, 1, 1, 1, 0, 1, 1, 0])

Now for sampling

julia> post = dynamicHMC(coinflip(N=10), (data=data,)) |> particles
(p = 0.649 ± 0.13,)
6 Likes

I have simplified the source code and placed it inside a module called SimpleMCMC.jl.
I have also implemented Gibbs_MCMC in addition to MetropolisHastings_MCMC.
Hopefully this source code will allow beginners and newbies to figure out how basic MCMC works behind the scenes.

FILE: SimpleMCMC.jl

module SimpleMCMC
    export MetropolisHastings_MCMC
    export Gibbs_MCMC
    export describe_paramvec

    using Distributions, StatsPlots, Statistics, Printf

    # Version 6
    #  v6: Add Gibbs_MCMC and change module name from MHMCMC to SimpleMCMC
    #  v5: turn it into a module
    #  v4: remove prior from Parameter struct and the struct itself
    #  v3: create JumpingWidthVec, lbVec, ubVec  to improve efficiency
    #  v2: Remove q0 from the main loop to improve efficiency
    #  v1: Turn it into a version that uses pure function(s)

    function MetropolisHastings_MCMC(data,lop::Vector{Distribution},loglikelihood;samples,burnedinsamples,JumpingWidth=0.01)
        # Find the number of parameters
        numofparam = length(lop)
        # calc the n_samples
        n_samples = samples + burnedinsamples
        # Create a parameter Matrix p with n_samples rows and numofparam cols
        p = zeros(n_samples,numofparam)
        # JumpingWidthVector, lowerboundVector and upperboundbVector
        JumpingWidthVec = zeros(numofparam)
        lbVec = zeros(numofparam)
        ubVec = zeros(numofparam)
        # Create a Vector of our logprior function
        logpriorVec = Array{Function}(undef,numofparam)

        for k = 1:numofparam  # for each parameter k
            # Set the starting value to a random draw from the prior
            p[1,k] = rand(lop[k])

            # get 100 random values and find the practical max and practical min
            HundredRandValues = rand(lop[k],100)
            prac_min = minimum(HundredRandValues)
            prac_max = maximum(HundredRandValues)
            JumpingWidthVec[k] = JumpingWidth * (prac_max - prac_min)

            lbVec[k] = minimum(lop[k])
            ubVec[k] = maximum(lop[k])

            # Array of functions logprior(x) = logpdf(distribution,x)
            logpriorVec[k] = x->logpdf(lop[k],x)
        end

        p_old = vec(p[1,:])
        q0 = loglikelihood(data,p_old...) + sum([logpriorVec[k](p_old[k]) for k = 1:numofparam ])

        # prepare the p_new array
        p_new = zeros(numofparam)
        # Main loop for MetropolisHastings MCMC
        for i = 2:n_samples
            for k = 1:numofparam  # for each parameter k
                # propose p_new[k] in the vicinity of p[i-1,k]
                p_new[k] = rand(  Normal( p[i-1,k] , JumpingWidthVec[k] )  )
                # make sure p_new is between lb and ub
                if p_new[k] < lbVec[k]
                    p_new[k] = lbVec[k] + abs( p_new[k] - lbVec[k] )
                elseif p_new[k] > ubVec[k]
                    p_new[k] = ubVec[k] - abs( p_new[k] - ubVec[k] )
                end
            end

            # Calc the proposal's posterior q1 (q0 was carried over from the last accepted sample)
            # q0 is posterior0 = likelihood0 * prior0
            # q1 is posterior1 = likelihood1 * prior1
            q1 = loglikelihood(data,p_new...) + sum([logpriorVec[k](p_new[k]) for k = 1:numofparam ])
            # Accept the proposal if the log of a uniform random number
            # is less than q1 - q0 (i.e. rand() < posterior1/posterior0)
            p[i,:] .= log(rand()) < q1-q0 ? (q0=q1; p_old=p_new[:]) : p_old
        end
        # Finally we must not forget to remove the burned in samples
        return p[(burnedinsamples+1):end,:]
    end

    function Gibbs_MCMC(data,lop::Vector{Distribution},loglikelihood;samples,burnedinsamples)
        # Find the number of parameters
        numofparam = length(lop)
        # calc the n_samples
        n_samples = samples + burnedinsamples
        # Create a parameter Matrix p with n_samples rows and numofparam cols
        p = zeros(n_samples,numofparam)
        # Create a Vector of our logprior function
        logpriorVec = Array{Function}(undef,numofparam)
        # Number of holes to drill
        numofdrillholes = 1001
        drillhole = Array{Float64}(undef,numofdrillholes)
        invcdf = Array{Float64}(undef,numofdrillholes+1)
        # prepare the vectors: startposvec stopposvec chunksizevec
        startposvec = zeros(numofparam)
        stopposvec = zeros(numofparam)
        chunksizevec = zeros(numofparam)

        for k = 1:numofparam  # for each parameter k
            # Set the starting value to a random draw from the prior
            p[1,k] = rand(lop[k])

            # get 400 random values and find the practical max and practical min
            FourHundredRandValues = rand(lop[k],400)
            startposvec[k] = minimum(FourHundredRandValues)
            stopposvec[k]  = maximum(FourHundredRandValues)
            chunksizevec[k] = (stopposvec[k]-startposvec[k])/(numofdrillholes-1)

            # Array of functions logprior(x) = logpdf(distribution,x)
            logpriorVec[k] = x->logpdf(lop[k],x)
        end

        # Main loop for Gibbs MCMC
        for i = 2:n_samples
            # prepare the p_new vector
            p_new = vec(p[i-1,:])
            # Select a new value for each parameter of p_new
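            # For each parameter in turn: evaluate the likelihood on a grid of
            # candidate values (the "drill holes"), accumulate it into an
            # unnormalised CDF, and draw the new value by inverse-CDF sampling.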
            for k = 1:numofparam  # for each parameter k
                # Calculate chunk size
                startpos,stoppos,chunksize = startposvec[k],stopposvec[k],chunksizevec[k]
                invcdf[1] = 0.0
                # Start drilling the drill holes for parameter k
                for n = 1:numofdrillholes
                    p_new[k] = startpos + (n-1) * chunksize
                    drillhole[n] = exp(  loglikelihood(data,p_new...)  )
                    invcdf[n+1] = invcdf[n] + drillhole[n]
                end
                invcdf /= invcdf[end]
                # Now we perform an inverse CDF sampling
                x = 0.0
                while x == 0.0
                    x = rand()  # Make sure x is not zero
                end
                counter = 2
                while !(invcdf[counter-1] < x <= invcdf[counter])
                    counter += 1
                end
                counter -= 1
                # now find the kth parameter value
                pos = startpos + (counter-1) * chunksize
                # This is the value we are going to jump to
                p_new[k] = pos
                p[i,k] = pos
            end
        end
        # Finally we must not forget to remove the burned in samples
        return p[(burnedinsamples+1):end,:]
    end

    function describe_paramvec(name,namestr,v::Vector)
        m,s,v_sorted = mean(v),std(v),sort(v)
        pc5 = v_sorted[Int64(round(0.055*length(v_sorted)))]
        pc94 = v_sorted[Int64(round(0.945*length(v_sorted)))]
        println(
            "Parameter $(name)   mean ",@sprintf("%6.3f",round(m,sigdigits=3)),
            "   sd ",@sprintf("%6.3f",round(s,sigdigits=3)),
            "   5.5% ",@sprintf("%6.3f",round(pc5,sigdigits=3)),
            "   94.5% ",@sprintf("%6.3f",round(pc94,sigdigits=3))
        )
        histogram(v,legend=false,title="Posterior of $(namestr)") |> display
        return (m,s,pc5,pc94)
    end

end

Here is example code showing how to use it

using Distributions, StatsPlots, Statistics
using SimpleMCMC

function MCMC_example()

    # Here are the weights of the students (in kilograms) in a classroom
    # Assuming the weights of the students are generated by a biological
    # process with a normal distribution with mean μ and standard dev σ
    # find the posterior distribution of those two parameters
    data = [ 55.8, 64.1, 61.2, 55.2, 58.0, 69.8, 53.2, 70.0, 63.3, 63.5 ]

    # define our prior for μ as a normal centered on 60 ( required for MCMC )
    μ = Normal(60, 10)
    # define our prior for σ as a exponential with mean of 20 ( required for MCMC )
    σ = Exponential(20)

    # Put all our parameters into a list
    ListOfParams = Distribution[μ,σ]

    # The loglikelihood function which returns the log(probability),
    # given a fixed mu and a fixed sigma, of the sequence of datapoints
    function loglikelihood(data::Vector,mu,sigma)
        numofdatapoints = length(data)
        result=0.0
        for k = 1:numofdatapoints
            result += logpdf(Normal(mu,sigma),data[k])
        end
        return result
    end

    p = MetropolisHastings_MCMC(
            data,
            ListOfParams,
            loglikelihood,
            samples=1_000_000,
            burnedinsamples=5000
    )

    # Next we need to display the histogram of the column p[:,1]
    # this is the shape of the posterior produced by MCMC
    vec_μ=vec(p[:,1])  # Convert a Matrix column into a 1-D Vector
    vec_σ=vec(p[:,2])  # Convert a Matrix column into a 1-D Vector
    describe_paramvec("μ","mu",vec_μ)
    describe_paramvec("σ","sigma",vec_σ)
    histogram2d(vec_μ,vec_σ,nbins=100,xlabel="mu",ylabel="sigma")
end

# We call the main function to kick off the example
MCMC_example()

Here is the text output

Starting Julia...
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _  |  |
  | | |_| | | | (_| |  |  Version 1.3.0 (2019-11-26)
 _/ |\__ _|_|_|\__ _|  |  Official https://julialang.org/ release
|__/                   |

[ Info: Precompiling SimpleMCMC [top-level]
Parameter μ   mean 61.300   sd  2.130   5.5% 57.900   94.5% 64.700
Parameter σ   mean  6.680   sd  1.850   5.5%  4.440   94.5%  9.960

Here are the three graphs produced by the program
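The module also exports Gibbs_MCMC, which can be run on the same problem (a sketch, assuming the data, ListOfParams and loglikelihood defined inside MCMC_example above):

# Sketch: the same two-parameter model with the grid-based Gibbs sampler
p_gibbs = Gibbs_MCMC(
        data,
        ListOfParams,
        loglikelihood,
        samples=10_000,
        burnedinsamples=1000
)
describe_paramvec("μ","mu",vec(p_gibbs[:,1]))
describe_paramvec("σ","sigma",vec(p_gibbs[:,2]))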



6 Likes

Thank you very much for writing this detailed example of the MCMC implementation, it helped a lot to understand it better.

Would it be possible to write the same example (the one of the weights of the students) in Turing and post it here? That would help a lot for beginners (like me) to understand how Turing works and compare with what was shown here.

For another source of teaching-oriented code, the BayesExample1(2,3).jl files in Econometrics/Examples/Bayesian at main · mcreel/Econometrics · GitHub may be useful. The first example computes the posterior directly, the second uses basic MH MCMC, and the third uses Turing and NUTS, all for the same one-parameter model.

3 Likes