A simple MCMC code for beginners for learning

StevenSiew · November 29, 2019, 12:55am

If you try to understand how Metropolis-Hastings MCMC works in producing a posterior for a parameter. You can read various website like this

But unless you understand R very well, it would make your head spin. But you can understand it easily with this simple julia code. You would need of course to understand some simple Bayesian theory beforehand.

using Distributions
using StatsPlots

# We perform 9 coin tosses and obtain 6 heads and 3 tails.
# What is the probability density function for obtaining a Head?
heads = 6
tails = 3

# define the number of points used (Not strictly required for MCMC)
numofpoints = 1001

# define probability grid from 0.0 to 1.0 with 1001 points
# (Not strictly required for MCMC)
p_grid = [ k  for k in range(0,stop=1,length=numofpoints) ]
p_grid_lb = p_grid[1]      # lower bound ( required for MCMC )
p_grid_ub = p_grid[end]    # upper bound ( required for MCMC )

# define our prior which is a uniform prior ( required for MCMC )
prior = [ 1.0  for _ in 1:numofpoints ]

# define the binomial distribution
function dbinom(n::Integer,size::Integer,p::Real)
    binomial(size, n) * p^n * (1.0-p)^(size-n)
end

# define the number of samples used in MCMC
# And create array p with that many elements (filled with 0.0)
n_samples = 1_000_000
p = zeros(n_samples)
# Set the starting value to a random number between 0.0 and 1.0
# strictly speaking it should be between p_grid_lb and p_grid_ub
p[1] = rand()
JumpingWidth = 0.1
for i in 2:n_samples
    # p_new has a value around the vicinity of p[i-1]
    p_new = rand(  Normal( p[i-1] , JumpingWidth )  )
    # make sure p_new is between 0.0 and 1.0
    if p_new < p_grid_lb      # if p_new < 0.0
        p_new = p_grid_lb + abs( p_new - p_grid_lb )   # abs( p_new )
    elseif p_new > p_grid_ub  # if p_new > 1.0
        p_new = p_grid_ub - abs( p_new - p_grid_ub )   # 2.0 - p_new
    end
    # Calc the probability of obtaining Heads and Tails
    # assuming that probability = p[i-1] and probability = p_new
    " q0 is posterior0 = likelihood0 * prior0        "
    " q1 is posterior1 = likelihood1 * prior1        "
    " In our cases, the prior is always 1.0          "
    " or calc prior as prior[1+ceil(p*(numofpoints-1))] "
    " or calc prior as Distributions.pdf(Distributions.Beta(1, 1),p) "
    " where p is the probability "
    " and the likelihood is dbinom( heads , heads+tails , probability ) "
    q0 = dbinom( heads , heads+tails , p[i-1] ) * 1.0
    q1 = dbinom( heads , heads+tails , p_new  ) * 1.0
    # The value of p[i] depends on whether the
    # random number is less than q1/q0
    p[i] = rand() < q1/q0 ? p_new : p[i-1]
end

# Next we need to display the histogram of the array p
# this is the shape of the posterior produced by MCMC
histogram(p,legend=false) |> display

Yifan_Liu · November 29, 2019, 2:42am

Exactly, I have used R for many years but still find the Julia code in StatisticalRethinking.jl more straightforward to understand

StevenSiew · November 29, 2019, 3:38am

Here is the equivalent code for Turing

# Import libraries.
using Turing
using Distributions, StatsPlots

# 9 coin tosses with 6 heads and 3 tails
data = [1, 0, 1, 0, 1, 1, 0, 1, 1]

# Now we use Turing to find the unknown parameter p
# Declare our Turing model.
@model coinflip(data) = begin
    # Our prior belief about the unknown parameter p which is
    # the probability of heads in a coin.
    p ~ Beta(1, 1)

    # The number of observations.
    N = length(data)
    for n in 1:N
        # Heads or tails of a coin are drawn using
        # a process which have a Bernoulli distribution
        data[n] ~ Bernoulli(p)
    end
end;

# Settings of the Hamiltonian Monte Carlo (HMC) sampler.
# If you never used HMC before then do not change the value
# of ϵ and τ from 0.05 and 10
samples = 1_000
ϵ = 0.05 # leapfrog step size
τ = 10   # leapfrog step numbers

# Start sampling.
chain = sample(coinflip(data), HMC(ϵ, τ), samples);

# Summarise results
println(  describe(chain)  )

# Plot a summary of the sampling process for the parameter p, i.e. the probability of heads in a coin.
display(  histogram(chain[:p])  )

StevenSiew · November 29, 2019, 9:26am

And now may I present the source code for MCMC for estimating TWO parameters!!!

using Distributions
using StatsPlots

# Here are the weights of the students (in kilograms) in a classrom
# Assuming the weight of the students are generated by a biological
# process with a normal distribution with mean μ and standard dev σ
# find the posterior distribution of those two parameters
data = [ 55.8, 64.1, 61.2, 55.2, 58.0, 69.8, 53.2, 70.0, 63.3, 63.5 ]

# define our prior for μ as a normal centered on 60 ( required for MCMC )
prior_μ_distr = Distributions.Normal(60, 10)
prior_μ(p) = Distributions.pdf(prior_μ_distr,p)
μ_lb = 30
μ_ub = 90

# define our prior for σ as a exponential with mean of 20 ( required for MCMC )
prior_σ_distr = Distributions.Exponential(20)
prior_σ(p) = Distributions.pdf(prior_σ_distr,p)
σ_lb = 0
σ_ub = 50

# define the number of samples used in MCMC
# And create array p with that many elements (filled with 0.0)
n_samples = 1_000_000
p_μ = zeros(n_samples)
p_σ = zeros(n_samples)
# Set the starting value to a random number between lb and ub
p_μ[1] = rand(prior_μ_distr)
while p_μ[1] < μ_lb || p_μ[1] > μ_ub
    p_μ[1] = rand(prior_μ_distr)
end

p_σ[1] = rand(prior_σ_distr)
while p_σ[1] < σ_lb || p_σ[1] > σ_ub
    p_σ[1] = rand(prior_σ_distr)
end

# The likelihood function which returns the probability,
# given a fixed mu and a fixed sigma, of the sequence of datapoints
function likelihood(data::Vector,mu::Real,sigma::Real)
    numofdatapoints = length(data)
    result=1.0
    for k = 1:numofdatapoints
        result *= pdf(Normal(mu, sigma),data[k])
    end
    return result
end

JumpingWidth = 0.1
for i in 2:n_samples
    # p_new has a value around the vicinity of p[i-1]
    p_μ_new = rand(  Normal( p_μ[i-1] , JumpingWidth*(μ_ub-μ_lb) )  )
    # make sure p_μ_new is between μ_lb and μ_ub
    if p_μ_new < μ_lb
        p_μ_new = μ_lb + abs( p_μ_new - μ_lb )
    elseif p_μ_new > μ_ub
        p_μ_new = μ_ub - abs( p_μ_new - μ_ub )
    end

    p_σ_new = rand(  Normal( p_σ[i-1] , JumpingWidth*(σ_ub-σ_lb) )  )
    # make sure p_σ_new is between σ_lb  and σ_ub
    if p_σ_new < σ_lb
        p_σ_new = σ_lb + abs( p_σ_new - σ_lb )
    elseif p_σ_new > σ_ub
        p_σ_new = σ_ub - abs( p_σ_new - σ_ub )
    end
    " q0 is posterior0 = likelihood0 * prior0        "
    " q1 is posterior1 = likelihood1 * prior1        "
    priorμ0 = prior_μ(p_μ[i-1])
    priorσ0 = prior_σ(p_σ[i-1])
    priorμ1 = prior_μ(p_μ_new)
    priorσ1 = prior_σ(p_σ_new)

    q0 = likelihood(data, p_μ[i-1], p_σ[i-1]) * priorμ0 * priorσ0
    q1 = likelihood(data, p_μ_new, p_σ_new) * priorμ1 * priorσ1
    # The value of p[i] depends on whether the
    # random number is less than q1/q0
    if rand() < q1/q0
        p_μ[i] = p_μ_new
        p_σ[i] = p_σ_new
    else
        p_μ[i] = p_μ[i-1]
        p_σ[i] = p_σ[i-1]
    end


end

# Next we need to display the histogram of the array p
# this is the shape of the posterior produced by MCMC
histogram(p_μ,legend=false,title="Posterior of mu") |> display
histogram(p_σ,legend=false,title="Posterior of sigma") |> display

StevenSiew · November 29, 2019, 9:37am

Because we are doing lots and lots and lots of multiplication, even Float64 can run out of room. Therefore most MCMC programs uses Logarithm and this ensures that Float64 will not hit its limitation.

Source code below uses Logarithm

using Distributions
using StatsPlots

# Here are the weights of the students (in kilograms) in a classrom
# Assuming the weight of the students are generated by a biological
# process with a normal distribution with mean μ and standard dev σ
# find the posterior distribution of those two parameters
data = [ 55.8, 64.1, 61.2, 55.2, 58.0, 69.8, 53.2, 70.0, 63.3, 63.5 ]

# define our prior for μ as a normal centered on 60 ( required for MCMC )
prior_μ_distr = Distributions.Normal(60, 10)
prior_μ(p) = Distributions.pdf(prior_μ_distr,p)
μ_lb = 30
μ_ub = 90

# define our prior for σ as a exponential with mean of 20 ( required for MCMC )
prior_σ_distr = Distributions.Exponential(20)
prior_σ(p) = Distributions.pdf(prior_σ_distr,p)
σ_lb = 0
σ_ub = 50

# define the number of samples used in MCMC
# And create array p with that many elements (filled with 0.0)
n_samples = 1_000_000
p_μ = zeros(n_samples)
p_σ = zeros(n_samples)
# Set the starting value to a random number between lb and ub
p_μ[1] = rand(prior_μ_distr)
while p_μ[1] < μ_lb || p_μ[1] > μ_ub
    p_μ[1] = rand(prior_μ_distr)
end

p_σ[1] = rand(prior_σ_distr)
while p_σ[1] < σ_lb || p_σ[1] > σ_ub
    p_σ[1] = rand(prior_σ_distr)
end

log10(x) = log(10,x)

# The likelihood function which returns the probability,
# given a fixed mu and a fixed sigma, of the sequence of datapoints
function loglikelihood(data::Vector,mu::Real,sigma::Real)
    numofdatapoints = length(data)
    result=0.0
    for k = 1:numofdatapoints
        result += log10(  pdf(Normal(mu, sigma),data[k])  )
    end
    return result
end

JumpingWidth = 0.1
for i in 2:n_samples
    # p_new has a value around the vicinity of p[i-1]
    p_μ_new = rand(  Normal( p_μ[i-1] , JumpingWidth*(μ_ub-μ_lb) )  )
    # make sure p_μ_new is between μ_lb and μ_ub
    if p_μ_new < μ_lb
        p_μ_new = μ_lb + abs( p_μ_new - μ_lb )
    elseif p_μ_new > μ_ub
        p_μ_new = μ_ub - abs( p_μ_new - μ_ub )
    end

    p_σ_new = rand(  Normal( p_σ[i-1] , JumpingWidth*(σ_ub-σ_lb) )  )
    # make sure p_σ_new is between σ_lb  and σ_ub
    if p_σ_new < σ_lb
        p_σ_new = σ_lb + abs( p_σ_new - σ_lb )
    elseif p_σ_new > σ_ub
        p_σ_new = σ_ub - abs( p_σ_new - σ_ub )
    end
    " q0 is posterior0 = likelihood0 * prior0        "
    " q1 is posterior1 = likelihood1 * prior1        "
    logpriorμ0 = log10(  prior_μ(p_μ[i-1])  )
    logpriorσ0 = log10(  prior_σ(p_σ[i-1])  )
    logpriorμ1 = log10(  prior_μ(p_μ_new)   )
    logpriorσ1 = log10(  prior_σ(p_σ_new)   )

    q0 = loglikelihood(data, p_μ[i-1], p_σ[i-1]) + logpriorμ0 + logpriorσ0
    q1 = loglikelihood(data, p_μ_new, p_σ_new) + logpriorμ1 + logpriorσ1
    # The value of p[i] depends on whether the
    # random number is less than q1/q0
    if log10(rand()) < q1-q0
        p_μ[i] = p_μ_new
        p_σ[i] = p_σ_new
    else
        p_μ[i] = p_μ[i-1]
        p_σ[i] = p_σ[i-1]
    end


end

# Next we need to display the histogram of the array p
# this is the shape of the posterior produced by MCMC
histogram(p_μ,legend=false,title="Posterior of mu") |> display
histogram(p_σ,legend=false,title="Posterior of sigma") |> display

StevenSiew · November 30, 2019, 8:48am

Here is the source code for estimating 3 parameters. I am not writing any more source code because writing it is tiring and tedious. I am only doing it as a proof of concept and to help others figure out how the magic of MCMC is performed behind the scenes.

In real life, please use Turing or STAN

using Distributions
using StatsPlots

# Here are the heights and weights of babies (in metres and kilograms)
# Assuming the weight of the babies are generated by a biological
# process with a normal distribution with mean μ and standard dev σ
# Weight ~ Normal(μ,σ)
#          μ = α + β Height
# find the posterior distribution of those three parameters α , β , σ
# data[rownum,1] is Height (in metres)
# data[rownum,2] is Weight/Mass (in kilograms)
data = [
0.46 2.5;
0.51 3.4;
0.54 4.4;
0.57 5.1;
0.60 5.6;
0.62 6.1;
0.64 6.4;
0.65 6.7;
0.66 7.0;
0.68 7.2]

# rough estimate of α and β
# julia>  [ 1 0.6; 1 0.62 ] \ [5.6; 6.1]
# 2-element Array{Float64,1}:
#  -9.399999999999986
#  24.99999999999998

# define our prior for α as a uniform centered on -9.4 ( required for MCMC )
prior_α_distr = Distributions.Uniform(-20, 0)
prior_α(p) = Distributions.pdf(prior_α_distr,p)
α_lb = -20
α_ub = 0

# define our prior for β as a normal centered on 25 ( required for MCMC )
prior_β_distr = Distributions.Normal(25, 20)
prior_β(p) = Distributions.pdf(prior_β_distr,p)
β_lb = -35
β_ub = 65

# define our prior for σ as a exponential with mean of 2 ( required for MCMC )
prior_σ_distr = Distributions.Exponential(2)
prior_σ(p) = Distributions.pdf(prior_σ_distr,p)
σ_lb = 0
σ_ub = 10

# define the number of samples used in MCMC
# And create array p with that many elements (filled with 0.0)
n_samples = 1_000_000
p_α = zeros(n_samples)
p_β = zeros(n_samples)
p_σ = zeros(n_samples)
# Set the starting value to a random number between lb and ub
p_α[1] = rand(prior_α_distr)
while p_α[1] < α_lb || p_α[1] > α_ub
    p_α[1] = rand(prior_α_distr)
end

p_β[1] = rand(prior_β_distr)
while p_β[1] < β_lb || p_β[1] > β_ub
    p_β[1] = rand(prior_β_distr)
end

p_σ[1] = rand(prior_σ_distr)
while p_σ[1] < σ_lb || p_σ[1] > σ_ub
    p_σ[1] = rand(prior_σ_distr)
end

log10(x) = log(10,x)

# The likelihood function which returns the probability,
# given a fixed mu and a fixed sigma, of the sequence of datapoints
function loglikelihood(data::Matrix,alpha::Real,beta::Real,sigma::Real)
    numofdatapoints = size(data,1)
    result=0.0
    for k = 1:numofdatapoints
        # μ = α + β * Height
        mu = alpha + beta * data[k,1]
        result += log10(  pdf(Normal(mu, sigma),data[k,2])  )
    end
    return result
end

JumpingWidth = 0.1
for i in 2:n_samples
    # p_new has a value around the vicinity of p[i-1]
    p_α_new = rand(  Normal( p_α[i-1] , JumpingWidth*(α_ub-α_lb) )  )
    # make sure p_α_new is between α_lb and α_ub
    if p_α_new < α_lb
        p_α_new = α_lb + abs( p_α_new - α_lb )
    elseif p_α_new > α_ub
        p_α_new = α_ub - abs( p_α_new - α_ub )
    end

    # p_new has a value around the vicinity of p[i-1]
    p_β_new = rand(  Normal( p_β[i-1] , JumpingWidth*(β_ub-β_lb) )  )
    # make sure p_β_new is between β_lb and β_ub
    if p_β_new < β_lb
        p_β_new = β_lb + abs( p_β_new - β_lb )
    elseif p_β_new > β_ub
        p_β_new = β_ub - abs( p_β_new - β_ub )
    end

    p_σ_new = rand(  Normal( p_σ[i-1] , JumpingWidth*(σ_ub-σ_lb) )  )
    # make sure p_σ_new is between σ_lb  and σ_ub
    if p_σ_new < σ_lb
        p_σ_new = σ_lb + abs( p_σ_new - σ_lb )
    elseif p_σ_new > σ_ub
        p_σ_new = σ_ub - abs( p_σ_new - σ_ub )
    end
    " q0 is posterior0 = likelihood0 * prior0        "
    " q1 is posterior1 = likelihood1 * prior1        "
    logpriorα0 = log10(  prior_α(p_α[i-1])  )
    logpriorβ0 = log10(  prior_β(p_β[i-1])  )
    logpriorσ0 = log10(  prior_σ(p_σ[i-1])  )
    logpriorα1 = log10(  prior_α(p_α_new)   )
    logpriorβ1 = log10(  prior_β(p_β_new)   )
    logpriorσ1 = log10(  prior_σ(p_σ_new)   )

    q0 = loglikelihood(data, p_α[i-1], p_β[i-1], p_σ[i-1]) + logpriorα0 + logpriorβ0 + logpriorσ0
    q1 = loglikelihood(data, p_α_new, p_β_new, p_σ_new) + logpriorα1 + logpriorβ1 + logpriorσ1
    # The value of p[i] depends on whether the
    # random number is less than q1/q0
    if log10(rand()) < q1-q0
        p_α[i] = p_α_new
        p_β[i] = p_β_new
        p_σ[i] = p_σ_new
    else
        p_α[i] = p_α[i-1]
        p_β[i] = p_β[i-1]
        p_σ[i] = p_σ[i-1]
    end


end

# Next we need to display the histogram of the array p
# this is the shape of the posterior produced by MCMC
histogram(p_α,legend=false,title="Posterior of alpha") |> display
histogram(p_β,legend=false,title="Posterior of beta") |> display
histogram(p_σ,legend=false,title="Posterior of sigma") |> display

Christopher_Fisher · November 30, 2019, 11:06am

Thanks for putting these examples together. In order to achieve better numerical stability, I recommend using logpdf(...) instead of log10(pdf(...)). The formulas for logpdf(...) are usually designed to be more numerically stable than log(pdf(...)).

StevenSiew · December 1, 2019, 5:34am

This is last post. There are two more innovations

standardise the rawdata by mean and standard deviation of the rawdata
use a burned in sampling of 1000 samples

using Distributions, Statistics
using StatsPlots

# Here are the heights and weights of babies (in metres and kilograms)
# Assuming the weight of the babies are generated by a biological
# process with a normal distribution with mean μ and standard dev σ
# (Weight - weight_mean)/rel_weight_stddev ~ Normal(μ,σ)
#          μ = α + β (Height - Height_mean)/rel_height_stddev
#
# Weight = weight_mean + rel_weight_stddev * Normal(μ,σ)
#      μ = α + β (Height - Height_mean)/rel_height_stddev
# find the posterior distribution of those three parameters α , β , σ
rawdata = [
0.46 2.5;
0.51 3.4;
0.54 4.4;
0.57 5.1;
0.60 5.6;
0.62 6.1;
0.64 6.4;
0.65 6.7;
0.66 7.0;
0.68 7.2]

height_mean = mean(rawdata[:,1])
weight_mean = mean(rawdata[:,2])
relativedata = rawdata .- [height_mean weight_mean]
rel_height_stddev = std(relativedata[:,1])
rel_weight_stddev = std(relativedata[:,2])
data = relativedata ./ [rel_height_stddev rel_weight_stddev]

# data[rownum,1] is Height (in Z score)
# data[rownum,2] is Weight/Mass (in Z score)
# julia> data
# 10×2 Array{Float64,2}:
#  -1.85612    -1.85925
#  -1.15833    -1.29009
#  -0.739655   -0.657695
#  -0.320982   -0.215016
#   0.0976903   0.101184
#   0.376805    0.417384
#   0.65592     0.607103
#   0.795478    0.796823
#   0.935035    0.986543
#   1.21415     1.11302

# rough estimate of α and β
#julia> [ 1 0.0976; 1 0.376] \ [ 0.10; 0.417]
#2-element Array{Float64,1}:
# -0.011132183908045976
#  1.1386494252873562

# define our prior for α as a uniform centered on 0.0 ( required for MCMC )
prior_α_distr = Distributions.Uniform(-10, 10)
prior_α(p) = Distributions.pdf(prior_α_distr,p)
α_lb = -10
α_ub = 10

# define our prior for β as a normal centered on 1 ( required for MCMC )
prior_β_distr = Distributions.Normal(1, 1)
prior_β(p) = Distributions.pdf(prior_β_distr,p)
β_lb = -2
β_ub = 4

# define our prior for σ as a exponential with mean of 2 ( required for MCMC )
prior_σ_distr = Distributions.Exponential(1)
prior_σ(p) = Distributions.pdf(prior_σ_distr,p)
σ_lb = 0
σ_ub = 5

# define the number of samples used in MCMC
# And create array p with that many elements (filled with 0.0)
burnedinsamples = 1000
n_samples = 1_000_000 + burnedinsamples
p_α = zeros(n_samples)
p_β = zeros(n_samples)
p_σ = zeros(n_samples)
# Set the starting value to a random number between lb and ub
p_α[1] = rand(prior_α_distr)
while p_α[1] < α_lb || p_α[1] > α_ub
    p_α[1] = rand(prior_α_distr)
end

p_β[1] = rand(prior_β_distr)
while p_β[1] < β_lb || p_β[1] > β_ub
    p_β[1] = rand(prior_β_distr)
end

p_σ[1] = rand(prior_σ_distr)
while p_σ[1] < σ_lb || p_σ[1] > σ_ub
    p_σ[1] = rand(prior_σ_distr)
end

log10(x) = log(10,x)

# The likelihood function which returns the probability,
# given a fixed mu and a fixed sigma, of the sequence of datapoints
function loglikelihood(data::Matrix,alpha::Real,beta::Real,sigma::Real)
    numofdatapoints = size(data,1)
    result=0.0
    for k = 1:numofdatapoints
        # μ = α + β * Height
        mu = alpha + beta * data[k,1]
        result += log10(  pdf(Normal(mu, sigma),data[k,2])  )
    end
    return result
end

JumpingWidth = 0.1
for i in 2:n_samples
    # p_new has a value around the vicinity of p[i-1]
    p_α_new = rand(  Normal( p_α[i-1] , JumpingWidth*(α_ub-α_lb) )  )
    # make sure p_α_new is between α_lb and α_ub
    if p_α_new < α_lb
        p_α_new = α_lb + abs( p_α_new - α_lb )
    elseif p_α_new > α_ub
        p_α_new = α_ub - abs( p_α_new - α_ub )
    end

    # p_new has a value around the vicinity of p[i-1]
    p_β_new = rand(  Normal( p_β[i-1] , JumpingWidth*(β_ub-β_lb) )  )
    # make sure p_β_new is between β_lb and β_ub
    if p_β_new < β_lb
        p_β_new = β_lb + abs( p_β_new - β_lb )
    elseif p_β_new > β_ub
        p_β_new = β_ub - abs( p_β_new - β_ub )
    end

    p_σ_new = rand(  Normal( p_σ[i-1] , JumpingWidth*(σ_ub-σ_lb) )  )
    # make sure p_σ_new is between σ_lb  and σ_ub
    if p_σ_new < σ_lb
        p_σ_new = σ_lb + abs( p_σ_new - σ_lb )
    elseif p_σ_new > σ_ub
        p_σ_new = σ_ub - abs( p_σ_new - σ_ub )
    end
    " q0 is posterior0 = likelihood0 * prior0        "
    " q1 is posterior1 = likelihood1 * prior1        "
    logpriorα0 = log10(  prior_α(p_α[i-1])  )
    logpriorβ0 = log10(  prior_β(p_β[i-1])  )
    logpriorσ0 = log10(  prior_σ(p_σ[i-1])  )
    logpriorα1 = log10(  prior_α(p_α_new)   )
    logpriorβ1 = log10(  prior_β(p_β_new)   )
    logpriorσ1 = log10(  prior_σ(p_σ_new)   )

    q0 = loglikelihood(data, p_α[i-1], p_β[i-1], p_σ[i-1]) + logpriorα0 + logpriorβ0 + logpriorσ0
    q1 = loglikelihood(data, p_α_new, p_β_new, p_σ_new) + logpriorα1 + logpriorβ1 + logpriorσ1
    # The value of p[i] depends on whether the
    # random number is less than q1/q0
    if log10(rand()) < q1-q0
        p_α[i] = p_α_new
        p_β[i] = p_β_new
        p_σ[i] = p_σ_new
    else
        p_α[i] = p_α[i-1]
        p_β[i] = p_β[i-1]
        p_σ[i] = p_σ[i-1]
    end


end

# Finally we must not forget to remove the burned in samples
deleteat!(p_α,1:burnedinsamples)
deleteat!(p_β,1:burnedinsamples)
deleteat!(p_σ,1:burnedinsamples)

# Next we need to display the histogram of the array p
# this is the shape of the posterior produced by MCMC
histogram(p_α,legend=false,title="Posterior of alpha") |> display
histogram(p_β,legend=false,title="Posterior of beta") |> display
histogram(p_σ,legend=false,title="Posterior of sigma") |> display

dpsanders · December 1, 2019, 5:57pm

I would strongly suggest making functions for each section of the code. This should be both easier to digest and run much faster!

StevenSiew · December 2, 2019, 11:52pm

dpsanders,

Thank you for your input. I may rework the code and put them in functions for functional style programming. As these code are for beginners to understand. Faster performance is not an objective, rather ability for beginners to read and understand the code is the number one objective

dpsanders · December 4, 2019, 11:01pm

In my opinion, having the code in digestible, small chunks inside functions with well chosen names makes any code much more readable and understandable.

cscherrer · December 5, 2019, 1:37am

In Soss, you would do

julia> using Soss

julia> data = [1, 0, 1, 0, 1, 1, 0, 1, 1];

julia> coinflip = @model N begin
           p ~ Beta(1,1)
           data ~ Bernoulli(p) |> iid(N)
       end;

Then (if you didn’t have data) you could generate some with

julia> rand(coinflip(N=10))
(N = 10, p = 0.5585410994609438, data = Bool[0, 0, 0, 1, 1, 1, 0, 1, 1, 0])

Now for sampling

julia> post = dynamicHMC(coinflip(N=10), (data=data,)) |> particles
(p = 0.649 ± 0.13,)

StevenSiew · December 7, 2019, 12:08pm

I have simplify the source code and place it inside a module called SimpleMCMC.jl
I also also implemented Gibbs_MCMC in addition to MetropolisHastings_MCMC
Hopefully this source code would allow beginners and newbies to figure out how basic MCMC works behind the scenes.

FILE: SimpleMCMC.jl

module SimpleMCMC
    export MetropolisHastings_MCMC
    export Gibbs_MCMC
    export describe_paramvec

    using Distributions, StatsPlots, Statistics, Printf

    # Version 6
    #  v6: Add Gibbs_MCMC and change module name from MHMCMC to SimpleMCMC
    #  v5: turn it into a module
    #  v4: remove prior from Parameter struct and the struct itself
    #  v3: create JumpingWidthVec, lbVec, ubVec  to improve efficiency
    #  v2: Remove q0 from the main loop to improve efficiency
    #  v1: Turn it into a version that uses pure function(s)

    function MetropolisHastings_MCMC(data,lop::Vector{Distribution},loglikelihood;samples,burnedinsamples,JumpingWidth=0.01)
        # Find the number of parameters
        numofparam = length(lop)
        # calc the n_samples
        n_samples = samples + burnedinsamples
        # Create a parameter Matrix p with n_samples rows and numofparam cols
        p = zeros(n_samples,numofparam)
        # JumpingWidthVector, lowerboundVector and upperboundbVector
        JumpingWidthVec = zeros(numofparam)
        lbVec = zeros(numofparam)
        ubVec = zeros(numofparam)
        # Create a Vector of our logprior function
        logpriorVec = Array{Function}(undef,numofparam)

        # Set the starting value to a random number between lb and ub
        for k = 1:numofparam  # for each parameter k
            # Set the starting value to a random number between lb and ub
            p[1,k] = rand(lop[k])

            # get 100 random values and find the practical max and practical min
            HundredRandValues = rand(lop[k],100)
            prac_min = minimum(HundredRandValues)
            prac_max = maximum(HundredRandValues)
            JumpingWidthVec[k] = JumpingWidth * (prac_max - prac_min)

            lbVec[k] = minimum(lop[k])
            ubVec[k] = maximum(lop[k])

            # Array of function prior(x) = pdf(distribution,x)
            logpriorVec[k] = x->logpdf(lop[k],x)
        end

        p_old = vec(p[1,:])
        q0 = loglikelihood(data,p_old...) + sum([logpriorVec[k](p_old[k]) for k = 1:numofparam ])

        # prepare the p_new array
        p_new = zeros(numofparam)
        # Main loop for MetropolisHastings MCMC
        for i = 2:n_samples
            # p_new has a value around the vicinity of p[i-1]
            for k = 1:numofparam  # for each parameter k
                # p_new has a value around the vicinity of p[i-1]
                p_new[k] = rand(  Normal( p[i-1,k] , JumpingWidthVec[k] )  )
                # make sure p_new is between lb and ub
                if p_new[k] < lbVec[k]
                    p_new[k] = lbVec[k] + abs( p_new[k] - lbVec[k] )
                elseif p_new[k] > ubVec[k]
                    p_new[k] = ubVec[k] - abs( p_new[k] - ubVec[k] )
                end
            end

            # Calc the two posterior
            " q0 is posterior0 = likelihood0 * prior0        "
            " q1 is posterior1 = likelihood1 * prior1        "
            q1 = loglikelihood(data,p_new...) + sum([logpriorVec[k](p_new[k]) for k = 1:numofparam ])
            # The value of p[i] depends on whether the
            # random number is less than q1/q0
            p[i,:] .= log(rand()) < q1-q0 ? (q0=q1; p_old=p_new[:]) : p_old
        end
        # Finally we must not forget to remove the burned in samples
        return p[(burnedinsamples+1):end,:]
    end

    function Gibbs_MCMC(data,lop::Vector{Distribution},loglikelihood;samples,burnedinsamples)
        # Find the number of parameters
        numofparam = length(lop)
        # calc the n_samples
        n_samples = samples + burnedinsamples
        # Create a parameter Matrix p with n_samples rows and numofparam cols
        p = zeros(n_samples,numofparam)
        # Create a Vector of our logprior function
        logpriorVec = Array{Function}(undef,numofparam)
        # Number of holes to drill
        numofdrillholes = 1001
        drillhole = Array{Float64}(undef,numofdrillholes)
        invcdf = Array{Float64}(undef,numofdrillholes+1)
        # prepare the vectors: startposvec stopposvec chunksizevec
        startposvec = zeros(numofparam)
        stopposvec = zeros(numofparam)
        chunksizevec = zeros(numofparam)

        # Set the starting value to a random number between lb and ub
        for k = 1:numofparam  # for each parameter k
            # Set the starting value to a random number between lb and ub
            p[1,k] = rand(lop[k])

            # get 400 random values and find the practical max and practical min
            FourHundredRandValues = rand(lop[k],400)
            startposvec[k] = minimum(FourHundredRandValues)
            stopposvec[k]  = maximum(FourHundredRandValues)
            chunksizevec[k] = (stopposvec[k]-startposvec[k])/(numofdrillholes-1)

            # Array of function prior(x) = pdf(distribution,x)
            logpriorVec[k] = x->logpdf(lop[k],x)
        end

        # Main loop for Gibbs MCMC
        for i = 2:n_samples
            # prepare the p_new vector
            p_new = vec(p[i-1,:])
            # Select a new value for each parameter of p_new
            for k = 1:numofparam  # for each parameter k
                # Calculate chunk size
                startpos,stoppos,chunksize = startposvec[k],stopposvec[k],chunksizevec[k]
                invcdf[1] = 0.0
                # Start drilling the drill holes for parameter k
                for n = 1:numofdrillholes
                    p_new[k] = startpos + (n-1) * chunksize
                    drillhole[n] = exp(  loglikelihood(data,p_new...)  )
                    invcdf[n+1] = invcdf[n] + drillhole[n]
                end
                invcdf /= invcdf[end]
                # Now we perform an inverse CDF sampling
                x = 0.0
                while x == 0.0
                    x = rand()  # Make sure x is not zero
                end
                counter = 2
                while !(invcdf[counter-1] < x <= invcdf[counter])
                    counter += 1
                end
                counter -= 1
                # now find the kth parameter value
                pos = startpos + (counter-1) * chunksize
                # This is the value we are going to jump to
                p_new[k] = pos
                p[i,k] = pos
            end
        end
        # Finally we must not forget to remove the burned in samples
        return p[(burnedinsamples+1):end,:]
    end

    function describe_paramvec(name,namestr,v::Vector)
        m,s,v_sorted = mean(v),std(v),sort(v)
        pc5 = v_sorted[Int64(round(0.055*length(v_sorted)))]
        pc94 = v_sorted[Int64(round(0.945*length(v_sorted)))]
        println(
            "Parameter $(name)   mean ",@sprintf("%6.3f",round(m,sigdigits=3)),
            "   sd ",@sprintf("%6.3f",round(s,sigdigits=3)),
            "   5.5% ",@sprintf("%6.3f",round(pc5,sigdigits=3)),
            "   94.5% ",@sprintf("%6.3f",round(pc94,sigdigits=3))
        )
        histogram(v,legend=false,title="Posterior of $(namestr)") |> display
        return (m,s,pc5,pc94)
    end

end

Here is an example code on how to use it

using Distributions, StatsPlots, Statistics
using SimpleMCMC

function MCMC_example()

    # Here are the weights of the students (in kilograms) in a classrom
    # Assuming the weight of the students are generated by a biological
    # process with a normal distribution with mean μ and standard dev σ
    # find the posterior distribution of those two parameters
    data = [ 55.8, 64.1, 61.2, 55.2, 58.0, 69.8, 53.2, 70.0, 63.3, 63.5 ]

    # define our prior for μ as a normal centered on 60 ( required for MCMC )
    μ = Normal(60, 10)
    # define our prior for σ as a exponential with mean of 20 ( required for MCMC )
    σ = Exponential(20)

    # Put all our parameters into a list
    ListOfParams = Distribution[μ,σ]

    # The loglikelihood function which returns the log(probability),
    # given a fixed mu and a fixed sigma, of the sequence of datapoints
    function loglikelihood(data::Vector,mu,sigma)
        numofdatapoints = length(data)
        result=0.0
        for k = 1:numofdatapoints
            result += logpdf(Normal(mu,sigma),data[k])
        end
        return result
    end

    p = MetropolisHastings_MCMC(
            data,
            ListOfParams,
            loglikelihood,
            samples=1_000_000,
            burnedinsamples=5000
    )

    # Next we need to display the histogram of the column p[:,1]
    # this is the shape of the posterior produced by MCMC
    vec_μ=vec(p[:,1])  # Convert a Matrix column into a 1-D Vector
    vec_σ=vec(p[:,2])  # Convert a Matrix column into a 1-D Vector
    describe_paramvec("μ","mu",vec_μ)
    describe_paramvec("σ","sigma",vec_σ)
    histogram2d(vec_μ,vec_σ,nbins=100,xlabel="mu",ylabel="sigma")
end

# We call the main function to kick off the example
MCMC_example()

Here is the text output

Starting Julia...
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _  |  |
  | | |_| | | | (_| |  |  Version 1.3.0 (2019-11-26)
 _/ |\__ _|_|_|\__ _|  |  Official https://julialang.org/ release
|__/                   |

[ Info: Precompiling SimpleMCMC [top-level]
Parameter μ   mean 61.300   sd  2.130   5.5% 57.900   94.5% 64.700
Parameter σ   mean  6.680   sd  1.850   5.5%  4.440   94.5%  9.960

Here are the three graphs produced by the program

josimar · March 31, 2023, 2:21am

Thank you very much for writing this detailed example of the MCMC implementation, it helped a lot to understand it better.

Would it be possible to write the same example (the one of the weights of the students) in Turing and post it here? That would help a lot for beginners (like me) to understand how Turing works and compare with what was shown here.

mcreel · March 31, 2023, 8:48am

For another source of teaching oriented code, the BayesExample1(2,3).jl files in Econometrics/Examples/Bayesian at main · mcreel/Econometrics · GitHub may be useful. The first example computes the posterior directly, the second uses basic MH MCMC, and the third use Turing and NUTS, all for the same one parameter model.

Topic		Replies	Views
Function rand() or sample() for MCMC sampling and similar? Probabilistic Programming	19	1721	September 23, 2019
Trouble translating a simple PyMC3 model to Turing New to Julia turing	39	5215	November 23, 2021
Gibbs sampler works in R doesn't in Julia (full Julia code provided) please help General Usage	22	2590	October 25, 2018
Custom likelihoods in Turing.jl General Usage	15	3783	October 26, 2018
MCMC landscape Statistics question	99	13436	July 2, 2021

A simple MCMC code for beginners for learning

Related topics