Improving Metropolis-Hastings code

Recently I’ve been looking for ways to make the Metropolis-Hastings algorithm better, especially when it comes to the acceptance rate and the performance, particularly when writing the code manually without any package. For example, this code:

using Distributions
using Random

function tar(x)
    # unnormalized target density: exp(-x) on x >= 0, i.e. Exponential(1)
    if x < 0
        return 0.0
    else
        return exp(-x)
    end
end

x = fill(0.0, 1000)
x[1] = 3.0
y = []
for i = 2:1000
    currentx = x[i-1]
    proposedx = currentx + rand(Normal(0,1),1)[1]

    A = tar(proposedx)/tar(currentx)   # acceptance ratio
    if rand() < A
        x[i] = proposedx
        push!(y, x[i])
    else
        x[i] = currentx
    end
end

What other tools or techniques could I use to increase the performance of this code, especially in terms of increasing the acceptance rate?

If you need MH, maybe use a mature library like https://github.com/TuringLang/AdvancedMH.jl?

If this is for educational purposes, a couple of things stand out immediately:

  • Don’t use untyped arrays like y = []; in this case you know there are only going to be Float64 values stored, so you can do y = Float64[]
  • rand(Normal(0, 1), 1)[1] is unnecessarily convoluted: it creates a 1-element vector only to access its single element. You can just do rand(Normal(0, 1)), and even better, create the normal distribution outside the loop. I would probably forego the whole Distributions dependency and just use randn() (see the sketch below)
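
To illustrate, a minimal sketch of the same loop with those two changes applied (this reuses the tar function from the original post):

x = fill(0.0, 1000)
x[1] = 3.0
y = Float64[]                       # typed array instead of y = []
for i = 2:1000
    currentx = x[i-1]
    proposedx = currentx + randn()  # scalar draw, no Distributions needed
    if rand() < tar(proposedx)/tar(currentx)
        x[i] = proposedx
        push!(y, x[i])
    else
        x[i] = currentx
    end
end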
5 Likes

I’ve had to implement a lot of MH samplers by hand. IME, the simplest way to dramatically improve performance (for many, though definitely not all, models) is to implement a simple adaptive Metropolis algorithm, like the one from Haario et al.; also see Andrieu & Thoms for a very good, larger review article on adaptive MCMC algorithms in general.

If you use OnlineStats.jl, it’s not hard to keep and track the necessary statistics.
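
To give a flavor of what adaptation looks like, here is a minimal sketch; this is not Haario et al.’s full covariance adaptation, just a diminishing step-size adaptation that targets a fixed acceptance rate, and logtar is an assumed log target density:

using Random

function adaptive_mh(logtar, x0; n = 10_000, target = 0.234)
    x = x0
    logstep = 0.0                        # log of the proposal scale
    chain = Vector{Float64}(undef, n)
    for i in 1:n
        proposedx = x + exp(logstep) * randn()
        accepted = -randexp() < logtar(proposedx) - logtar(x)
        accepted && (x = proposedx)
        # nudge the scale up after acceptances, down after rejections;
        # the 1/sqrt(i) factor makes the adaptation diminish over time
        logstep += (accepted - target) / sqrt(i)
        chain[i] = x
    end
    return chain
end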

7 Likes

I took a look at AdvancedMH, and I don’t know if I can use it for a customized distribution that generates a set of arrays, for example [[1,0],[0,1],[1,1],[1,2]], together with a target function of the form log(prod(x)), where x is an array generated from our distribution. How can I deal with this using AdvancedMH?

The performance tips section of the manual contains many great tips that will help you speed up this particular example:
https://docs.julialang.org/en/v1/manual/performance-tips/

Pay particular attention to global variables and untyped arrays.

1 Like

I am not an expert in Julia.
But if you mean increasing the acceptance rate: just from naively looking at your code, you can make your proposed step size smaller, like

proposedx= currentx +  0.1 * rand(Normal(0,1),1)[1]

The factor of 0.1 makes your proposed move smaller, so it will for sure increase your acceptance rate. However, it will also make your exploration of the configuration space slower.
In practice, an acceptance rate between 5 and 80 percent is OK.

If you want to really improve your MH algorithm, you would possibly need to change your proposed move (your proposedx) and the corresponding acceptance-ratio expression, which is the A in your code.

Besides, in your code, perhaps you can move

rand(Normal(0,1),1)[1]

outside the loop. Since you will for sure generate this random number for your proposed move about 1000 times, you can generate a 1000-element random array outside the loop first. In the loop you then just use the elements of the array sequentially, one by one.
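
For example, a minimal sketch of that suggestion, reusing the setup from the original post:

steps = rand(Normal(0,1), 1000)   # draw all proposal increments up front
for i = 2:1000
    proposedx = x[i-1] + steps[i]
    # ... accept/reject exactly as before ...
end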

By the way, you could check Kalos & Whitlock’s very nice Monte Carlo book, of which you may find an electronic version.

In Chapter 3, page 40, you may find how to form your tar function, which is the λ = 1 case, I believe.


If I remember correctly, you can generate y according to Eq. (3.31). It automatically forms the PDF of y, which is Eq. (3.29). So you can form the distribution directly, without even using Metropolis. This could be the fastest way, because you just sample it directly.
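
For this particular target, that direct approach is just inverse-transform sampling of the exponential (I cannot check the equation numbers here, but the idea is):

# if u ~ Uniform(0,1), then y = -log(u) has density exp(-y) for y >= 0
ys = -log.(rand(10_000))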

In general, once you know your integrand, you could also add importance sampling, in order to reduce variance.

I like to use handwritten MH at times; it reads so nicely in Julia:

using Distributions
using Random

function logtar(x)
    if x < 0
        return -Inf
    else
        return -x
    end
end

x = [3.0]
for i in 2:100000
    currentx = x[end]
    proposedx = currentx + rand(Normal(0,1))
    l = logtar(proposedx) - logtar(currentx)   # log of the acceptance ratio
    if -randexp() < l                          # -randexp() is distributed as log(rand())
        currentx = proposedx
    end
    push!(x,currentx)
end 

mean(x), var(x)                            # sample moments of the chain
mean(Exponential()), var(Exponential())    # exact moments, for comparison
5 Likes

Yes, the log trick @mschauer suggested is what’s typically used, and your code will be faster than the OP’s.
Most importantly, your way is more robust than the OP’s, because you do not have the

 tiny number / tiny number 

issue that is possible in the OP’s

A = tar(proposedx)/tar(currentx)

The replacement of / with - using the log trick is nice!
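
A quick illustration of why working in log space matters, with values chosen to force underflow:

exp(-1000.0) / exp(-999.0)   # 0.0 / 0.0 = NaN: both densities underflow
-1000.0 - (-999.0)           # -1.0: the log-space difference is exact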

What if I have a distribution like this?

using Distributions

# assumed helper (not shown in the original post): elementwise non-negativity check
nonneg(x) = all(x .>= 0)

struct OrthoNNDist <: DiscreteMultivariateDistribution
    x0::Vector{Float64}
    oc::Array{Float64,2}
    x1s::Array
    prob::Float64
    # return a new uniform distribution over all non-negative vectors of the
    # form x0 + oc[:, i] or x0 - oc[:, i]
    function OrthoNNDist(x0::Vector{Float64}, oc::Array{Float64,2})
        x1s = []
        for i = 1:size(oc)[2]
            x1 = x0 + oc[:, i]
            if nonneg(x1)
                push!(x1s, x1)
            end
            x1 = x0 - oc[:, i]
            if nonneg(x1)
                push!(x1s, x1)
            end
        end
        new(x0, oc, x1s, 1.0/length(x1s))
    end
end

Base.length(d::OrthoNNDist) = length(d.x0)

Distributions.rand(d::OrthoNNDist) = rand(d.x1s)   # uniform draw from the support

Distributions.pdf(d::OrthoNNDist, x::Vector) = x in d.x1s ? d.prob : 0.0
Distributions.pdf(d::OrthoNNDist) = fill(d.prob, size(d.x1s))
Distributions.logpdf(d::OrthoNNDist, x::Vector) = log(pdf(d, x))

This is a customized distribution. How can I make it take smaller steps, knowing that all the generated arrays should have integer values?

If your distribution is very unique, see here. However, a distribution that creates an array of arrays seems a bit odd to me; is there any particular reason you’re using that? There’s a very good chance that you can implement this using one of the multivariate distributions included by default in Distributions.jl.

1 Like

I think you are confusing yourself a bit: your distribution is uniform over a finite number of elements of a set (which happen to be vectors in space). OK, these vectors have some relation, but to sample them, just assign each one a number in 1:length(x1s) and sample the number uniformly. If you insist, you can also sample the uniform number using Metropolis-Hastings.
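
In code, that direct sampling is just (given some d = OrthoNNDist(x0, oc) from your definition above):

i = rand(1:length(d.x1s))   # pick an index uniformly
x = d.x1s[i]                # the corresponding support vector

which is exactly what the rand(d.x1s) in your definition already does.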

4 Likes

If you could post the math equation of your distribution as a picture, it would perhaps make it easier for us to understand your distribution.

No matter what the distribution is, it can be written as a function f(x); then just do regular Metropolis as you would for any distribution function f(x). You can always propose whatever move you want.

Perhaps to begin with, you could propose the simplest move, which is a symmetric move, i.e. one whose proposal probability satisfies

T(xold -> xnew) = T(xnew ->xold)

That is, you can always propose your corresponding symmetric move; propose xnew in the simplest way,

xnew = xold + stepsize * random number

(As you can see, the probability for xnew to jump to xold, or for xold to jump to xnew, is the same, so it is symmetric.)

You could make stepsize 0.1 or whatever factor, and see how your acceptance rate changes as the stepsize changes. Then you can use the corresponding acceptance expression,

    A = min[1, f(xnew)/f(xold)]

or the Log trick by @mschauer.

But I am not sure what you mean by step; it looks like the step you are describing and the stepsize I am describing are not the same thing.

As @ParadaCarleton and @mschauer suggested, you may use a package; or, if you can directly sample from your distribution, just sample from it, and you do not have to do MH at all.

MH is used to form a target distribution that you do not know how to sample from directly.

Hopefully your distribution is normalized to 1, or you can find a way such that the normalization factor cancels. Otherwise you may face the unknown-normalization-factor issue.

If you need to write your own algorithm (I think it is very good for you), you may find something useful in Chapter 3 of the Kalos & Whitlock MC book, which is concise and good.

1 Like

I’ve tried Turing; perhaps I didn’t know how to use it, because it didn’t work. I have this distribution and a target function, let’s call it target(x::Vector). I just want to see a minimal example of using Turing with a customized distribution, because in the docs there is nothing; there is only an explanation of how to create a customized distribution.

I didn’t quite understand what you said; can you clarify, if that’s possible? When I said the step, I meant the jumping step taken from the current value to the proposed value.

Yeah, I meant the step size. The distribution that I mentioned is asymmetric, and where can I change the step size in that distribution that I have?

It does not matter whether the target distribution is symmetric or not; the target distribution can be anything.
If you mean the jumping step size, it is just your

proposedx= currentx + rand(Normal(0,1),1)[1]

just make it

 proposedx= currentx + factor * rand(Normal(0,1),1)[1]

The factor can be anything: 0.1, 0.001, 10, 100, etc., whatever you want; just adjust it so that your acceptance rate lands between roughly 5 and 80%. This factor can even be tuned dynamically during an initial adaptation phase (see the adaptive MCMC references above); once it is held fixed, detailed balance is obeyed and the chain will converge to your target distribution.

Furthermore, if your configuration X is 3-dimensional or higher, and your proposed X’ = (xnew, ynew, znew) has too small an acceptance rate, you can move your coordinates one at a time: move x to xnew and accept/reject, then move y to ynew and accept/reject, then z to znew, and so on. Moving one coordinate at a time is also valid.
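
A minimal sketch of that single-coordinate scheme (coordinate_mh! is a hypothetical name, and logtar is an assumed log target density taking a vector):

using Random

function coordinate_mh!(x, logtar; step = 0.5)
    for j in eachindex(x)
        old = x[j]
        lold = logtar(x)
        x[j] = old + step * randn()   # propose a move along coordinate j only
        if -randexp() >= logtar(x) - lold
            x[j] = old                # reject: restore coordinate j
        end
    end
    return x
end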

Also, do not forget to decorrelate your samples when using MH. There is a quantity called the autocorrelation time; you can find it in any MC book.
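
For instance, a rough sketch of thinning by an estimated autocorrelation time, using StatsBase’s autocor and assuming the chain is stored in a vector x:

using StatsBase

rho = autocor(x, 1:50)                      # autocorrelations at lags 1..50
tau = 1 + 2 * sum(rho)                      # crude integrated autocorrelation time
thinned = x[1:max(1, ceil(Int, tau)):end]   # roughly independent samples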

2 Likes

To follow on from CRquantum’s last post: there are several papers on “optimal” acceptance rates for MH. Some papers (e.g., here) have found that for a not-too-narrow class of models, an acceptance rate of about 23.4% is often (though certainly not always) optimal. In fact, achieving that acceptance rate is part of some of the adaptive MH algorithms I linked earlier.

2 Likes

In your example, the step is “going from one vector to the next”; there is no freely choosable stepsize, just a freely choosable proposal distribution on {x1, …, xd}, corresponding to choosing a number between 1 and d. That’s what I thought was the confusion.

2 Likes

I would really appreciate a minimal example of moving the coordinates one at a time; it sounds like a good idea.

That’s why I was wondering how I can implement the step size if I’m sampling vectors, because we can only move by a factor of 1.