MCMC Sampling Method Overview/Comparison?

Is there a reference that compares the different sampling techniques available in Turing.jl? I don’t see a consistently preferred sampler, and it’s not clear to me (a beginner Bayesian) why one would prefer a given sampler over another.

I found this performance comparison - though three years is a long time for this ecosystem.

I have also read in various posts online that it’s difficult to name a preferred sampler, because which one performs best (in sampling accuracy and speed) depends on problem specifics. Could folks share some of the heuristics they use for choosing between samplers?
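For anyone else landing here: the usual starting heuristic is NUTS for continuous, differentiable posteriors, MH as a cheap but slow-mixing fallback, and a particle method like PG when the model has discrete parameters. A minimal sketch of swapping samplers on a toy model (the coin-flip model and data here are made up for illustration, not from this thread):

```julia
using Turing

# Hypothetical toy model: Bernoulli observations with a Beta prior.
@model function coin(y)
    p ~ Beta(1, 1)
    y .~ Bernoulli(p)
end

y = rand(Bool, 100)
m = coin(y)

# NUTS: a good default for continuous, differentiable posteriors.
chain_nuts = sample(m, NUTS(), 500)

# MH: cheap per step and assumption-free, but mixes slowly in high dimensions.
chain_mh = sample(m, MH(), 500)

# PG (particle Gibbs): useful when the model contains discrete parameters
# that gradient-based samplers like NUTS cannot handle.
chain_pg = sample(m, PG(20), 500)
```

The point is that the model definition stays fixed and only the sampler argument to `sample` changes, which makes it cheap to benchmark a few samplers on your own problem.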


Another question: does Turing support sampling on GPU (e.g. CUDA)? I have played around with it but not been successful (e.g. sample below).

However, posts like this one seem to indicate that it’s possible (and should just work out of the box?).

Attempted CUDA code
@model function model_gaussian2(claims, n)
	μ ~ Normal(0.05, 0.1)
	σ ~ Exponential(0.25)
	claims .~ Binomial.(n, logistic.(μ))
end

where claims and n are CuArrays of integers:

mg = model_gaussian2(CuArray(claims_summary.claims), CuArray(claims_summary.n))
cg = sample(mg, NUTS(), 500)

Results in:

InvalidIRError: compiling kernel #broadcast_kernel#17(CUDA.CuKernelContext, CUDA.CuDeviceVector{Float64, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(StatsAPI.loglikelihood), Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceVector{Distributions.Binomial{Float64}, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CUDA.CuDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) resulted in invalid LLVM IR

Reason: unsupported call through a literal pointer (call to .text)

It depends on what you put on the GPU and how. In that code, Turing would never see a GPU array, so as long as the code is ForwardDiff-compatible it would work fine.
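To make that concrete: the sampler (NUTS with ForwardDiff) runs on the CPU, so the data should be plain arrays by the time the model sees it. A sketch of the model from the question running on CPU data (the synthetic data and the `logistic` import are my assumptions; `claims_summary` from the original post is not shown here):

```julia
using Turing
using StatsFuns: logistic  # assumed available; it is a Turing dependency

@model function model_gaussian2(claims, n)
    μ ~ Normal(0.05, 0.1)
    σ ~ Exponential(0.25)  # unused here; kept from the original post
    claims .~ Binomial.(n, logistic(μ))
end

# Synthetic stand-in data, since the real claims_summary table is not shown:
n      = fill(100, 50)
claims = rand.(Binomial.(n, 0.5))

# If the data currently lives on the GPU, collect it back first, e.g.
#   claims = Array(cu_claims); n = Array(cu_n)
mg = model_gaussian2(claims, n)
cg = sample(mg, NUTS(), 200)
```

This runs without error precisely because Turing never sees a `CuArray`; it does not give GPU-accelerated sampling.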


I’m not sure I follow what you mean - the @model is given CuArrays for each of its arguments.

Do you mean “work fine” as in it would run without error on CPU, or that sampling would be GPU-accelerated?

No, not in the correct version of that code.

No error in the GPU-accelerated form.

Thanks for the clarification. Can you point me in the right direction, with either an online example of GPU-accelerated sampling or the changes I need to make to the code above?

I’m not sure. That’s a very different type of GPU parallelism from the other example, and it would require deeper support from within Turing.