MCMC Sampling Method Overview/Comparison?

Another question: does Turing support sampling on a GPU (e.g. with CUDA)? I have experimented with it but without success (see the example below).

However, posts like this one seem to indicate that it’s possible (and should just work out of the box?).

Attempted CUDA code
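using Turing, CUDA
using StatsFuns: logistic

@model function model_gaussian2(claims, n)
    μ ~ Normal(0.05, 0.1)    # prior on the logit-scale claim rate
    σ ~ Exponential(0.25)    # sampled, but not used in the likelihood below

    # one Binomial observation per element of claims
    claims .~ Binomial.(n, logistic.(μ))
end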

where claims and n are CuArrays of integers:

mg = model_gaussian2(CuArray(claims_summary.claims), CuArray(claims_summary.n))
cg = sample(mg, NUTS(), 500)

This results in:

InvalidIRError: compiling kernel #broadcast_kernel#17(CUDA.CuKernelContext, CUDA.CuDeviceVector{Float64, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(StatsAPI.loglikelihood), Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceVector{Distributions.Binomial{Float64}, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{CUDA.CuDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) resulted in invalid LLVM IR

Reason: unsupported call through a literal pointer (call to .text)