Custom likelihood with Turing.jl

Hi all,

I’d like to use Turing.jl to sample from a custom likelihood function. I’ve seen this post here and the DynamicHMCExamples.jl package, which give very nice tutorials for doing this using DynamicHMC, but I would also like to try it within the Turing.jl framework with the AdvancedHMC.jl backend. (Speaking of which, what is the difference between AdvancedHMC.jl and DynamicHMC.jl?)

My specific model has the following stipulations: I want to sample from a custom multidimensional likelihood function, with no prior on the parameters, but the parameters have finite support. The advanced usage page of the Turing documentation provides some decent details, but a few things are missing, specifically how to define the support of the likelihood and how to map samples onto that support. Section 3.1 says to look at “transforms.jl” for some examples, but the link is broken. I’m wondering whether Turing.jl has something like the functionality that TransformVariables.jl provides for DynamicHMC.jl.

A general sketch of setting up a model in Turing.jl to do the above would be a huge help. Thanks!

Parts of the documentation are unfortunately not up to date. I’m working on improving it over the next few weeks.

You can use a custom multivariate distribution, as you pointed out. Turing uses the Bijectors.jl package for transformations of constrained distributions, so you would need to use an existing transformation or write a new one if necessary. @torfjelde, could you help?
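Roughly, the pieces involved look like this. This is only a minimal, hedged sketch: `MyCustomDist`, its bounds, and its log density are placeholders, and the exact set of methods Turing/Bijectors expect can differ slightly between versions.

using Distributions, Bijectors, Random

# Placeholder distribution supported on the interval (lb, ub).
struct MyCustomDist <: ContinuousUnivariateDistribution
    lb::Float64
    ub::Float64
end

# Support of the distribution.
Distributions.minimum(d::MyCustomDist) = d.lb
Distributions.maximum(d::MyCustomDist) = d.ub

# Log density; replace with your own.
Distributions.logpdf(d::MyCustomDist, x::Real) = -abs2(x)

# Used for drawing initial values.
Base.rand(rng::Random.AbstractRNG, d::MyCustomDist) = d.lb + (d.ub - d.lb) * rand(rng)

# Map the support (lb, ub) to ℝ so HMC/NUTS can run in unconstrained space.
Bijectors.bijector(d::MyCustomDist) = Bijectors.Logit(d.lb, d.ub)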

What you need to do is specify the lower and upper bounds of your parameters through your priors; in your case I think you can simply use Uniform(lb, ub). Here is a fairly straightforward example of specifying a custom likelihood distribution in Turing, and here is example code for running the linked model:

using Distributions, Turing, Random, Parameters
ProjDir = @__DIR__
cd(ProjDir)
include("LNR.jl")

@model model(data, Nr) = begin
    # Nr means, each with a Normal(0, 3) prior
    μ ~ filldist(Normal(0, 3), Nr)
    # tan of a Uniform(0, pi/2) draw gives a half-Cauchy(0, 1) scale
    s ~ Uniform(0, pi/2)
    σ = tan(s)
    # equivalently: σ ~ truncated(Cauchy(0, 1), 0, Inf)
    for i in 1:length(data)
        data[i] ~ LNR(μ, σ, 0.0)
    end
end

Random.seed!(343)
Nreps = 50
error_count = fill(0.0, Nreps)
Nr = 3
Nobs = 10

# generate simulated data from the true LNR model
μ = rand(Normal(0, 3), Nr)
σ = rand(Uniform(0.2, 1))
dist = LNR(μ=μ, σ=σ, ϕ=0.0)
data = rand(dist, Nobs)

# NUTS with 1000 adaptation steps and a 0.8 target acceptance rate
chain = sample(model(data, Nr), NUTS(1000, 0.8), 2000)
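If it helps, the fit can then be inspected with the usual MCMCChains tools, e.g.:

using MCMCChains, StatsPlots

describe(chain)   # posterior means, std devs, ESS
plot(chain)       # trace and density plots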

I know there are differences in the NUTS configuration: AdvancedHMC uses Stan’s default configuration, whereas DynamicHMC uses something else. From personal experience, I found DynamicHMC to outperform AdvancedHMC in terms of effective sample size in a limited number of cases, but AdvancedHMC seems to have higher ESS in general. Tamas might be able to tell you more about the specifics.

AFAICT DynamicHMC and AdvancedHMC use basically the same almost-current NUTS algorithm once the initial point and stepsize have been established. This algorithm is mostly what one would find in Stan, except for the recent (and somewhat controversial, at least in a procedural sense) updates about acceptance rate tree weighting.

I am surprised to see a systematic difference. For “nice” posteriors (without crazy variations in curvature) almost all well-tuned HMC implementations should behave about the same. For the rest, it is essentially random whether the tuning adapts to some region with slightly different properties. Even in Stan, occasionally 1 out of 3–4 chains can be an outlier with bad adaptation.

I would be cautious about ESS alone; it is easy to come up with posteriors which have fine ESS but never mix well (e.g. a mixture of two peaks in 5+ dimensions with a very low “valley” between them, which NUTS will almost never cross in practice). I think that with today’s multi-core machines, a more robust approach is to do a preliminary run with 3–5 chains, estimate some overall variance in ℝ^n, then add fat tails to that and come up with 10–100 random starting points and rerun chains from each, with independent adaptation. I don’t see this done in practice, even though it is sometimes recommended and should be easy to automate.
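In case it is useful, here is a rough sketch of what that could look like in Turing. Everything here is an assumption: `mdl` is a placeholder model, the chain counts and covariance inflation are arbitrary, the `initial_params` keyword may be `init_params` on older Turing versions, and it expects values in the constrained space, so for bounded parameters the overdispersed draws would still need to be mapped back into the support.

using Turing, Distributions, Statistics

# preliminary run: a few chains just to get a rough location/scale estimate
prelim = sample(mdl, NUTS(), MCMCThreads(), 1_000, 4)
draws = Array(prelim)                      # samples × parameters

m = vec(mean(draws; dims = 1))
C = cov(draws)

# fat-tailed, overdispersed starting points around the preliminary estimate
starts = [rand(MvTDist(3, m, 4 .* C)) for _ in 1:20]

# rerun from each start, with each chain adapting independently
chains = [sample(mdl, NUTS(), 1_000; initial_params = s) for s in starts]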


I’m not sure I understand @depasquale properly.

I want to sample from a custom multidimensional likelihood function, with no prior on the parameters

When you say you want to “sample from a custom multidimensional likelihood”, do you mean you want to sample the “observations” given some parameters? Or do you already have observations and want to find the parameters, i.e. something like

@model custom_ll(x) = begin
    θ ~ Flat()                   # improper flat prior on all of ℝ
    x ~ CustomLikelihood(θ)      # placeholder for your custom likelihood
end

?

Assuming you want the above Turing.Model definition but with Flat instead constrained to lie in a specific interval, you can just use a Uniform distribution as suggested by @Christopher_Fisher :+1: And if you need multiple parameters, you can use a Distributions.Product of such Uniform distributions.
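For concreteness, a minimal sketch of that, with `CustomLikelihood`, `lb`, and `ub` still placeholders:

using Distributions, Turing

@model custom_ll(x, lb, ub, K) = begin
    # K independent flat priors, each constrained to (lb, ub)
    θ ~ Product(fill(Uniform(lb, ub), K))
    x ~ CustomLikelihood(θ)
end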

For the heck of it, you can also use Bijectors.jl (though not recommended in this case):

using Bijectors

# binv: ℝ → (a, b)
binv = inv(Logit(a, b))

# Sample from `Flat` (support ℝ), transform to lie in the interval (a, b)
constrained_flat = transformed(Flat(), binv)  # <= assumes `Flat` is defined

a ≤ rand(constrained_flat) ≤ b # => true
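The resulting `constrained_flat` could then stand in for `Flat()` in the model sketch above, again with `CustomLikelihood` as a placeholder:

@model custom_ll(x) = begin
    θ ~ constrained_flat        # flat prior restricted to (a, b)
    x ~ CustomLikelihood(θ)
end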

Here is a feature overview of different NUTS implementations and AdvancedHMC. There are more details in Kai’s AABI paper as well as various empirical evaluations.
