Custom likelihoods in Turing.jl


#1

Hi all-

I have two related questions. First, how do I implement a custom likelihood in turing.jl? Unfortunately, I was unable to implement a simple example based on the documentation, which is probably impeding my ability to identify problems in the second question: How to I implement a custom likelihood that has two potential complications? In my particular use case, my model has vectors of related parameters. This is convenient because the vector may have different sizes in different applications and it allows vectorization of certain operations. In some versions of the model, I would like to fix some elements in the vector, while allowing others to update with the data (see model definition below). The other complicating factor in my use case is that the likelihood requires computationally intensive operations, which cause it to scale linearly with the number of data. Thus, its dramatically faster to perform those operations once and then loop over the data to evaluate the loglikelihood. Otherwise, the model will run 2-3 orders of magnitude slower in my cases.

Below, I posted a toy example with essential features that are used in a more realistic application. Any help would be greatly appreciated.

using Turing,Distributions
import Distributions: pdf, logpdf

#In most applications, i need the parameters to be vectors
mutable struct mydistribution{T} <: ContinuousUnivariateDistribution
    ν::T
    α::T
end

function pdf(d::mydistribution,T::AbstractArray)
    #do some expensive computations
    likelihoods = fill(0.0,length(T))
    for (i,t) in enumerate(T)
        likelihoods[i] = sum(@. .5*pdf(Normal.(d.ν,d.α),t))
    end
    #return array of likelihoods
    return likelihoods
end

#Returns sum of loglikelihood for all data
logpdf(d::mydistribution,T::AbstractArray) = sum(log.(pdf(d,T)))

@model myModel(data) = begin
    α = fill(.85,3) #fixed parameter
    ν1 ~ Normal(15,5)
    ν2 ~ Normal(15,5)
    N = length(data)
    #fix the third element to zero
    data ~ mydistribution([ν1,ν2,0.0],α)
end

# Settings of the Hamiltonian Monte Carlo (HMC) sampler.
iterations = 1000
ϵ = 0.05
τ = 10
data = rand(Normal(0,1),50)
chain = sample(myModel(data), HMC(iterations, ϵ, τ))


I receive the following error:


MethodError: no method matching mydistribution(::Array{Flux.Tracker.TrackedReal{Float64},1}, ::Array{Float64,1})
Closest candidates are:
  mydistribution(::T, !Matched::T) where T at /home/dfish/Projects/ACT-R-Likelihood/FFT Models/LogNormalACTR/temp.jl:6
macro expansion at compiler.jl:56 [inlined]
#myModel_model#18(::Array{Float64,1}, ::Function, ::Turing.VarReplay.VarInfo, ::Turing.Sampler{HMCDA{Any}}) at utils.jl:297
myModel_model(::Turing.VarReplay.VarInfo, ::Turing.Sampler{HMCDA{Any}}) at utils.jl:297
#invokelatest#1 at essentials.jl:697 [inlined]
invokelatest at essentials.jl:696 [inlined]
runmodel!(::Function, ::Turing.VarReplay.VarInfo, ::Turing.Sampler{HMCDA{Any}}) at hmc_core.jl:65
(::getfield(Turing, Symbol("#f#115")){Turing.VarReplay.VarInfo,typeof(myModel_model),Turing.Sampler{HMCDA{Any}}})(::TrackedArray{…,Array{Float64,1}}) at ad.jl:92
#8 at back.jl:139 [inlined]
forward at back.jl:126 [inlined]
forward(::Function, ::Array{Real,1}) at back.jl:139
gradient_reverse(::Array{Real,1}, ::Turing.VarReplay.VarInfo, ::Function, ::Turing.Sampler{HMCDA{Any}}) at ad.jl:96
gradient at ad.jl:23 [inlined]
gradient at ad.jl:19 [inlined]
#67 at hmc_core.jl:10 [inlined]
#_leapfrog#75(::getfield(Turing, Symbol("##71#72")){Turing.VarReplay.VarInfo,Turing.Sampler{HMCDA{Any}}}, ::getfield(Turing, Symbol("##73#74")){Turing.Sampler{HMCDA{Any}}}, ::Function, ::Array{Real,1}, ::Array{Float64,1}, ::Int64, ::Float64, ::getfield(Turing, Symbol("##67#68")){Turing.VarReplay.VarInfo,Turing.Sampler{HMCDA{Any}},typeof(myModel_model)}) at hmc_core.jl:93
(::getfield(Turing, Symbol("#kw##_leapfrog")))(::NamedTuple{(:rev_func, :log_func),Tuple{getfield(Turing, Symbol("##71#72")){Turing.VarReplay.VarInfo,Turing.Sampler{HMCDA{Any}}},getfield(Turing, Symbol("##73#74")){Turing.Sampler{HMCDA{Any}}}}}, ::typeof(Turing._leapfrog), ::Array{Real,1}, ::Array{Float64,1}, ::Int64, ::Float64, ::Function) at none:0
macro expansion at logging.jl:312 [inlined]
#_hmc_step#83(::Function, ::Function, ::Function, ::Array{Real,1}, ::Float64, ::getfield(Turing, Symbol("##69#70")){Turing.VarReplay.VarInfo,Turing.Sampler{HMCDA{Any}},typeof(myModel_model)}, ::Function, ::Float64, ::Float64, ::Array{Float64,1}) at hmcda.jl:150
(::getfield(Turing, Symbol("#kw##_hmc_step")))(::NamedTuple{(:rev_func, :log_func),Tuple{getfield(Turing, Symbol("##71#72")){Turing.VarReplay.VarInfo,Turing.Sampler{HMCDA{Any}}},getfield(Turing, Symbol("##73#74")){Turing.Sampler{HMCDA{Any}}}}}, ::typeof(Turing._hmc_step), ::Array{Real,1}, ::Float64, ::Function, ::Function, ::Float64, ::Float64, ::Array{Float64,1}) at none:0
macro expansion at logging.jl:319 [inlined]
step(::Function, ::Turing.Sampler{HMCDA{Any}}, ::Turing.VarReplay.VarInfo, ::Bool) at hmcda.jl:82
macro expansion at logging.jl:309 [inlined]
#sample#94(::Int64, ::Bool, ::Nothing, ::Int64, ::Nothing, ::Function, ::Function, ::HMC{Any}) at hmc.jl:148
sample(::Function, ::HMC{Any}) at hmc.jl:104
top-level scope at none:0
include_string(::Module, ::String, ::String) at loading.jl:1005
include_string(::Module, ::String, ::String, ::Int64) at eval.jl:30
(::getfield(Atom, Symbol("##117#123")){String,Int64,String})() at eval.jl:94
withpath(::getfield(Atom, Symbol("##117#123")){String,Int64,String}, ::String) at utils.jl:30
withpath at eval.jl:46 [inlined]
#116 at eval.jl:93 [inlined]
with_logstate(::getfield(Atom, Symbol("##116#122")){String,Int64,String}, ::Base.CoreLogging.LogState) at logging.jl:397
with_logger at logging.jl:493 [inlined]
#115 at eval.jl:92 [inlined]
hideprompt(::getfield(Atom, Symbol("##115#121")){String,Int64,String}) at repl.jl:85
macro expansion at eval.jl:91 [inlined]
(::getfield(Atom, Symbol("##114#120")){Dict{String,Any}})() at task.jl:85

Wrapping the data in an array, but that caused other problems.


#2

This does not directly answer your question, but here is a version using DynamicHMC.jl & friends:

using DynamicHMC, Distributions, LogDensityProblems, TransformVariables

struct MyModel{Tα, Tp, Td}
    α::Tα
    νprior::Tp
    data::Td
end

function (model::MyModel)(νs)   # log posterior
    loglikelihood = sum(logpdf(Normal(ν, α), x) for (α, ν) in zip(model.α, νs), x in model.data)
    logprior = sum(logpdf(model.νprior, ν) for ν in νs)
    loglikelihood + logprior
end

p = MyModel(fill(0.85, 3), Normal(15, 5), rand(Normal(0,1), 50)) # prior and data
p([1.0, 2.0, 3.0])                                               # test for random νs
P = TransformedLogDensity(as(Array, 3), p)                       # identity transformation
∇P = ForwardDiffLogDensity(P)                                    # AD using ForwardDiff

chain, nuts = NUTS_init_tune_mcmc(∇P, 1000)

νs_posterior = get_position.(chain)

if you want to code your likelihood directly.


#3

And you get very good mixing:

julia> using MCMCDiagnostics

julia> mapslices(effective_sample_size, get_position_matrix(chain); dims = 1)
1×3 Array{Float64,2}:
 909.708  933.698  858.296

#4

Thank you for taking the time to put together that code. It’s good to know that alternative approaches are available. I was wondering if you would be able to answer a few questions? Is pmap the preferred method for running multiple chains in parallel? For example,

rngs = [Random.seed!(i) for i in 1:Nchains]
output = pmap(x->NUTS_init_tune_mcmc(x,∇P, 1000),rngs)

When I pass an rng, it prints out a lot of data. So I’m guessing that I am doing it incorrectly.

Secondly, is it possible to output the chains into an MCMCChain.jl object for the purpose of plotting and diagnostics? This would be handy. Thanks!


#5

NUTS_init_tune_mcmc returns the chain and the tuned sampler (so you could continue sampling). Very few people use that so I may remove it as the default, or introduce an alternative API. Currently you can just keep the first value, eg (continuing the above example)

using Random, Distributed
Nchains = 4
rngs = [Random.seed!(i) for i in 1:Nchains];
output = pmap(x -> first(NUTS_init_tune_mcmc(x, ∇P, 1000)), rngs)

works fine (diagnostic output is garbled though, as it is not ready for parallel chains yet).

Conversion to MCMCChain.jl is not supported by my packages (and I don’t know about others). I kind of prefer the bare-bones, modular approach. In practice, IMO the two most useful diagnostics are \hat{R} and ESS, both of which are supported by MCMCDiagnostics.

If you get errors, please make a self-contained MWE and I will look at it.


#6

Thanks for your help. The printing was a false alarm. I added a print statement to see what was happening in the loglikelihood function. I forgot to resubmit the code after I removed the print statement. So it looked like it was printing for another reason. Sorry for the noise.


#7

For anyone who is following the code for parallel chains with DynamicHMC, Random.seed!() is not the correct method. It gives the same result for each chain. rngs = [MersenneTwister() for i in 1:Nchains] seems to be working properly.


#8

Good point. Opened a docs issue to address this.


#9

I adapted your example to my code and encountered a type problem that I am not sure how to resolve. I did not realize that the sampler uses a forward diff object. The problem I am having is that (1) I reparameterize my model so that resulting parameters are a function of the parameters I input, and (2), I pass these parameters to other likelihood functions because my model is a mixture of convolutions. Aside from the type issue, I’m not sure if this would evaluate the forward difference at each level of the model, rather than just at the creation of mydistribution in the toy example below.

In the example below, the commented function i initially created crashes. I thought maybe if I wrap the log posterior calculations in a different function and extract the values, it might evaluate properly (presumably at two points), but evidently that does not work. Do you have any ideas?

using Distributions
import Distributions: pdf, logpdf
using DynamicHMC, Distributions, LogDensityProblems, TransformVariables,MCMCChain,Random

mutable struct mydistribution{T} <: ContinuousUnivariateDistribution
    ν::T
    α::T
    μ::T    
    function mydistribution(ν,α)
        n = new{typeof(ν)}()
        n.ν = ν
        n.α = α
        n.μ = α./ν #some kind of transformation
        return n
    end
end

function pdf(d::mydistribution,T::AbstractArray)
    #various intensive computations
    likelihoods = fill(0.0,length(T))
    for (i,t) in enumerate(T)
        likelihoods[i] = sum(@. .5*pdf(Normal.(d.μ,d.α),t))
    end
    return likelihoods
end

logpdf(d::mydistribution,T::AbstractArray) = sum(log.(pdf(d,T)))

struct MyModel{Tα, Tp, Td}
    α::Tα
    νprior::Tp
    data::Td
end

# function (model::MyModel)(νs)   # log posterior
#     #This crashes
#     #dist = mydistribution([νs;0.0],model.α)
#     loglikelihood = logpdf(dist,model.data)
#     logprior = sum(logpdf(model.νprior, ν) for ν in νs)
#     loglikelihood + logprior
# end

function (model::MyModel)(νs)   # log posterior
    posterior(model,νs)
end

function posterior(model,νs)
    #This runs but does not update
    #println(fieldnames(typeof(νs[1])))
    dist = mydistribution([νs[1].value;νs[2].value;0.0],model.α)
    loglikelihood = logpdf(dist,model.data)
    logprior = sum(logpdf(model.νprior, ν.value) for ν in νs)
    return loglikelihood + logprior
end
p = MyModel(fill(0.85, 3), Normal(15, 5), rand(Normal(0,.85), 50)) # prior and data

P = TransformedLogDensity(as(Array, 2), p)                       # identity transformation
∇P = ForwardDiffLogDensity(P)                                    # AD using ForwardDiff

chain, nuts = NUTS_init_tune_mcmc(∇P, 1000)
νs_posterior = get_position.(chain)
mean(hcat(νs_posterior...),dims=2)```

#10

I don’t think it does; you are just getting an error message. I think it is informative.

AD should not be a concern, as long as you get your model (log) density working. You can test that by calling p with parameters.


#11

Sorry if I am being dense here. Maybe you are referring to the extra # in the commented function? In the process of debugging, I accidently left an an extra # on one line of commented the function, but that is not the source of the problem. Sorry about that if that is the point of confusion. When I run the first function, I get:

function (model::MyModel)(νs)   # log posterior
    #This crashes
    dist = mydistribution([νs;0.0],model.α)
    loglikelihood = logpdf(dist,model.data)
    logprior = sum(logpdf(model.νprior, ν) for ν in νs)
    loglikelihood + logprior
end

p([1,1])
-104.30759593477619

So the test passes as that point. However, when I try to sample, I get the following error:

chain, nuts = NUTS_init_tune_mcmc(∇P, 1000)

MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##1#2")){TransformedLogDensity{TransformVariables.ArrayTransform{TransformVariables.Identity,1},MyModel{Array{Float64,1},Normal{Float64},Array{Float64,1}}}},Float64},Float64,2})
Closest candidates are:
  Float64(::Real, !Matched::RoundingMode) where T<:AbstractFloat at rounding.jl:185
  Float64(::T<:Number) where T<:Number at boot.jl:725
  Float64(!Matched::Int8) at float.jl:60
  ...
convert(::Type{Float64}, ::ForwardDiff.Dual{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##1#2")){TransformedLogDensity{TransformVariables.ArrayTransform{TransformVariables.Identity,1},MyModel{Array{Float64,1},Normal{Float64},Array{Float64,1}}}},Float64},Float64,2}) at number.jl:7
setindex!(::Array{Float64,1}, ::ForwardDiff.Dual{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##1#2")){TransformedLogDensity{TransformVariables.ArrayTransform{TransformVariables.Identity,1},MyModel{Array{Float64,1},Normal{Float64},Array{Float64,1}}}},Float64},Float64,2}, ::Int64) at array.jl:769
pdf(::mydistribution{Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##1#2")){TransformedLogDensity{TransformVariables.ArrayTransform{TransformVariables.Identity,1},MyModel{Array{Float64,1},Normal{Float64},Array{Float64,1}}}},Float64},Float64,2},1}}, ::Array{Float64,1}) at untitled-19410a83376b53dfea67e184bad6d413:22
logpdf(::mydistribution{Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##1#2")){TransformedLogDensity{TransformVariables.ArrayTransform{TransformVariables.Identity,1},MyModel{Array{Float64,1},Normal{Float64},Array{Float64,1}}}},Float64},Float64,2},1}}, ::Array{Float64,1}) at untitled-19410a83376b53dfea67e184bad6d413:27
(::MyModel{Array{Float64,1},Normal{Float64},Array{Float64,1}})(::Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##1#2")){TransformedLogDensity{TransformVariables.ArrayTransform{TransformVariables.Identity,1},MyModel{Array{Float64,1},Normal{Float64},Array{Float64,1}}}},Float64},Float64,2},1}) at untitled-19410a83376b53dfea67e184bad6d413:38
transform_logdensity(::TransformVariables.ArrayTransform{TransformVariables.Identity,1}, ::MyModel{Array{Float64,1},Normal{Float64},Array{Float64,1}}, ::Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##1#2")){TransformedLogDensity{TransformVariables.ArrayTransform{TransformVariables.Identity,1},MyModel{Array{Float64,1},Normal{Float64},Array{Float64,1}}}},Float64},Float64,2},1}) at TransformVariables.jl:144
macro expansion at Parameters.jl:732 [inlined]
logdensity at LogDensityProblems.jl:138 [inlined]
#1 at AD_ForwardDiff.jl:18 [inlined]
vector_mode_dual_eval(::getfield(LogDensityProblems, Symbol("##1#2")){TransformedLogDensity{TransformVariables.ArrayTransform{TransformVariables.Identity,1},MyModel{Array{Float64,1},Normal{Float64},Array{Float64,1}}}}, ::Array{Float64,1}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##1#2")){TransformedLogDensity{TransformVariables.ArrayTransform{TransformVariables.Identity,1},MyModel{Array{Float64,1},Normal{Float64},Array{Float64,1}}}},Float64},Float64,2,Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##1#2")){TransformedLogDensity{TransformVariables.ArrayTransform{TransformVariables.Identity,1},MyModel{Array{Float64,1},Normal{Float64},Array{Float64,1}}}},Float64},Float64,2},1}}) at apiutils.jl:35
vector_mode_gradient! at gradient.jl:102 [inlined]
gradient! at gradient.jl:35 [inlined]
gradient! at gradient.jl:33 [inlined]
macro expansion at Parameters.jl:733 [inlined]
logdensity(::Type{LogDensityProblems.ValueGradient}, ::ForwardDiffLogDensity{TransformedLogDensity{TransformVariables.ArrayTransform{TransformVariables.Identity,1},MyModel{Array{Float64,1},Normal{Float64},Array{Float64,1}}},ForwardDiff.GradientConfig{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##1#2")){TransformedLogDensity{TransformVariables.ArrayTransform{TransformVariables.Identity,1},MyModel{Array{Float64,1},Normal{Float64},Array{Float64,1}}}},Float64},Float64,2,Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##1#2")){TransformedLogDensity{TransformVariables.ArrayTransform{TransformVariables.Identity,1},MyModel{Array{Float64,1},Normal{Float64},Array{Float64,1}}}},Float64},Float64,2},1}}}, ::Array{Float64,1}) at AD_ForwardDiff.jl:47
phasepoint_in(::DynamicHMC.Hamiltonian{ForwardDiffLogDensity{TransformedLogDensity{TransformVariables.ArrayTransform{TransformVariables.Identity,1},MyModel{Array{Float64,1},Normal{Float64},Array{Float64,1}}},ForwardDiff.GradientConfig{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##1#2")){TransformedLogDensity{TransformVariables.ArrayTransform{TransformVariables.Identity,1},MyModel{Array{Float64,1},Normal{Float64},Array{Float64,1}}}},Float64},Float64,2,Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(LogDensityProblems, Sym...

When I changeν::T to ν::AbstractArray or ν::Any in mydistribution, I still get the problem.


#12

This happens because you are allocating

which is of Float64s, and trying to save values in there, which the dual types of AD don’t convert to (because that would lose the derivative information). Instead, you should use generators, broadcasting, or map, like the first example I wrote.

Also, incidentally, don’t use pdf, always use logpdf, to avoid under/overflow.


#13

Thanks for your patience. I usually do not delve into the internal workings of packages. I did not realize that logpdf was accepting and returning a Dual object.

This is good advice and I usually try to follow it. Unfortunately, there is no analytic solution to the loglikelihood function. I have to approximate the pdf numerically with fourier transforms and other operations, then take the log of the pdf. So in certain cases there might be numeric problems.

One problem in my particular use case is that broadcasting data over logpdf is prohibitively slow because of the computations I need to perform before finally obtaining the loglikelihood. On each call to pdf, I use fourier transforms along with other calculations to obtain densities for several values given a fixed set of parameters. From that point, I can use interpolation to get a very accurate approximation of a the density at any point in the domain. So if I do this on each call to logpdf via map or some other broadcasting method, performance suffers greatly.

Is the solution to redefine logpdf method so that it adds the log densities to the Dual object?


#14

No worries, I am happy to help, I also do Bayesian inference, and this part of Julia is tricky.

Julia is powerful because it allows very generic programming, eg in the above case (ostensibly) seamlessly replacing Float64 <: Real with another <: Real type (eg ForwardDiff.Dual). Ideally, you would not need to care.

This requires a different approach if you are used to filling preallocated containers for output. Your best solution is to rearrange your algorithm so that it implicitly figures out the result types. I would recommend thinking hard about this; it is fine if you need an intermediate result for calculations, just generate that and use it.

In case that fails, there are various solutions:

  1. Use the type of the first element, and then widen if necessary. This is very robust, but tricky to program and is frankly not the best introduction to the language. Base.collect and friends do this.

  2. Use promotion to figure out. See the use of promote_ functions in the SparseArrays and LinearAlgebra standard libraries.

  3. Use something quick and dirty. Eg

function use_preallocated(A::AbstractArray, B::AbstractArray)
    T = typeof(one(eltype(A))/one(eltype(B)))
    preallocated = Vector{T}(undef, len)
    ...
end

#15

Thanks for the advice. I’ll give this some thought over the weekend and hopefully get closer to a solution next week. By chance, do you know of any examples or package code that might also be illuminating? I looked through the repositories for DynamicHMC and LogDensityProblems, but didn’t spot anything.


#16

I started

but it is very basic at the moment. If you open an issue and describe the problem in detail, I am very happy to help more.