How to create sum-log-likelihood of custom distribution for use in Turing.jl

I have a seemingly simple question about working with Turing.jl and Distributions.jl that I’m hoping someone can help me with. For my work, I need to be able to define custom continuous, univariate distributions along with the associated sum-log-likelihood methods (since that is what optimizers end up using). I’ve done this in the obvious way using various tutorials: define a distribution along with the various functions, including logpdf. Here is where I’m not sure how best to proceed.

I would like to be able to tell Turing what sum-log-likelihood (SLL) function to use, not just logpdf. My reason for this is as follows. I tend to have 100k - 1M IID data points on a continuous 1D domain. As such, it is much more efficient for me to compute the (log)pdf at, let’s say, 1_000 points, construct a 1D interpolant and then evaluate my 100k-1M log-likelihoods using that interpolating function. I want to avoid anything like sum(logpdf.(dist,x)) since my x is large.
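To make that concrete, here is a minimal sketch of the interpolation idea, assuming Interpolations.jl (the name fast_sll and the 1_000-point grid are illustrative, not my actual code):

using Distributions, Interpolations

# Sketch: build the log-density once on a coarse grid, then evaluate the
# sum-log-likelihood of a large sample by interpolating into that grid.
function fast_sll(dist, data::AbstractVector{<:Real})
    lo, hi = extrema(data)
    grid = range(lo, hi; length = 1_000)      # coarse evaluation grid
    logp = logpdf.(dist, grid)                # the expensive part, done once
    itp  = linear_interpolation(grid, logp)   # cheap 1D interpolant
    return sum(itp, data)                     # interpolated log-likelihoods, summed
end

fast_sll(Normal(0, 1), randn(250_000))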

I can define a univariate distribution (dist) and use it to construct a function, SLL(dist, x::Vector{Real}), that does this. However, I don’t know how to tell Turing to use that SLL instead of iterating logpdf over every data point. It seems to me this is probably a matter of attaching it to the appropriate method (logpdf, _logpdf, loglikelihood), but I don’t know what Turing looks at in its internals.

The best I have been able to come up with is to construct a multivariate distribution which is just N copies of my univariate, and then attach my SLL to the logpdf of the MV distribution. But this is not ideal; roughly, it looks like the sketch below.
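Illustrative names only; the placeholder sum would be replaced by the fast SLL:

using Distributions

# Wrapper: N IID copies of a univariate distribution, exposed as a
# multivariate distribution so that logpdf receives the whole data vector.
struct IIDWrapper{D<:ContinuousUnivariateDistribution} <: ContinuousMultivariateDistribution
    dist::D
    n::Int
end

Base.length(d::IIDWrapper) = d.n

# The batch sum-log-likelihood gets attached here.
function Distributions._logpdf(d::IIDWrapper, x::AbstractVector{<:Real})
    return sum(xi -> logpdf(d.dist, xi), x)   # placeholder for the fast SLL
end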

Any thoughts?

I think what you need is to define a multivariate distribution and then a custom logpdf for the whole vector.

This might be relevant:

Thanks for your response @Dan . This was my first thought. However, it means that the “pdf” and “rand” methods for that distribution don’t make much sense. If I have 10_000 IID data points, for example, the multivariate distribution would operate on a 10_000-dimensional space. But it is still inherently a 1D distribution with a 1D pdf for which the rand method should produce scalars.

Can you perform all of the operations in your logpdf function?

function logpdf(dist::YourType, data::Vector{<:Real})
    # 1. compute log likelihood on subset of data 
    # 2. interpolate log likelihood at remaining points 
    # 3. sum and return the exact and interpolated log likelihoods
end

Yes, I can do this. But this would require that my distribution of type ::YourType be a multivariate distribution. This is OK if you’re only concerned with batch evaluation of the logpdf of an entire sample of IID data. However, the underlying distribution is still univariate. So I seem to be stuck with either

  1. Construct a univariate distribution and have scalar-valued pdf, logpdf, and rand. In this case, I don’t know how to batch-evaluate the sum-log-likelihood without using logpdf.(dist,data), which I am specifically trying to avoid.

  2. Construct a multivariate distribution. Now I have a way to use logpdf to batch-evaluate the sum-log-likelihood. However, now my pdf and rand methods don’t make sense since they should be scalar-valued.

I’m really trying to find a way to define a custom univariate distribution with the standard pdf, logpdf, rand, etc., but then be able to define a separate batch-evaluation function for the sum-log-likelihood of an IID data set.
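In other words, something shaped like this sketch (placeholder internals only):

using Distributions
using Random

# A genuinely univariate distribution with scalar pdf/logpdf/rand, plus a
# *separate* batch sum-log-likelihood over an IID sample.
struct MyDist <: ContinuousUnivariateDistribution
    # parameters / cached density representation would live here
end

Distributions.logpdf(d::MyDist, x::Real) = 0.0                # scalar log-density (stub)
Distributions.pdf(d::MyDist, x::Real)    = exp(logpdf(d, x))  # scalar density
Random.rand(rng::Random.AbstractRNG, d::MyDist) = 0.0         # scalar draw (stub)

# Separate batch evaluation: reuse the stored density for the whole sample.
SLL(d::MyDist, x::AbstractVector{<:Real}) = sum(xi -> logpdf(d, xi), x)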

Got it. It might be worth inquiring on the Turing Slack channel. Sometimes the Turing devs are more responsive there, and they are likely to have a good solution if one exists.

The nice thing about Julia is that it gives you several options for writing code that makes sense logically, e.g. logpdf.(dist, data), but is still fast. Here, you could either try to add some clever overloads of sum or broadcasted that are faster than the default sum(logpdf.(dist, data)), or, even simpler, use the sum method that takes a function as its first argument:

using Distributions

dist = TDist(3.7)
data = randn(250_000);
sum(logpdf.(dist, data))  # allocates an intermediate array
sum(Base.Fix1(logpdf, dist), data)  # does not allocate

Unfortunately, I need to avoid any kind of repeated per-point evaluation of my distribution’s logpdf over the data. So no logpdf.(dist,data), no broadcast, no mapreduce.

My specific use case is that I use Monte Carlo simulation plus kernel density estimation to numerically approximate distributions that have no analytic form. As such, I first have to construct a density function on an entire domain and then interpolate it to a point. The former is expensive and the latter is cheap. Evaluating my density at 1 point is thus essentially the same cost as evaluating it at 1_000 points, since the cost comes in the construction step. So it is an absolute necessity that I construct my density (pdfs and logpdfs) once and then batch-evaluate with that stored density.

If I were to build a logpdf function that constructs the density and evaluates it at a scalar point, then logpdf.(dist,data) would perform 1_000 density constructions and 1_000 evaluations, which is highly redundant. I want to construct the density once and evaluate it 1_000 times, roughly as in the sketch below.
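Here KernelDensity.jl stands in for my actual density construction:

using Distributions, KernelDensity

# Construct the density once from Monte Carlo samples, cache the interpolant,
# and batch-evaluate the sum-log-likelihood with it.
mc_samples = randn(10_000)     # stand-in for the Monte Carlo simulation output
dens = kde(mc_samples)         # expensive: kernel density estimate on a grid
ik   = InterpKDE(dens)         # cheap-to-evaluate interpolant, built once

# Batch sum-log-likelihood using the cached interpolant
# (points outside the KDE grid have pdf 0, hence log of 0 = -Inf).
sll(data) = sum(x -> log(pdf(ik, x)), data)

sll(randn(250_000))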

This is easy to write a function for. My trouble is getting it into the Turing environment. Essentially all optimizers use a batch evaluation of a sum-log-likelihood function. I just don’t know what to overload in Turing for this. I can do it with AdvancedHMC since I can just build a LogDensityProblem. But then I lose the Turing-verse (as far as I can tell).
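For reference, the AdvancedHMC route looks roughly like this sketch, where log_prior, build_dist, and fast_SLL are placeholders for my own prior, density construction, and batch sum-log-likelihood:

using LogDensityProblems

struct MyPosterior{T<:Real}
    data::Vector{T}
end

LogDensityProblems.dimension(::MyPosterior) = 5   # number of parameters (placeholder)

function LogDensityProblems.logdensity(p::MyPosterior, θ)
    return log_prior(θ) + fast_SLL(build_dist(θ), p.data)   # one batch SLL call per evaluation
end

LogDensityProblems.capabilities(::Type{<:MyPosterior}) =
    LogDensityProblems.LogDensityOrder{0}()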

Must be missing something, but why would you need to reconstruct the density estimator at each logpdf evaluation? (How would that even work from a single data point?) Could you not cache and reuse your estimator?

pd_est = construct_density(some_data)
logpdf(pd_est, datum_1)
logpdf(pd_est, datum_2)
...

Following up (while still not fully understanding your need here), Turing does provide a way to add custom likelihood contributions via DynamicPPL.@addlogprob!.
Here is an example using your code from the other thread:

using Turing
using DynamicPPL
using Distributions
using LinearAlgebra

... # insert your code my_mixture and fast_SLL here

@model function custom(data)
    θ ~ MvNormal(zeros(5), I)
    μ₁, μ₂, logσ₁, logσ₂, logitw = θ
    md = my_mixture(μ₁, μ₂, exp(logσ₁), exp(logσ₂), 1.0/(1.0 + exp(-logitw)))
    DynamicPPL.@addlogprob!(fast_SLL(md, data))
end

data = vcat(rand(Normal(0, 1), 100), rand(Normal(2, 0.1), 200));
fit = sample(custom(data), NUTS(200, 0.65), 400)

Guess this might be what you’re after?

Just another remark on fast_SLL: It’s numerically much more stable to work with probabilities on a log-scale – also for mixture distributions. Have a look at LogExpFunctions.logsumexp.
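For example, a generic two-component mixture evaluated entirely in log space (not your my_mixture, just an illustration):

using Distributions, LogExpFunctions

# log(w * pdf(d1, x) + (1 - w) * pdf(d2, x)) computed without leaving log space.
function logmixpdf(x, w, d1, d2)
    return logsumexp([log(w) + logpdf(d1, x), log1p(-w) + logpdf(d2, x)])
end

logmixpdf(0.3, 0.4, Normal(0, 1), Normal(2, 0.1))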


Since this is not your first MCMC rodeo, @Bill_Holmes: @addlogprob! is the same as target += in Stan.