# Using MixtureModels (from Distributions.jl) in Turing? Issues with posterior prediction

I am trying to fit a kernel mixture model in Turing. It seems to work for estimation, but gives an error when trying to use `predict`. I’d be curious if anyone has a suggestion for a fix.

```julia
using Distributions, Turing

# Unidimensional kernel mixture model with K pre-specified components
# whose means cover the space from min_x to max_x
@model function KMM(x, min_x, max_x, k, σ)
    N = size(x, 1)
    linspan = range(min_x, stop=max_x, length=k)
    kernels = map(u -> Normal(u, σ), linspan)

    ω ~ Dirichlet(k, 1.0)
    mixdist = MixtureModel(kernels, ω)

    x ~ filldist(mixdist, N)
end

# Simulate data from a bimodal distribution
data = vcat(rand(Normal(-1, 0.5), 50), rand(Normal(1, 0.5), 50))

# Define a kernel mixture with 10 Gaussian components, means covering -2:2
model = KMM(data, -2.0, 2.0, 10, 0.5)

# Estimate the mixture weights
m1 = sample(model, NUTS(0.65), 1000)
```

That seems to work (although it is quite slow for such a small model). But when I try to get the posterior predictive distribution of the original data, it fails with a `MethodError` saying that no `loglikelihood` method exists for the `filldist` of mixtures.

```julia
pp_data = predict(KMM(Vector{Union{Missing, Float64}}(missing, length(data)), -2.0, 2.0, 10, 0.5), m1)
```

which fails with:

```text
MethodError: no method matching loglikelihood(::Product{Continuous, MixtureModel{Univariate, Continuous, Normal{Float64}, Categorical{Float64, Vector{Float64}}}, FillArrays.Fill{MixtureModel{Univariate, Continuous, Normal{Float64}, Categorical{Float64, Vector{Float64}}}, 1, Tuple{Base.OneTo{Int64}}}}, ::Vector{Union{Missing, Float64}})
```

`MixtureModel` from `Distributions.jl` apparently doesn’t have a `loglikelihood` function, only a `logpdf` function, which does the job of both. I thought if I added a `loglikelihood` function for the mixture, it might fix it:

```julia
loglikelihood(d::Union{UnivariateMixture, MultivariateMixture}, x) = logpdf(d, x)
```

But it doesn’t change the error.

Any ideas? Alternatives? I seem to remember that the MixtureModel distribution is a bit of an odd duck that often doesn’t play well with packages like Turing, but if I can avoid writing a custom logpdf function for mixtures, it’d be nice…

Also, the fact that the lack of a `loglikelihood` function for a `MixtureModel` is causing an issue for posterior prediction makes me suspicious of whether the original model estimates are actually correct. (Although I haven’t seen anything that looks obviously wrong.)

`loglikelihood` is already defined for `MixtureModel`: it falls back to the default `loglikelihood` implementations for `UnivariateDistribution` and `MultivariateDistribution`, which sum `logpdf` values. The main problem here is that the second argument is of type `Vector{Union{Missing, Float64}}`: Distributions (and also Turing) only supports evaluating `loglikelihood` with a `Real` (for univariate distributions), an `AbstractArray{<:Real}`, or e.g. an `AbstractArray{<:AbstractVector{<:Real}}` (for multivariate distributions).
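To illustrate that point with a minimal sketch (using a plain two-component mixture analogous to the kernels in the question): `loglikelihood` happily accepts a `Vector{Float64}`, but there is no method for arrays with a `Missing` eltype.

```julia
using Distributions

# Two-component Gaussian mixture, analogous to the kernels in the question
mix = MixtureModel([Normal(-1.0, 0.5), Normal(1.0, 0.5)], [0.5, 0.5])

xs = [-1.2, 0.3, 1.1]
loglikelihood(mix, xs)  # works: sums logpdf over the elements

# A Vector{Union{Missing, Float64}} is not an AbstractArray{<:Real},
# so no loglikelihood method matches it:
xs_missing = Vector{Union{Missing, Float64}}(missing, 3)
# loglikelihood(mix, xs_missing)  # MethodError, as in the predict call above
```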

More concretely, the problem is that you pass a `Vector{Union{Missing, Float64}}` for `x` in the `predict` call. Generally, Turing samples variables on the LHS of a `~` statement that are either not an argument of the model or are `missing`. In the expression `x ~ ...` this means `x` would be sampled if `x === missing` (since it is an argument of the model). I.e., you should pass `missing` for `x` in the `predict` call, not a vector of `missing`s. Of course, this means that you have to define `N` independently of `x`. You could e.g. define your model as

```julia
@model function KMM(x, min_x, max_x, k, σ, N=size(x, 1))
    linspan = range(min_x, stop=max_x, length=k)
    kernels = map(u -> Normal(u, σ), linspan)

    ω ~ Dirichlet(k, 1.0)
    mixdist = MixtureModel(kernels, ω)

    x ~ filldist(mixdist, N)
end
```

Then you should be able to perform inference without any changes and could call `predict` e.g. as

```julia
pp_data = predict(KMM(missing, -2.0, 2.0, 10, 0.5, size(data, 1)), m1)
```
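In case it helps, a sketch of inspecting the result: `predict` returns an `MCMCChains.Chains` object with one variable per imputed entry of `x`, so you can flatten it to a draws-by-observations matrix with `Array`. (The `Chains` below is a stand-in built from random numbers, just to show the shape manipulation; in practice you would use the `pp_data` from the call above.)

```julia
using MCMCChains, Statistics

# Stand-in for the Chains returned by `predict`:
# 1000 draws of 100 predicted observations, 1 chain
fake_pp = Chains(randn(1000, 100, 1))

samples = Array(fake_pp)              # 1000 × 100 matrix of draws
pp_mean = vec(mean(samples, dims=1))  # posterior predictive mean per observation
length(pp_mean)  # → 100
```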

Thanks! That worked perfectly.

And yeah, the idea that there was no `loglikelihood` function defined didn't sound right (otherwise, how would the inference work?). But the error, combined with the fact that the docs for the `MixtureModel` type mention a `logpdf` function but not `loglikelihood`, had me wondering.