How to Use a Distribution Estimated via Kernel Density Estimation (KDE) as a Prior in Turing.jl

Background and Goal

I have a set of discrete samples (e.g., from some existing dataset or posterior). I estimated a continuous distribution from these samples using a kernel density estimation (KDE) approach (such as via KernelDensity.jl). Now, I would like to use that KDE-based distribution as a prior for some parameters in a Bayesian model implemented in Turing.jl.

I have examined Turing’s official documentation on advanced usage, but it is not entirely clear to me how to incorporate a custom KDE-based distribution as a prior, especially whether Bijectors.jl requires a custom transformation for such a distribution to work with Turing.jl’s sampling algorithms (such as NUTS).

Question

• How can I take an empirical or discrete dataset, estimate a distribution via KDE, and then directly use that KDE-based distribution as a prior in Turing.jl?

• What concrete implementation steps (i.e., definitions of pdf, logpdf, rand, and possibly a custom bijector) are needed to ensure Turing.jl’s MCMC (particularly NUTS) can handle this custom KDE-based prior without errors?

Any practical code examples or best-practice guidance would be greatly appreciated.

Is that posterior a complex one? If you can fit the distribution using Bijectors.jl, the trained object can be used as a distribution, e.g. you can write something like

@model function jointloglikelihood(data, ...)
    params ~ fitted_bijectors
    # other calculations
end
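
For concreteness, here is a minimal sketch of what that could look like. fitted_bijectors below is only an untrained placeholder flow (a PlanarLayer over a 1-D standard normal) and the likelihood is made up; in practice you would first fit the flow to your samples and then reuse the resulting TransformedDistribution:

using Bijectors, Distributions, LinearAlgebra, Turing

base = MvNormal(zeros(1), I)                                    # 1-D base distribution
fitted_bijectors = transformed(base, Bijectors.PlanarLayer(1))  # stand-in for a trained flow

@model function jointloglikelihood(data)
    params ~ fitted_bijectors          # a TransformedDistribution is a Distribution
    data ~ Normal(only(params), 1.0)   # illustrative likelihood; replace with your own
end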

The simplest solution might be something like that proposed in Sampling from a KDE object - #4 by sethaxen. A KDE is just a mixture model, so if you keep a copy of the data and the bandwidth, you can convert it to a Distributions.MixtureModel.
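
A minimal sketch of that conversion, assuming Gaussian kernels: data is a placeholder, and the bandwidth below is just Silverman's rule of thumb; in practice you would reuse whatever bandwidth your KDE was actually fit with.

using Distributions, Statistics

data = randn(200)                            # your empirical samples
h = 1.06 * std(data) * length(data)^(-1/5)   # Silverman's rule-of-thumb bandwidth
kde_prior = MixtureModel([Normal(xi, h) for xi in data])  # equal-weight Gaussian mixture

# kde_prior is an ordinary ContinuousUnivariateDistribution, so inside a
# Turing model you can write, e.g., theta ~ kde_prior directly.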

This may not be ideal for you, since each logpdf evaluation will then scale with the size of the data the KDE was fit to. An alternative implementation can instead interpolate between the points at which the KDE was evaluated, so the cost of a logpdf call no longer depends on the original sample size. That might look something like this:

using KernelDensity, Distributions, Interpolations, Random, Turing

struct InterpKDEDistribution{T<:Real,K<:KernelDensity.InterpKDE} <: ContinuousUnivariateDistribution
    kde::K
end
function InterpKDEDistribution(k::KernelDensity.InterpKDE)
    T = eltype(k.kde.x)
    return InterpKDEDistribution{T,typeof(k)}(k)
end
function InterpKDEDistribution(k::KernelDensity.UnivariateKDE)
    return InterpKDEDistribution(KernelDensity.InterpKDE(k))
end

function Distributions.minimum(d::InterpKDEDistribution)
    return first(only(Interpolations.bounds(d.kde.itp.itp)))
end

function Distributions.maximum(d::InterpKDEDistribution)
    return last(only(Interpolations.bounds(d.kde.itp.itp)))
end

function Distributions.pdf(d::InterpKDEDistribution, x::Real)
    return pdf(d.kde, x)
end

function Distributions.logpdf(d::InterpKDEDistribution, x::Real)
    return log(pdf(d, x))
end

# need to at least have a rough implementation of rand;
# a much better/more efficient implementation is possible,
# but this will only be called once when sampling starts
function Random.rand(rng::Random.AbstractRNG, d::InterpKDEDistribution)
    (; kde) = d
    # inverse-transform sampling on a discretized CDF: accumulate the pdf over
    # the interpolation knots, normalize, and return the first knot whose
    # cumulative mass reaches a uniform draw
    knots = collect(Interpolations.knots(kde.itp.itp))
    cdf = cumsum(pdf.(Ref(kde), knots))
    cdf ./= cdf[end]
    u = rand(rng)
    return knots[findfirst(>=(u), cdf)]
end

x = rand(Poisson(100), 100)    # discrete samples standing in for the empirical data
k = KernelDensity.kde(x)       # continuous KDE fit to those samples
d = InterpKDEDistribution(k)   # wrapped so Distributions.jl/Turing.jl can use it

@model function foo()
    x ~ d    # the KDE-based distribution used directly in the model
end

model = foo()
chain = sample(model, NUTS(), 1_000)
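
The model above only draws from the prior. As a sketch (with an assumed Normal likelihood and made-up data), using d as a prior on an actual model parameter looks like this:

@model function bar(y)
    mu ~ d                       # KDE-based prior on the location parameter
    for i in eachindex(y)
        y[i] ~ Normal(mu, 1.0)   # illustrative likelihood
    end
end

y = randn(50) .+ 100             # fake data roughly compatible with the prior's support
chain2 = sample(bar(y), NUTS(), 1_000)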

Note that no custom Bijector is necessary here: the support of a continuous univariate distribution can only be unbounded, lower-bounded, upper-bounded, or bounded on both sides, and Bijectors.jl already defines the transformations for these cases, deriving them from minimum and maximum. That is, defining minimum and maximum is sufficient.
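
If you want to inspect the transformation Turing will use, you can ask Bijectors.jl for it directly (a sketch; the exact bijector type returned may vary between Bijectors.jl versions):

using Bijectors

b = Bijectors.bijector(d)   # constrained -> unconstrained map derived from minimum/maximum
y = b(rand(d))              # value on the unconstrained scale that NUTS actually explores
x = inverse(b)(y)           # maps back into [minimum(d), maximum(d)]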
