How to write a custom distribution using pdf from a Kernel Density Estimator?

Let

x = rand(100); #some data

The PDF of x can be estimated using a KDE:

using KernelDensity
KDE_x = kde(x)
ik = InterpKDE(KDE_x)
KDE_pdf(x) = pdf(ik, x)

How can the KDE_pdf function be used to construct a custom distribution?

I want to use KDEDist in a Turing.jl model. Here’s my partial first attempt (I still have to implement rand for this distribution):

using Distributions, KernelDensity

struct KDEDist <: ContinuousUnivariateDistribution
    data::Vector{Float64}
end

function Distributions.pdf(d::KDEDist, x::Real)
    KDE_fit = kde(d.data)
    ik = InterpKDE(KDE_fit)
    return pdf(ik, x)
end

As we can see, the KDE is refitted every time the pdf function is called, which is wasteful. Is there a way to pass just the pdf(ik, x) function into Distributions.pdf to avoid the repeated computation?

Do the KDE fit when constructing the struct.

Also consider doing the same for multivariate distributions using https://github.com/noilreed/MultiKDE.jl. This would also make a valuable package.

I don’t know how to do the KDE fit inside the struct.

Instead, I passed KDE_pdf into the struct like this:

struct KDEDist <: ContinuousUnivariateDistribution
    KDE_pdf::Function
    h::Float64 #used for rand
end

function Distributions.pdf(d::KDEDist, x::Real)
    # call the function stored in the struct, not the global KDE_pdf
    return d.KDE_pdf(x)
end

dist = KDEDist(KDE_pdf, 1.0)
pdf(dist, 10)

Something like this:

using Distributions, KernelDensity

struct KDEDist{D, K} <: ContinuousUnivariateDistribution
    data::D
    kde::K
end
function KDEDist(data)
    # fit once at construction time; `data` is the argument (there is no `d` here)
    KDE_fit = kde(data)
    ik = InterpKDE(KDE_fit)
    return KDEDist(data, ik)
end

function Distributions.pdf(d::KDEDist, x::Real)
    return pdf(d.kde, x)
end
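
For the rand method that the question still left open, here is a minimal sketch, not a definitive implementation. It assumes the default Gaussian kernel, in which case the KDE is an equal-weight mixture of normals centred on the data points, so a draw is a randomly chosen data point plus Normal(0, h) noise, with h the bandwidth. The name KDESampler is hypothetical (chosen to avoid clashing with KDEDist above), KernelDensity.default_bandwidth is an unexported helper, and the logpdf, minimum and maximum methods are included on the assumption that a Turing.jl model will need them; none of this has been checked against Turing itself.

```julia
using Distributions, KernelDensity, Random

# Hypothetical type name; stores the bandwidth so that rand can reuse it.
struct KDESampler{D<:AbstractVector{<:Real}, K} <: ContinuousUnivariateDistribution
    data::D      # original sample
    itp::K       # interpolated KDE, built once at construction
    h::Float64   # bandwidth (standard deviation of the Gaussian kernel)
end

function KDESampler(data::AbstractVector{<:Real})
    # Assumption: same rule-of-thumb bandwidth that kde uses by default.
    h = Float64(KernelDensity.default_bandwidth(data))
    itp = InterpKDE(kde(data; bandwidth = h))  # default kernel is Normal
    return KDESampler(data, itp, h)
end

# Density lookup on the precomputed interpolant (no refitting).
Distributions.pdf(d::KDESampler, x::Real) = pdf(d.itp, x)

function Distributions.logpdf(d::KDESampler, x::Real)
    p = pdf(d.itp, x)
    # Defensive clamp in case interpolation returns a tiny negative value.
    return log(max(p, zero(p)))
end

# Gaussian-mixture draw: random data point plus kernel noise.
Base.rand(rng::AbstractRNG, d::KDESampler) = rand(rng, d.data) + d.h * randn(rng)

# Support of a Gaussian-kernel KDE is the whole real line.
Base.minimum(::KDESampler) = -Inf
Base.maximum(::KDESampler) = Inf

# Usage
x = rand(100)
d = KDESampler(x)
pdf(d, 0.5)
logpdf(d, 0.5)
rand(d, 5)
```

Because the pdf comes from an interpolated grid with a truncated range while rand samples the exact Gaussian mixture, the two agree only approximately, most visibly in the far tails.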