How to write a custom distribution using pdf from a Kernel Density Estimator?

Let

x = rand(100); #some data

The PDF of x can be estimated using a KDE:

using KernelDensity
KDE_x = kde(x)
ik = InterpKDE(KDE_x)
KDE_pdf(x) = pdf(ik, x)
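As a quick sanity check (a sketch, not part of the original post), the fitted density should integrate to roughly 1. This assumes the `x` and `density` fields of the object returned by `kde`, which hold the evaluation grid and the density values on it. The names `data`, `kfit`, `total`, and `vals` are illustrative only.

```julia
using KernelDensity

data = rand(100)               # illustrative sample, as in the post
kfit = kde(data)               # Gaussian kernel with the default bandwidth
ik = InterpKDE(kfit)           # interpolant built once, reused for every evaluation

# Riemann sum of the density over the evaluation grid; should be close to 1
total = sum(kfit.density) * step(kfit.x)

# Evaluate the interpolated density at a few points inside the data range
vals = [pdf(ik, t) for t in (0.25, 0.5, 0.75)]
```

Building the `InterpKDE` once and reusing it is the part worth carrying over into a custom distribution.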

How can the KDE_pdf function be used to construct a custom distribution?

I want to use KDEDist in a Turing.jl model. Here’s my partial first attempt; I still have to implement rand for this distribution:

using Distributions, KernelDensity

struct KDEDist <: ContinuousUnivariateDistribution
    data::Vector{Float64}
end

function Distributions.pdf(d::KDEDist, x::Real)
    KDE_fit = kde(d.data)
    ik = InterpKDE(KDE_fit)
    return pdf(ik, x)
end

As we can see, the KDE is refitted every time the pdf function is called, which is wasteful. Is there a way to just pass the pdf(ik, x) function into Distributions.pdf to avoid the repeated computation?

Do the KDE fit when constructing the struct.

Also consider doing the same for multivariate distributions using https://github.com/noilreed/MultiKDE.jl. Also, this would be a valuable package 🙂

I don’t know how to do the KDE fit inside the struct.

Instead, I passed KDE_pdf into the struct like this:

struct KDEDist <: ContinuousUnivariateDistribution
    KDE_pdf::Function
    h::Float64 #used for rand
end

function Distributions.pdf(d::KDEDist, x::Real)
    # Call the function stored in the struct, not the global KDE_pdf
    return d.KDE_pdf(x)
end

dist = KDEDist(KDE_pdf, 1.0)
pdf(dist, 10)

Something like this:

using Distributions, KernelDensity

struct KDEDist{D, K} <: ContinuousUnivariateDistribution
    data::D
    kde::K
end
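# Outer constructor: fit the KDE once and store the interpolant
function KDEDist(data)
    KDE_fit = kde(data)  # use the argument; there is no `d` in this scope
    ik = InterpKDE(KDE_fit)
    return KDEDist(data, ik)
end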

function Distributions.pdf(d::KDEDist, x::Real)
    return pdf(d.kde, x)
end
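For the Turing.jl use case, `rand` and `logpdf` are still missing. Here is a rough sketch of one way to add them, under a few assumptions: the type name `SampleableKDE` is made up for this example (so it does not clash with the `KDEDist` above), the kernel is the default Gaussian one, and the bandwidth comes from `KernelDensity.default_bandwidth`, which is not exported. Sampling uses the usual trick for a Gaussian KDE: pick a data point at random and add normal noise with standard deviation equal to the bandwidth. Since `kde` evaluates the density on a grid, the samples match the interpolated pdf only approximately.

```julia
using Distributions, KernelDensity, Random

# Hypothetical variant that also stores the bandwidth, which rand needs
struct SampleableKDE{D, K} <: ContinuousUnivariateDistribution
    data::D
    kde::K
    h::Float64
end

function SampleableKDE(data::AbstractVector{<:Real})
    h = KernelDensity.default_bandwidth(data)   # same rule kde uses by default
    ik = InterpKDE(kde(data; bandwidth = h))    # fit once, at construction
    return SampleableKDE(data, ik, h)
end

Distributions.pdf(d::SampleableKDE, x::Real) = pdf(d.kde, x)

# Clamp at zero because the interpolant may dip slightly below zero in the tails
Distributions.logpdf(d::SampleableKDE, x::Real) = log(max(pdf(d.kde, x), 0))

# Draw from the Gaussian mixture the KDE represents: random data point plus kernel noise
Base.rand(rng::AbstractRNG, d::SampleableKDE) = rand(rng, d.data) + d.h * randn(rng)

# Assumption: treat the support as the whole real line
Base.minimum(::SampleableKDE) = -Inf
Base.maximum(::SampleableKDE) = Inf

d = SampleableKDE(rand(100))
samples = rand(d, 1000)
```

I have not tested this inside a Turing model. Note that `logpdf` returns `-Inf` wherever the interpolated density is zero, which can happen outside the grid that `kde` fitted.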