Sampling from a KDE object

Hi there,

I estimated the density for a variable of interest using the kde() function from KernelDensity.jl package. Is there not a built-in/efficient function that samples from this estimated density?

Thanks a lot!

This is really straightforward to do if one has a copy of the data, the bandwidth, and the kernel, since a KDE is just a uniformly weighted mixture whose components are the kernel centered on each data point, but the object returned by kde stores none of these. See e.g. Make KDE a distribution and implement other Distributions method · Issue #58 · JuliaStats/KernelDensity.jl · GitHub

Thanks for the reply. I’ve been through the comments but I still do not see how this is really straightforward. Maybe its just that I am very unexperienced.

Here’s a minimal demo:

using Distributions: Distributions
using KernelDensity: KernelDensity

# Like KernelDensity.UnivariateKDE, but store also the data and (kernel) distribution we need to build a Distributions-like object.
struct UnivariateKDE{
    D<:AbstractVector{<:Real},
    R<:KernelDensity.UnivariateKDE,
    K<:Distributions.UnivariateDistribution,
}
    data::D
    dist::K
    kde::R
end

function kde(
    data::AbstractVector{<:Real};
    bandwidth=KernelDensity.default_bandwidth(data),
    kernel=Normal,
    kwargs...
)
    dist = KernelDensity.kernel_dist(kernel, bandwidth)
    k = KernelDensity.kde(data, dist; kwargs...)
    return UnivariateKDE(data, dist, k)
end

# A KDE is just a uniform mixture of the kernel shifted to each data point
function Base.convert(::Type{<:Distributions.Distribution}, kde::UnivariateKDE)
    components = map(kde.data) do x
        return x + kde.dist
    end
    return Distributions.MixtureModel(components)
end

using Distributions, StatsPlots

dist = MixtureModel([Normal(0, 1), Normal(3, 3)])
data = rand(dist, 1_000)
k = kde(data)
kde_dist = convert(Distributions.Distribution, k)
p = plot(dist; components=false, label="true dist", lw=2)
plot!(k.kde; label="KDE.jl dist", lw=2)
plot!(kde_dist; components=false, label="KDE mixture", lw=1)
density!(rand(kde_dist, 100_000), lw=2, label="KDE mixture rand")

3 Likes

Thanks a lot!