Sampling from a KDE object

miguelborrero · October 22, 2024, 5:40pm

Hi there,

I estimated the density for a variable of interest using the kde() function from KernelDensity.jl package. Is there not a built-in/efficient function that samples from this estimated density?

Thanks a lot!

sethaxen · October 22, 2024, 6:20pm

This is really straightforward to do if one has a copy of the data, the bandwidth, and the kernel, since a KDE is just a uniformly weighted mixture whose components are the kernel centered on each data point, but the object returned by kde stores none of these. See e.g. Make KDE a distribution and implement other Distributions method · Issue #58 · JuliaStats/KernelDensity.jl · GitHub

miguelborrero · October 22, 2024, 6:31pm

Thanks for the reply. I’ve been through the comments but I still do not see how this is really straightforward. Maybe its just that I am very unexperienced.

sethaxen · October 22, 2024, 7:04pm

Here’s a minimal demo:

using Distributions: Distributions
using KernelDensity: KernelDensity

# Like KernelDensity.UnivariateKDE, but store also the data and (kernel) distribution we need to build a Distributions-like object.
struct UnivariateKDE{
    D<:AbstractVector{<:Real},
    R<:KernelDensity.UnivariateKDE,
    K<:Distributions.UnivariateDistribution,
}
    data::D
    dist::K
    kde::R
end

function kde(
    data::AbstractVector{<:Real};
    bandwidth=KernelDensity.default_bandwidth(data),
    kernel=Normal,
    kwargs...
)
    dist = KernelDensity.kernel_dist(kernel, bandwidth)
    k = KernelDensity.kde(data, dist; kwargs...)
    return UnivariateKDE(data, dist, k)
end

# A KDE is just a uniform mixture of the kernel shifted to each data point
function Base.convert(::Type{<:Distributions.Distribution}, kde::UnivariateKDE)
    components = map(kde.data) do x
        return x + kde.dist
    end
    return Distributions.MixtureModel(components)
end

using Distributions, StatsPlots

dist = MixtureModel([Normal(0, 1), Normal(3, 3)])
data = rand(dist, 1_000)
k = kde(data)
kde_dist = convert(Distributions.Distribution, k)
p = plot(dist; components=false, label="true dist", lw=2)
plot!(k.kde; label="KDE.jl dist", lw=2)
plot!(kde_dist; components=false, label="KDE mixture", lw=1)
density!(rand(kde_dist, 100_000), lw=2, label="KDE mixture rand")

miguelborrero · October 23, 2024, 8:54pm

Thanks a lot!

Topic		Replies	Views
Sample from Kernel Density Estimator Statistics question	3	2005	November 25, 2020
How to write a custom distribution using pdf from a Kernel Density Estimator? Performance statistics , turing , distributions	4	540	September 14, 2021
Get conditional kernel densities in julia Probabilistic Programming question , distributions	0	593	January 27, 2021
Estimating a Multivariate Distribution General Usage question , package , distributions	6	766	May 29, 2020
How to Use a Distribution Estimated via Kernel Density Estimation (KDE) as a Prior in Turing.jl General Usage question , turing , distributions , kernel	4	158	January 12, 2025

Sampling from a KDE object

Related topics