Hi there,
I estimated the density for a variable of interest using the kde()
function from KernelDensity.jl
package. Is there not a built-in/efficient function that samples from this estimated density?
Thanks a lot!
Hi there,
I estimated the density for a variable of interest using the kde()
function from KernelDensity.jl
package. Is there not a built-in/efficient function that samples from this estimated density?
Thanks a lot!
This is really straightforward to do if one has a copy of the data, the bandwidth, and the kernel, since a KDE is just a uniformly weighted mixture whose components are the kernel centered on each data point, but the object returned by kde
stores none of these. See e.g. Make KDE a distribution and implement other Distributions method · Issue #58 · JuliaStats/KernelDensity.jl · GitHub
Thanks for the reply. I’ve been through the comments but I still do not see how this is really straightforward. Maybe its just that I am very unexperienced.
Here’s a minimal demo:
using Distributions: Distributions
using KernelDensity: KernelDensity
# Like KernelDensity.UnivariateKDE, but store also the data and (kernel) distribution we need to build a Distributions-like object.
struct UnivariateKDE{
D<:AbstractVector{<:Real},
R<:KernelDensity.UnivariateKDE,
K<:Distributions.UnivariateDistribution,
}
data::D
dist::K
kde::R
end
function kde(
data::AbstractVector{<:Real};
bandwidth=KernelDensity.default_bandwidth(data),
kernel=Normal,
kwargs...
)
dist = KernelDensity.kernel_dist(kernel, bandwidth)
k = KernelDensity.kde(data, dist; kwargs...)
return UnivariateKDE(data, dist, k)
end
# A KDE is just a uniform mixture of the kernel shifted to each data point
function Base.convert(::Type{<:Distributions.Distribution}, kde::UnivariateKDE)
components = map(kde.data) do x
return x + kde.dist
end
return Distributions.MixtureModel(components)
end
using Distributions, StatsPlots
dist = MixtureModel([Normal(0, 1), Normal(3, 3)])
data = rand(dist, 1_000)
k = kde(data)
kde_dist = convert(Distributions.Distribution, k)
p = plot(dist; components=false, label="true dist", lw=2)
plot!(k.kde; label="KDE.jl dist", lw=2)
plot!(kde_dist; components=false, label="KDE mixture", lw=1)
density!(rand(kde_dist, 100_000), lw=2, label="KDE mixture rand")
Thanks a lot!