Sample from Kernel Density Estimator

From a sample of data I can create a KDE to estimate the probability density with KernelDensity.jl, e.g.:

density_estimator = kde(data)

Now, is there any package or custom code that I could use to retrieve a sample from a distribution with such a density, e.g. something that allows me to do rand(xxx(density_estimator))?

(Or maybe other implementations of KDE which have that feature?)

1 Like

I do not know of an implementation, but you can write your own with few lines. Assuming a 1-D normal kernel density estimator, to get one sample you need to:

  • pick a data point x, using for instance with something like x = sample(data)
  • get the kernel bandwith h, be it the Silverman optimal bandwith or extracted from the object created by the function kde_lscv(data) from the package you mention
  • generate a random sample with rand(Normal(x,1/h))

If you change the kernel beware of the parametrisation to make sure your bandwith is a proper scale parameter, and if you go multidimensional pay attention too.

But essentially you are sampling from a mixture model with equal weights, so first sample a component then sample from that component.

2 Likes

Thanks, that works, except that in the last step the standard deviation should be h, not 1/h:

using Random, KernelDensity, Distributions
data = randn(1000)
h = KernelDensity.default_bandwidth(data)
newdata = [rand(Normal(rand(data) , 1/h)) for _=1:1000]
std(newdata) # => 4.317997200903775

but:

newdata = [rand(Normal(rand(data) , h)) for _=1:1000]
std(newdata) # => 1.0014533279921076 (as expected)
2 Likes

You are quite right, that “beware of the parametrisation” here is pretty ironic given I used a wrong parametrisation one line above :sweat_smile:

2 Likes