Kernel Density Estimate boundary problems

Hello everyone,

I am currently trying to calculate probability density curves using the KernelDensity.jl package, but I have run into some issues due to the nature of my data.

I am not a statistician, so I will have to explain my problem in layman's terms…

The data vector x that I want to compute the KDE of consists of non-negative integers (x_i ∈ ℤ, x_i ≥ 0), and this has to be reflected in the density plot.

I know that KernelDensity.kde uses the Normal distribution as its kernel by default, so the estimate is not restricted to non-negative values.

using KernelDensity, Distributions, Plots

x = 0:1:1e4
KDE = kde(x, kernel = Normal)
plot(KDE.x, KDE.density)

In this uniform case I can simply add the boundary argument to the kde call to get rid of any x < 0.

KDE = kde(x, boundary=(0,1e4), kernel = Normal)

However, my real distribution of x has a long right tail,

x = round.(rand(LogNormal(3,1),1000))
KDE = kde(x, boundary = extrema(x), kernel = Normal)

and then the boundary argument just makes the density wrap around at zero.

I tried to adapt the underlying kernel_dist method with a truncated version of the Normal distribution, but that threw an error.

kernel_dist(::Type{Truncated},w::Real) = truncated(Normal(0.0,w);lower = 0)
KDE = kde(x, kernel = Truncated)

>ERROR: MethodError: no method matching Truncated(::Float64, ::Float64)

So my questions are: 1. What am I doing wrong in adding my own kernel distribution? 2. Is this even the correct approach to estimating the probability density?

I hope this question is reasonably clear. Thank you all in advance!
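On question 1, one possible explanation (an assumption based on the error message, not checked against the package source): kernel_dist is not exported by KernelDensity, so the definition above creates a separate function in Main, and kde keeps calling the package's generic fallback, which tries to construct Truncated(0.0, 1.0). A minimal sketch of that mechanism, using a toy module (ToyKDE, describe and MyKernel are hypothetical names, not part of any package):

```julia
# Toy stand-in for a package with an unexported, dispatch-based hook.
module ToyKDE
    # Generic fallback, analogous to the one the error message points to.
    kernel_dist(::Type{D}, w::Real) where {D} = :fallback
    # Internal caller: always resolves to ToyKDE.kernel_dist.
    describe(::Type{D}, w::Real) where {D} = kernel_dist(D, w)
end

struct MyKernel end  # hypothetical kernel type

# This defines a NEW function Main.kernel_dist; ToyKDE never sees it.
kernel_dist(::Type{MyKernel}, w::Real) = :custom
before = ToyKDE.describe(MyKernel, 1.0)   # :fallback

# Qualifying the name adds a method to the module's own function.
ToyKDE.kernel_dist(::Type{MyKernel}, w::Real) = :custom
after = ToyKDE.describe(MyKernel, 1.0)    # :custom
```

If that is the cause, writing the definition as KernelDensity.kernel_dist(::Type{Truncated}, w::Real) = ... should at least reach the custom method. Whether kde then accepts a truncated kernel is a separate question, and a one-sided kernel starting at zero would shift mass to the right of every data point rather than correct the boundary.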

Since your values are discrete, it doesn't make much sense to compute densities, right? Why not histograms?

Also, take a look at AverageShiftedHistograms.jl as an alternative to KDEs.
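A minimal sketch of the histogram idea for integer-valued data: count each value and divide by the sample size to get an empirical probability mass function. The function name empirical_pmf and the sample values are made up for illustration.

```julia
# Empirical probability mass function for integer-valued data.
# Returns the sorted distinct values and their relative frequencies.
function empirical_pmf(x::AbstractVector{<:Real})
    counts = Dict{Int,Int}()
    for v in x
        k = Int(v)                      # throws if v is not integer-valued
        counts[k] = get(counts, k, 0) + 1
    end
    ks = sort!(collect(keys(counts)))
    return ks, [counts[k] / length(x) for k in ks]
end

x = [0.0, 1.0, 1.0, 2.0, 2.0, 2.0, 5.0]   # hypothetical sample
ks, p = empirical_pmf(x)
```

Because the bins have unit width, p at k can be read directly against a continuous density evaluated at k, and nothing is ever assigned to negative values.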

Yes, you are right, in the discrete case a histogram makes more sense. However, the values of x are discrete only because they are obtained through a Gillespie simulation; I want the density so that I can compare those distributions with the ones from the continuous analytical solution. I already took a look at AverageShiftedHistograms and it seems like a great alternative, but I ran into the same issue of density being assigned to negative x.

If you want to compare continuous distributions, then check out DensityRatioEstimation.jl; it only requires samples.

Hi @vh94, I know this is an older thread, but I was recently working on this exact mathematical problem. Standard Gaussian KDEs really struggle with strict boundaries, and as you noticed, the default reflection methods can create unnatural artifacts near zero.
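For reference, the textbook reflection correction at a lower bound of zero can be sketched in plain Julia as below. This is not what any particular package does internally; reflected_kde and the bandwidth h = 1.0 are illustrative choices.

```julia
# Standard normal density.
gauss(u) = exp(-u^2 / 2) / sqrt(2π)

# Gaussian KDE on [0, ∞) with reflection at zero: each data point xi
# contributes a kernel at xi and a mirrored kernel at -xi, so the mass
# that would leak below zero is folded back into the support.
function reflected_kde(t::Real, x::AbstractVector{<:Real}, h::Real)
    t < 0 && return 0.0
    s = 0.0
    for xi in x
        s += gauss((t - xi) / h) + gauss((t + xi) / h)
    end
    return s / (length(x) * h)
end

x = [0.0, 1.0, 2.0, 3.0, 10.0]            # hypothetical sample
grid = 0.0:0.01:50.0
dens = [reflected_kde(t, x, 1.0) for t in grid]
```

The estimate integrates to one over [0, ∞), but the mirrored terms force the slope at zero to be exactly zero, which is one source of the artifacts near the boundary.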

To solve this, I just released a package called BetaKDE.jl (now in the General Registry). Beta kernels naturally change their shape at the boundaries, meaning they never leak density outside the support and don’t require reflection.

Since you are bounding your data between 0 and the maximum value, you can just pass those limits directly to the estimator:

using BetaKDE

# Using your example data bounds
x_min, x_max = extrema(x)
KDE = betakde(x; lower=x_min, upper=x_max)

using Plots
plot(KDE)
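For anyone curious about the idea behind beta kernels, here is a rough sketch of the classical beta-kernel estimator using Distributions.jl. It is not the BetaKDE.jl implementation; the function name beta_kernel_density, the sample, and the smoothing parameter b = 0.05 are assumptions made for illustration.

```julia
using Distributions

# Beta-kernel density estimate at t for data on [lo, hi].
# Data are rescaled to [0, 1]; the kernel is a Beta density whose shape
# depends on the evaluation point u, so it never extends past the bounds.
function beta_kernel_density(t::Real, x::AbstractVector{<:Real},
                             lo::Real, hi::Real, b::Real)
    (lo <= t <= hi) || return 0.0
    width = hi - lo
    u = (t - lo) / width
    kernel = Beta(u / b + 1, (1 - u) / b + 1)
    s = sum(pdf(kernel, (xi - lo) / width) for xi in x)
    return s / (length(x) * width)   # divide by width to undo the rescaling
end

x = [0.0, 3.0, 5.0, 8.0, 20.0, 41.0]      # hypothetical sample
lo, hi = extrema(x)
dens = [beta_kernel_density(t, x, lo, hi, 0.05) for t in lo:0.5:hi]
```

In this basic form the estimate is not guaranteed to integrate to exactly one, so it is often renormalized afterwards.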

I just wanted to say thank you for reading my mind : ) I actually came to this thread looking for an answer to exactly the same problem, but since I couldn’t find a solution, I ended up just using plain old histograms, which is fine for what I needed it for (just visualisation of distributions with strictly positive support). Now that this tool exists, I can go back and revisit some of the analyses I originally had in mind.

That is fantastic. I hope it turns out to be useful. If you run into any problems with the package, please let me know and I will try to fix it.