Kernel Density Estimate boundary problems

Hello everyone,

I am currently trying to calculate probability density curves using the KernelDensity.jl package; however, I ran into some issues due to the nature of my data.

I am not a statistician, so I will have to explain my problem in layman's terms.

The data vector x whose KDE I want to calculate consists of non-negative integers (x_i ∈ ℤ, x_i ≥ 0), and this has to be reflected in the density plot.

I know that KernelDensity.kde uses a Normal kernel by default. Since the Normal distribution is supported on the whole real line, the estimate is not restricted to positive values.

```julia
using KernelDensity, Distributions, Plots

x = 0:1:1e4
KDE = kde(x, kernel = Normal)
plot(KDE.x, KDE.density)
```
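To see concretely why the default Normal kernel is a problem here, below is a minimal hand-rolled Gaussian KDE in plain Julia (Statistics and Random standard libraries only, not KernelDensity.jl) that measures how much of the estimated density ends up below zero. The data and the bandwidth h are made-up assumptions for illustration.

```julia
using Statistics, Random

# Gaussian KDE evaluated at a single point t (hand-rolled, for illustration only)
gauss_kde(t, data, h) = mean(exp(-((t - xi) / h)^2 / 2) / (h * sqrt(2π)) for xi in data)

rng = MersenneTwister(1)
data = round.(3 .* abs.(randn(rng, 500)))   # hypothetical non-negative integer-valued data
h = 1.0                                      # hypothetical bandwidth

# Riemann sum over [-10, 0): probability mass the estimate assigns to x < 0
dt = 0.01
mass_below_zero = sum(gauss_kde(t, data, h) for t in -10:dt:-dt) * dt
println("estimated mass below zero: ", round(mass_below_zero; digits = 3))
```

Every sample at 0 puts half of its kernel on the negative axis, and with h = 1 a sample at 1 still loses about 16% of its kernel mass there.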

In this uniform case I can simply add the boundary argument to the kde call to get rid of any x < 0:

```julia
KDE = kde(x, boundary = (0, 1e4), kernel = Normal)
```

However, my real distribution of x has a long right tail:

```julia
x = round.(rand(LogNormal(3, 1), 1000))
KDE = kde(x, boundary = extrema(x), kernel = Normal)
```

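and then the boundary argument just wraps the density around: because kde computes the estimate with Fourier transforms, kernel mass that falls below zero reappears at the upper end of the grid (and vice versa).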
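The wrap-around is exactly what circular convolution does. A minimal sketch in plain Julia that computes the circular convolution directly instead of with an FFT (a schematic of the mechanism, not KernelDensity.jl's actual code): all data sit in the first grid cell, standing for x = 0, and the part of the Gaussian kernel that would spill below the grid reappears in the last cells. The grid size and the bandwidth in grid cells are arbitrary assumptions.

```julia
# Smooth a vector of grid counts with a Gaussian kernel using circular
# convolution, i.e. what an FFT-based convolution computes. Offsets are taken
# modulo N, so the grid behaves like a ring.
function circular_smooth(counts::Vector{Float64}, h_cells::Real)
    N = length(counts)
    out = zeros(N)
    for i in 1:N, j in 1:N
        # circular offset between cells i and j, folded into -N÷2 .. N÷2-1
        d = mod(i - j + N ÷ 2, N) - N ÷ 2
        out[i] += counts[j] * exp(-(d / h_cells)^2 / 2)
    end
    return out ./ sum(out)
end

N = 100
counts = zeros(N)
counts[1] = 1.0                       # all data in the first cell, i.e. at x = 0
dens = circular_smooth(counts, 3.0)   # bandwidth of 3 grid cells (hypothetical)

# The last cell gets the same weight as the second one: the left tail wrapped around
println(dens[2], "  ", dens[end])
```

Padding the grid by several bandwidths beyond the data (which, as far as I know, is what kde does when boundary is left at its default) avoids the wrap-around, but then the estimate extends below zero again.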

I tried to adapt the underlying kernel_dist method so that it returns a truncated Normal distribution, but that threw an error:

```julia
kernel_dist(::Type{Truncated}, w::Real) = truncated(Normal(0.0, w); lower = 0)
KDE = kde(x, kernel = Truncated)
```

```
ERROR: MethodError: no method matching Truncated(::Float64, ::Float64)
```
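The MethodError is consistent with a Julia scoping detail: kernel_dist is not exported, so the top-level definition creates a new function Main.kernel_dist instead of adding a method to KernelDensity's own kernel_dist. kde therefore still reaches the package's generic method, which (judging by the error message) calls Truncated with two Float64 arguments. The self-contained sketch below reproduces the mechanism with a made-up module ToyKDE; every name in it is hypothetical. In real code the fix would be to qualify the name, KernelDensity.kernel_dist(::Type{...}, w::Real) = ..., although I have not checked whether kde then accepts a truncated kernel. Note also that a kernel truncated at 0 is one-sided: it shifts each point's mass to the right rather than correcting the boundary.

```julia
# Hypothetical stand-in for a package that builds its kernel through an
# unexported helper (made-up names; this is not KernelDensity.jl's source).
module ToyKDE
    struct OneSided
        sigma::Float64
    end
    # generic fallback: assumes every kernel type has a (location, scale) constructor
    kernel_dist(::Type{D}, w::Real) where {D} = D(0.0, w)
    # what the package calls internally
    build(kernel, w) = kernel_dist(kernel, w)
end

# 1) A top-level definition creates a new function Main.kernel_dist;
#    ToyKDE.build never sees it and hits the fallback, which calls OneSided(0.0, 1.0).
kernel_dist(::Type{ToyKDE.OneSided}, w::Real) = ToyKDE.OneSided(w)
err = try
    ToyKDE.build(ToyKDE.OneSided, 1.0)
    nothing
catch e
    e
end
println(err isa MethodError)

# 2) Qualifying the name adds a method to the package's own function.
ToyKDE.kernel_dist(::Type{ToyKDE.OneSided}, w::Real) = ToyKDE.OneSided(w)
k = ToyKDE.build(ToyKDE.OneSided, 1.0)
```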

So my questions are: (1) what am I doing wrong when adding my own kernel distribution, and (2) is this even the correct approach to estimating the probability density?

I hope this question is somewhat clear, and thank you all in advance!
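For question (2), one standard fix for a hard lower bound at zero is the reflection method. This is an editorial aside: it is not mentioned in this thread, and I do not know of a KernelDensity.jl option for it. Every kernel is mirrored about 0 and the mirrored copy is added, f̂(t) = (1/n) Σ_i [K_h(t − x_i) + K_h(t + x_i)] for t ≥ 0 and f̂(t) = 0 otherwise. The sketch hand-rolls it in plain Julia; the sample size and the bandwidth h = 2 are arbitrary assumptions.

```julia
using Statistics, Random

# Gaussian kernel with bandwidth h
gauss(u, h) = exp(-(u / h)^2 / 2) / (h * sqrt(2π))

# Reflection KDE for data on [0, ∞): every kernel gets a mirror image about 0,
# so the mass a plain KDE would put below zero is folded back onto x ≥ 0.
function reflected_kde(t::Real, data::AbstractVector{<:Real}, h::Real)
    t < 0 && return 0.0
    return mean(gauss(t - xi, h) + gauss(t + xi, h) for xi in data)
end

rng = MersenneTwister(2)
data = round.(exp.(3 .+ randn(rng, 300)))   # rounded LogNormal(3, 1) draws, like the post's example
h = 2.0                                      # hypothetical bandwidth; pick it properly in practice

dt = 0.1
grid = 0:dt:(maximum(data) + 10h)
dens = [reflected_kde(t, data, h) for t in grid]
total_mass = sum(dens) * dt                  # Riemann sum of the estimate over [0, ∞)
```

The estimate integrates to 1 on [0, ∞), but its slope at 0 is always zero, which is the known limitation of reflection: it suits densities that are flat near the boundary and biases those that are not.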

Since your values are discrete, it doesn't make much sense to compute densities, right? Why not histograms?
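For integer-valued data, a histogram with one bin per integer is just the empirical probability mass function, which by construction puts nothing below zero. A minimal plain-Julia sketch (the function name empirical_pmf is made up for illustration):

```julia
# Empirical probability mass function of non-negative integer data:
# pmf[k + 1] is the fraction of samples equal to k (Julia arrays are 1-based).
function empirical_pmf(x::AbstractVector{<:Integer})
    any(<(0), x) && throw(ArgumentError("data must be non-negative"))
    counts = zeros(Int, maximum(x) + 1)
    for xi in x
        counts[xi + 1] += 1
    end
    return counts ./ length(x)
end

x = [0, 1, 1, 2, 2, 2, 5]   # toy data
p = empirical_pmf(x)        # p[1] is P(X = 0), p[6] is P(X = 5)
```

The data in the question come from round.(...) as Float64, so they would need converting first, e.g. with round.(Int, x).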

Also, take a look at AverageShiftedHistograms.jl as an alternative to KDEs.


Yes, you are right: in the discrete case a histogram makes more sense. However, the values of x are discrete because they are obtained through a Gillespie simulation, and I want the density so that I can compare these distributions with the ones from the continuous analytical solution. I already took a look at AverageShiftedHistograms and it seems like a great alternative; however, I ran into the same issue of density at negative x.
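One way to compare like with like, offered as a suggestion rather than something from the thread: instead of smoothing the simulated counts, discretize the continuous solution by integrating it over [k − 1/2, k + 1/2] through its CDF F (with no mass below 0), and compare that with the empirical pmf. In the sketch an exponential distribution with a made-up rate λ stands in for the analytical solution, and rounded exponential draws stand in for the Gillespie output; both are assumptions for illustration.

```julia
using Random

λ = 0.2                                 # hypothetical rate of the stand-in "analytical" solution
F(t) = t <= 0 ? 0.0 : 1 - exp(-λ * t)   # its CDF, zero below 0

# Probability that the continuous variable rounds to the integer k
binprob(k) = F(k + 0.5) - F(k - 0.5)

# Stand-in for simulation output: continuous draws rounded to integers
rng = MersenneTwister(3)
x = round.(Int, randexp(rng, 100_000) ./ λ)

kmax = maximum(x)
counts = zeros(Int, kmax + 1)
for xi in x
    counts[xi + 1] += 1
end
pmf = counts ./ length(x)

# Total variation distance between the empirical pmf and the discretized density,
# including the model's probability beyond kmax, where the sample has no counts
tv = 0.5 * (sum(abs(pmf[k + 1] - binprob(k)) for k in 0:kmax) + (1 - F(kmax + 0.5)))
```

A small distance then indicates agreement without having to choose a kernel or a bandwidth.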

If you want to compare continuous distributions, check out DensityRatioEstimation.jl; it only requires samples.