Kernel Density Estimate boundary problems

vh94 · June 4, 2024, 10:17pm

Hello everyone,

I am currently trying to calculate probability density curves using the KernelDensity.jl package, but I ran into some issues due to the nature of my data.

I am not a statistician, so I will have to explain my problem in layman's terms…

The data vector x that I want to compute the KDE of consists of non-negative integers (x_i ∈ ℤ, x_i ≥ 0), and the density plot has to reflect that.

I know that KernelDensity.kde uses the Normal distribution as its kernel by default, so the estimate is not restricted to positive support.

using KernelDensity, Distributions, Plots  # plot assumed to come from Plots.jl
x = 0:1:1e4
KDE = kde(x, kernel = Normal)
plot(KDE.x, KDE.density)

In this uniform case I can simply add the boundary argument to the kde call to get rid of any x < 0.

KDE = kde(x, boundary=(0,1e4), kernel = Normal)

However, my real distribution of x is right-skewed with a long tail,

x = round.(rand(LogNormal(3,1),1000))
KDE = kde(x, boundary = extrema(x), kernel = Normal)

and then the boundary argument just makes the density wrap around at zero.

I tried to adapt the underlying kernel_dist method with a truncated version of the Normal distribution, but that threw an error.

kernel_dist(::Type{Truncated},w::Real) = truncated(Normal(0.0,w);lower = 0)
KDE = kde(x, kernel = Truncated)

>ERROR: MethodError: no method matching Truncated(::Float64, ::Float64)
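A note on question 1: the error message is consistent with the new kernel_dist method having been defined as a separate function in Main instead of being added to KernelDensity.kernel_dist, so the package would still reach its own generic fallback and try to call Truncated(0.0, 1.0). The sketch below reproduces that mechanism with a toy module. ToyKDE, make_kernel and HalfKernel are hypothetical names used only for illustration; they are not part of KernelDensity.jl.

```julia
module ToyKDE   # hypothetical stand-in for a package
# Generic fallback: builds the kernel by calling D(location, scale).
kernel_dist(::Type{D}, w::Real) where {D} = D(0.0, w)
make_kernel(D, w) = kernel_dist(D, w)
end

# Hypothetical kernel type that has no two-argument constructor.
struct HalfKernel
    width::Float64
end

# Unqualified definition: this creates a new function Main.kernel_dist
# and leaves ToyKDE.kernel_dist untouched.
kernel_dist(::Type{HalfKernel}, w::Real) = HalfKernel(w)

err = try
    ToyKDE.make_kernel(HalfKernel, 1.0)   # still hits the generic fallback
catch e
    e
end
println(err isa MethodError)

# Qualified definition: this adds a method to the module's own function.
ToyKDE.kernel_dist(::Type{HalfKernel}, w::Real) = HalfKernel(w)
k = ToyKDE.make_kernel(HalfKernel, 1.0)
println(k.width)
```

For the real package the analogous step would be to define KernelDensity.kernel_dist(...) or to import the name first. Whether kde then accepts a truncated kernel is a separate question that I have not verified. Note also that a kernel truncated at zero is one-sided: it is centred on each data point, so it would push mass to the right of every sample and would not by itself restrict the estimate to x ≥ 0.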

So my questions are: (1) what am I doing wrong in adding my own kernel distribution, and (2) is this even the correct approach to estimating the probability density?

I hope this question is somewhat clear, and thank you all in advance!

juliohm · June 4, 2024, 10:28pm

Since your values are discrete, it doesn't make much sense to compute densities, right? Why not histograms?

Also, take a look at AverageShiftedHistograms.jl as an alternative to KDEs.
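To make the histogram suggestion concrete, here is a minimal sketch of an empirical probability mass function for integer-valued samples, written in plain Julia with no packages. The function name empirical_pmf and the sample values are made up for illustration.

```julia
# Empirical probability mass function of integer-valued samples.
function empirical_pmf(x::AbstractVector{<:Integer})
    counts = Dict{Int,Int}()
    for v in x
        counts[v] = get(counts, v, 0) + 1   # tally each observed value
    end
    support = sort!(collect(keys(counts)))  # observed values, ascending
    probs = [counts[k] / length(x) for k in support]
    return support, probs
end

samples = [0, 1, 1, 2, 2, 2, 5]   # made-up counts for illustration
support, probs = empirical_pmf(samples)
println(support)
println(sum(probs))
```

Because the bins have unit width, each probability can be read as a piecewise-constant density value on [k - 0.5, k + 0.5]. Samples stored as floats, such as the output of round.(...), would need converting first, for example with round.(Int, x).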

vh94 · June 4, 2024, 10:57pm

Yes, you are right, in the discrete case a histogram makes more sense. However, the values of x are only discrete because they are obtained through a Gillespie simulation, and I want the density so that I can compare those distributions with the ones from the continuous analytical solution. I already took a look at AverageShiftedHistograms.jl and it seems like a great alternative, but I ran into the same issue of density at negative x.
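One standard way to keep a kernel estimate on x ≥ 0 is reflection at the boundary: every sample x_i contributes a second kernel centred at -x_i, and the estimate is only evaluated for x ≥ 0, so the mass that would leak below zero is folded back. The sketch below is a direct-sum Gaussian KDE in plain Julia. It is not how KernelDensity.jl or AverageShiftedHistograms.jl work internally, and the names gauss, normal_reference_bandwidth and reflected_kde, as well as the bandwidth rule, are choices made for this illustration.

```julia
using Random

# Standard normal density.
gauss(u) = exp(-u^2 / 2) / sqrt(2π)

# Normal-reference rule of thumb (an assumed bandwidth choice).
function normal_reference_bandwidth(x)
    n = length(x)
    m = sum(x) / n
    s = sqrt(sum(abs2, x .- m) / (n - 1))
    return 1.06 * s * n^(-1 / 5)
end

# Gaussian KDE with reflection at zero; returns 0 for negative grid points.
function reflected_kde(data, grid; h = normal_reference_bandwidth(data))
    n = length(data)
    return [x < 0 ? 0.0 :
            sum(gauss((x - xi) / h) + gauss((x + xi) / h) for xi in data) / (n * h)
            for x in grid]
end

Random.seed!(1)
data = round.(exp.(3 .+ randn(1000)))   # rounded log-normal samples, as in the question
h = normal_reference_bandwidth(data)
grid = 0:0.1:(maximum(data) + 6h)
dens = reflected_kde(data, grid; h = h)

# Trapezoidal check that the estimate integrates to about 1 on [0, ∞).
total = step(grid) * (sum(dens) - (dens[1] + dens[end]) / 2)
println(total)
```

Reflection forces the estimate to have zero slope at x = 0, which can bias it near the boundary if the true density rises or falls steeply there. With a long right tail, a single global bandwidth may also oversmooth the peak.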

juliohm · June 5, 2024, 1:13am

If you want to compare continuous distributions, then check out DensityRatioEstimation.jl. It only requires samples.
