KernelDensity.jl vs scipy.stats' gaussian_kde

danielalfonsetti · December 16, 2020, 7:53am

I’m trying to convert some python code to julia code, but there appears to be a discrepancy between the python KDE and the julia KDE. Here is a reproducible example, similar to what is happening in my project:

Python:

from scipy.stats import gaussian_kde
Y = np.array([i**2 for i in range(1, 101)])
grid = np.linspace(np.min(Y), np.max(Y), 100)
fy = gaussian_kde(Y, bw_method="silverman")(grid)
print(fy[0:5])

Julia:

using KernelDensity
Y = [i^2 for i in 1:100]
grid = range(minimum(Y), maximum(Y), length=100)
fy = kde(Y, grid).density
print(fy[1:5])

The python outputs:
[0.0001189 0.00013682 0.00014173 0.00013541 0.0001229],
whereas the julia outputs:
[0.00014240114216611903, 0.00015549364199413094, 0.00015341091350600083, 0.0001400091067542858, 0.00012267660100258762].

Everything appears to be equivalent up until the “fy = …” line. Does anyone know what I am doing wrong here? The discrepancy is even larger in my actual project and is affecting reproducibility.

Here are the respective docs:
https://github.com/JuliaStats/KernelDensity.jl
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html

jules · December 16, 2020, 9:10am

Can you try choosing a specific bandwidth? They both seem to be using Silverman’s rule by default but maybe they’re implemented slightly differently? Or it could be about rescaling, is one a multiple of the other?

mcreel · December 16, 2020, 2:15pm

It appears to me that with Python, the y data is composed of 101 points, with the maximum being 101^2, while for Julia, it is 100 points, with the maximum being 100^2. That could account for the bandwidths being different when using Silverman’s rule, which could lead to different results. Try to see what bandwidths are actually used, and check if they’re significantly different.

Topic		Replies	Views
Epanechnikov kernel density General Usage	2	1133	May 3, 2019
Kernel density estimation status Statistics question , package	30	10089	May 7, 2023
Evaluating explicitly the kernel for a Gaussian/Normal distribution Statistics question	1	655	March 13, 2017
Replicating behavior of scipy.ndimate.gaussian_filter General Usage images , python	3	1218	December 31, 2019
Changing the smoothness of density function Visualization statistics	28	2675	August 21, 2021

KernelDensity.jl vs scipy.stats' gaussian_kde

Related topics