KernelDensity.jl vs scipy.stats' gaussian_kde

I’m trying to convert some python code to julia code, but there appears to be a discrepancy between the python KDE and the julia KDE. Here is a reproducible example, similar to what is happening in my project:

Python:

from scipy.stats import gaussian_kde
Y = np.array([i**2 for i in range(1, 101)])
grid = np.linspace(np.min(Y), np.max(Y), 100)
fy = gaussian_kde(Y, bw_method="silverman")(grid)
print(fy[0:5])

Julia:

using KernelDensity
Y = [i^2 for i in 1:100]
grid = range(minimum(Y), maximum(Y), length=100)
fy = kde(Y, grid).density
print(fy[1:5])

The python outputs:
[0.0001189 0.00013682 0.00014173 0.00013541 0.0001229],
whereas the julia outputs:
[0.00014240114216611903, 0.00015549364199413094, 0.00015341091350600083, 0.0001400091067542858, 0.00012267660100258762].

Everything appears to be equivalent up until the “fy = …” line. Does anyone know what I am doing wrong here? The discrepancy is even larger in my actual project and is affecting reproducibility.

Here are the respective docs:
https://github.com/JuliaStats/KernelDensity.jl
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html

Can you try choosing a specific bandwidth? They both seem to be using Silverman’s rule by default but maybe they’re implemented slightly differently? Or it could be about rescaling, is one a multiple of the other?

It appears to me that with Python, the y data is composed of 101 points, with the maximum being 101^2, while for Julia, it is 100 points, with the maximum being 100^2. That could account for the bandwidths being different when using Silverman’s rule, which could lead to different results. Try to see what bandwidths are actually used, and check if they’re significantly different.

1 Like