Changing the smoothness of density function

Thanks for pointing this out. I don’t see any documentation for density anywhere. Are keyword arguments like bandwidth described anywhere?

Great illustration of this concept! The density plot fails at the boundaries.

↑This is not a problem. However, the following↓ is wrong.

If you had shown only the KDE in the following plot, others would not be able to recognize that the true sample seems to be uniformly distributed on the set {1,2,3,4,5,6}.

You can use a KDE plot if you want, but if you want to give a scientifically honest argument, you should also provide a plot that shows the true sample information, for example, a histogram with integer bins and a plot of the sample ECDF.

If you set normalize = true (or norm = true, for short) in the histogram plot, you can compare it to the plot of the density function of a continuous distribution. There is no need to use KDE all the time.
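To make the two suggested summaries concrete, here is a minimal sketch in plain Julia that computes the quantities a normalized integer-bin histogram and a sample ECDF would display. The sample (600 simulated die rolls) and the names `faces`, `freq`, and `ecdf_at` are assumptions for illustration, not the data or code from this thread.

```julia
using Random

# Hypothetical sample: 600 rolls of a fair die (not the data from this thread).
rng = MersenneTwister(1)
sample = rand(rng, 1:6, 600)

# Relative frequency of each integer value. With bins of width 1 this is
# also the bar height of a normalized histogram.
faces = 1:6
freq = [count(==(v), sample) / length(sample) for v in faces]

# Empirical CDF: fraction of observations less than or equal to t.
ecdf_at(t) = count(<=(t), sample) / length(sample)

for (v, f) in zip(faces, freq)
    println(v, "  freq = ", round(f, digits = 3),
            "  ecdf = ", round(ecdf_at(v), digits = 3))
end
```

Because the bins have width 1, the relative frequencies can be compared directly against the density of a continuous distribution, which is the comparison the normalize option is meant to allow.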


I’d just say that any prescription of what you “should” do needs to be specific to the particular application. There is no universal rule here. It would be trivial for me to construct a problem where the KDE is a better representation of reality than the discrete distribution over the data, particularly when the underlying process is continuous and the measuring instrument is discretizing it artificially, which is a pretty common case.


Nice example code and nice graph. Sorry I cannot resist this opportunity:

It is difficult to understand why statisticians commonly limit their inquiries to Averages, and do not revel in more comprehensive views. Their souls seem as dull to the charm of variety as that of the native of one of our flat English counties, whose retrospect of Switzerland was that, if its mountains could be thrown into its lakes, two nuisances would be got rid of at once.

Francis Galton, 1889.


Perhaps you could compute an envelope function that “envelopes” all the peaks, then normalize it to 1 if needed.
Or collect those peak points and do a 4-point Lagrange interpolation.
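As a rough sketch of the second idea, the following plain-Julia function evaluates the Lagrange polynomial through a set of points; with four points it is a cubic. The peak coordinates are made up for illustration, and `lagrange` is a hypothetical helper, not a function from any package mentioned in this thread.

```julia
# Evaluate the Lagrange polynomial through the points (xs[i], ys[i]) at x.
function lagrange(xs, ys, x)
    total = 0.0
    for i in eachindex(xs)
        term = float(ys[i])
        for j in eachindex(xs)
            j == i && continue
            term *= (x - xs[j]) / (xs[i] - xs[j])
        end
        total += term
    end
    return total
end

# Hypothetical peak locations and heights (four points, so a cubic).
peak_x = [1.0, 2.0, 3.0, 4.0]
peak_y = [0.10, 0.30, 0.25, 0.05]

println(lagrange(peak_x, peak_y, 2.5))
```

One caveat with this approach: a polynomial through the peaks is not guaranteed to stay non-negative, so the result may need clipping as well as normalizing before it can be read as a density.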

Just for reference, here I have plotted my data set using:

  • a histogram plot
  • a density plot (with 3 different bandwidths)
  • an interpolation of my data using Interpolations (interpolate(my_data, BSpline(Quadratic(Reflect(OnCell()))))), plotted as a continuous curve
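To illustrate why the bandwidth controls smoothness, here is a hand-rolled Gaussian kernel density estimate in plain Julia. This is a sketch under stated assumptions: the data are invented, `gauss`, `kde`, and `count_peaks` are hypothetical names, and the estimator a plotting package uses internally may differ in detail.

```julia
# Standard normal kernel.
gauss(u) = exp(-u^2 / 2) / sqrt(2π)

# Gaussian kernel density estimate at point x with bandwidth h.
kde(x, data, h) = sum(gauss((x - d) / h) for d in data) / (length(data) * h)

# Hypothetical integer-valued data (not the data set plotted above).
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7]
grid = 0:0.05:8

# Count interior local maxima of the estimate as a crude measure of roughness.
function count_peaks(h)
    dens = [kde(x, data, h) for x in grid]
    return count(i -> dens[i] > dens[i-1] && dens[i] > dens[i+1], 2:length(dens)-1)
end

for h in (0.1, 0.5, 1.5)
    println("h = ", h, ": ", count_peaks(h), " local maxima")
end
```

A small bandwidth puts a narrow bump on every distinct data value, while a larger one merges the bumps into a smoother curve, which is the trade-off behind choosing among the three density plots.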

Thanks for sharing your output plot.

A few comments:

  1. Your axis labels are swapped.
  2. You might want to use the following bins: 0.5:11.5. That way the bars are centered over the integer value they represent (i.e., the center of the bar for 2 particles is vertically aligned with the number 2 on the x axis, because the bar holds the fraction of the total that falls between 1.5 and 2.5). This is just an aesthetic suggestion. If you decide to make this change, you might have to do your interpolation OnGrid().
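The bin suggestion can be checked with a few lines of plain Julia, assuming it means edges running from 0.5 to 11.5 in steps of 1. The data vector and the names `edges`, `centers`, and `frac` are hypothetical and only illustrate the idea.

```julia
# Bin edges at half-integers, so every bin is centered on an integer.
edges = 0.5:1:11.5
centers = (edges[1:end-1] .+ edges[2:end]) ./ 2

# Hypothetical particle counts (not the data from this thread).
counts_data = [1, 2, 2, 3, 5, 5, 5, 8, 11]

# Fraction of observations falling in each bin [lo, hi).
frac = [count(x -> lo <= x < hi, counts_data) / length(counts_data)
        for (lo, hi) in zip(edges[1:end-1], edges[2:end])]

for (c, f) in zip(centers, frac)
    println(round(Int, c), " => ", round(f, digits = 3))
end
```

With these edges the bin for the value 2 spans 1.5 to 2.5, so its bar sits directly over the tick for 2.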

Thanks. That second one is a really good point, never thought of that!
