How to approximate a distribution function from an arbitrary list?

I would like to obtain an approximate distribution function from a list of values, based on how often each value occurs. Is there a package that can do this?
In Mathematica, the function SmoothKernelDistribution does this (according to the Wolfram documentation, it uses linear interpolation). For example:

data = RandomVariate[NormalDistribution[], 10^3];
\[ScriptCapitalD] = SmoothKernelDistribution[data];
Table[Plot[f[\[ScriptCapitalD], x], {x, -4, 4}, PlotLabel -> f], {f, {PDF, CDF}}]


I have tried kde and pdf from KernelDensity, but the result is not as good as Mathematica's. I also need this to work not only for a normal distribution, but for any unknown type of distribution, approximated by interpolation.

KernelDensity.jl should be using essentially the same defaults as that Mathematica function. I would be surprised if the results were noticeably different for samples of that size (1000+ points).
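
To make the point about defaults concrete, here is a minimal sketch that writes them out explicitly. It assumes a recent KernelDensity.jl, where `kde` uses a Gaussian kernel with a rule-of-thumb (Silverman) bandwidth by default; the variable names and the `h / 4` comparison are only illustrative.

```julia
using Distributions, KernelDensity, Random

Random.seed!(1)
x = randn(1000)

# Default call: Gaussian kernel, rule-of-thumb bandwidth.
f_default = kde(x)

# The same estimate with the defaults written out.
# `default_bandwidth` is not exported, hence the module prefix.
h = KernelDensity.default_bandwidth(x)
f_explicit = kde(x; kernel = Normal, bandwidth = h)

# For comparison only: a much narrower bandwidth gives a wigglier curve.
f_narrow = kde(x; bandwidth = h / 4)
```

If the curve looks rougher or smoother than expected, the `bandwidth` keyword is probably the first thing to experiment with.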


This is my code in Julia. I even generated more points.

using Distributions, KernelDensity, Plots
x = rand(Normal(), 100000)
f1 = kde(x);

I think your plot simply uses too few points (you are only evaluating the density at the 21 integers from -10 to 10).

E.g. it does not look that bad:

julia> using Distributions, KernelDensity, Plots

julia> f1 = kde(randn(10^3));

julia> x1 = range(-4,stop=4,length=1000)

julia> y1 = pdf.(Ref(f1),x1);

julia> plot(x1,y1)

Produces this:


A few ideas:

  1. Load the StatsPlots package and use the density function.
  2. Generate more points:
    plot(-10:.01:10, x -> pdf(f1, x))
  3. I don’t know what your goal is, but you might also consider fitting a parametric distribution to your data. Something like:
    d = fit(Normal, x)
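
As a rough check of ideas 2 and 3 together, the sketch below evaluates both the kernel estimate and the fitted Normal on a fine grid and compares them. The grid, the seed, and the names `ik`, `y_kde`, `y_fit`, and `maxdiff` are illustrative assumptions, not part of the code above.

```julia
using Distributions, KernelDensity, Random

Random.seed!(1)
x = rand(Normal(), 100_000)

f1 = kde(x)           # nonparametric estimate
d = fit(Normal, x)    # parametric (maximum likelihood) fit

# Build the interpolant once, then evaluate on a fine grid
# instead of only at the integers.
ik = InterpKDE(f1)
grid = -4:0.01:4
y_kde = [pdf(ik, t) for t in grid]
y_fit = [pdf(d, t) for t in grid]

# Largest pointwise gap between the two curves.
maxdiff = maximum(abs.(y_kde .- y_fit))
```

For data that really are normal, the two curves should nearly coincide; a large gap would suggest the parametric family is a poor match.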

You are right. I want to fit a distribution to data. In this simple example, fitting a normal distribution works, but I need a general fit that works for any type of data.
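
One possible way to get a general, non-parametric PDF and CDF is sketched below. It assumes that `InterpKDE` from KernelDensity.jl and `ecdf` from StatsBase.jl are acceptable stand-ins for Mathematica's PDF and CDF of a SmoothKernelDistribution. The bimodal sample and the names `pdf_hat` and `cdf_hat` are made up for illustration.

```julia
using KernelDensity, StatsBase, Random

Random.seed!(1)

# A deliberately non-normal (bimodal) sample.
x = vcat(randn(5_000) .- 2, 0.5 .* randn(5_000) .+ 2)

# Smooth PDF: kernel density estimate wrapped in an interpolant.
ik = InterpKDE(kde(x))
pdf_hat(t) = pdf(ik, t)

# CDF: the empirical CDF of the sample (a step function, not smoothed).
cdf_hat = ecdf(x)

p_left = pdf_hat(-2.0)   # density near the left mode
p_mid = pdf_hat(0.0)     # density in the valley between the modes
c_mid = cdf_hat(0.0)     # fraction of the sample at or below 0
```

Note that the empirical CDF is not the integral of the kernel estimate, so the two are only approximately consistent; for large samples the difference is usually small.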