How to approximate a distribution function from an arbitrary list?

somranger · November 18, 2022, 7:08pm

I want to obtain an approximate distribution function from a list of values, based on how often each value occurs. Is there a package that can do this?
In Mathematica, the function SmoothKernelDistribution does this (according to the Wolfram documentation, it uses linear interpolation). For example:

data = RandomVariate[NormalDistribution[], 10^3];
\[ScriptCapitalD] = SmoothKernelDistribution[data];
Table[Plot[f[\[ScriptCapitalD], x], {x, -4, 4}, PlotLabel -> f], {f, {PDF, CDF}}]


I have tried kde and pdf from KernelDensity, but the result is not as good as Mathematica's. I also want this to work not only for a normal distribution, but for any unknown type of distribution, approximated by interpolation.

tbeason · November 18, 2022, 7:12pm

KernelDensity.jl should be using essentially the same defaults as that Mathematica function. I would be surprised if the results were noticeably different for samples of that size (1000+ points).

somranger · November 18, 2022, 7:19pm

This is my code in Julia. I even generated more points.

using Distributions,KernelDensity,Plots
x = rand(Normal(), 100000)
f1 = kde(x);
plot(-10:10,x->pdf(f1,x))

aramirezreyes · November 18, 2022, 7:27pm

I think your plot has too few points for your liking (you are only plotting on the 21 integers from -10 to 10).

E.g. it does not look that bad:

julia> using Distributions, KernelDensity, Plots

julia> f1 = kde(randn(10^3));

julia> x1 = range(-4,stop=4,length=1000)

julia> y1 = pdf.(Ref(f1),x1);

julia> plot(x1,y1)

Produces this:

mthelm85 · November 18, 2022, 7:31pm

A few ideas:

- Load the StatsPlots package and use the density function.
- Generate more points:
```
plot(-10:.01:10, x -> pdf(f1, x))
```
- I don’t know what your goal is, but you might also consider fitting a parametric distribution to your data. Something like:
```
d = fit(Normal, x)
plot(d)
```

somranger · November 18, 2022, 7:35pm

You are right. I want to fit a distribution to data. In this simple example, it works for a normal distribution, but I need a general fit for any type of data.
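
For data with no known parametric family, one possible approach is to keep the kernel estimate from earlier in the thread and derive a CDF from it numerically. The following is only a minimal sketch under stated assumptions: the bimodal sample and the helper names `approx_pdf` and `approx_cdf` are made up for illustration, and the CDF is a simple Riemann sum over the grid that `kde` returns, not a built-in feature of KernelDensity.jl.

```julia
using Distributions, KernelDensity, Random

Random.seed!(1)

# Illustrative non-normal sample: an equal mixture of two normals.
mix = MixtureModel([Normal(-2, 0.5), Normal(2, 1.0)], [0.5, 0.5])
data = rand(mix, 10_000)

# Kernel density estimate; k.x is the evaluation grid, k.density the values on it.
k = kde(data)

# Wrap the estimate in an interpolation object so it can be evaluated at any point.
ik = InterpKDE(k)
approx_pdf(t) = pdf(ik, t)   # hypothetical helper name

# Approximate CDF on the grid: cumulative Riemann sum of the density.
cdf_grid = cumsum(k.density) .* step(k.x)

# CDF at an arbitrary point, using the last grid point at or below t
# (hypothetical helper name; returns 0 to the left of the grid).
function approx_cdf(t)
    i = searchsortedlast(k.x, t)
    return i == 0 ? 0.0 : cdf_grid[i]
end
```

Both helpers can then be plotted on a fine grid, e.g. `plot(-6:0.01:6, approx_pdf)`. Note that the default bandwidth is tuned for roughly unimodal data, so sharp or narrow modes may come out oversmoothed; passing a smaller `bandwidth` keyword to `kde` is one thing to try in that case.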
