I am trying to produce a plot of probability distribution function of some data which is a result of simulations. Using `density`

function from `StatPlots`

is showing some weird y-axis numbers. I would expect it to be ranging from 0 to 1 as these are probabilities, but it shows numbers 1e-8 to 1e-7. Here is the code

```
using StatsPlots
density(rand(0:1e7,10000), xlabel = "Loss", ylabel = "Probability")
```

A plot of CDF does show the right scaling for probabilities.

```
ecdfplot(rand(1000:1e7,10000), xlabel = "Loss", ylabel = "
Cumulative Probability")
```

Is there a way to plot probability distribution function of empirical or simulated data in away that shows the scale going up to 1 for probabilities on y-axis?

The density plot is correct:

- Density considers observations x_1, \ldots, x_n as continuous and estimates their probability density function p(x). Note that the probability density is not a probability, but needs to be integrated over some interval (or more generally measurable set) in order to get a probability. In particular, for a sufficiently peaked distribution the density can exceed 1 (see example below).

\mathbb{P}(X \in [a, b]) = \int_a^b p(x) dx

- Your data has a wide support from 0 to 10^7 and accordingly the (uniform) density will be around 10^{-7} corresponding to a total probability mass of one, i.e., when integrating the density over the whole support: \mathbb{P}(X \in [0, 10^7]) = 1 = \int_0^{10^7} 10^{-7} dx.

```
julia> using Distributions
julia> density(rand(Normal(0, 0.1), 10000)) # Density will exceed 1
```

In case of discrete observations, you may want to look at histograms or simply compute frequencies, i.e.,

```
julia> fs = frequencies(rand(0:1000, 10000));
julia> bar(collect(keys(fs)), values(fs) ./ sum(values(fs)))
```

Thank you. There is no function `frequencies`

so when I run your suggestion I get an error.

Sorry, seems I had `Flux`

loaded where its a deprecated function.

Here is a full example showing all libraries and using `countmap`

from `StatsBase`

as an alternative:

```
julia> using StatsBase
julia> fs = countmap(rand(0:1000, 10000));
julia> using Plots
julia> bar(collect(keys(fs)), values(fs) ./ sum(values(fs)))
```

1 Like