# Plotting Probability Distribution of Data

I am trying to plot the probability distribution function of some data produced by simulations. The density function from StatsPlots shows some odd y-axis values: I would expect them to range from 0 to 1, since these are probabilities, but the axis shows values from 1e-8 to 1e-7. Here is the code:

```julia
using StatsPlots
density(rand(0:1e7, 10000), xlabel = "Loss", ylabel = "Probability")
```


A plot of the CDF does show the right scaling for probabilities:

```julia
ecdfplot(rand(1000:1e7, 10000), xlabel = "Loss", ylabel = "Cumulative Probability")
```


Is there a way to plot the probability distribution function of empirical or simulated data in a way that shows the y-axis scale going up to 1 for probabilities?

The density plot is correct:

1. Density considers the observations $x_1, \ldots, x_n$ as continuous and estimates their probability density function $p(x)$. Note that the probability density is not a probability, but needs to be integrated over some interval (or, more generally, a measurable set) in order to get a probability:
   $$\mathbb{P}(X \in [a, b]) = \int_a^b p(x) \, dx$$
   In particular, for a sufficiently peaked distribution the density can exceed 1 (see example below).
2. Your data has a wide support from $0$ to $10^7$, and accordingly the (uniform) density will be around $10^{-7}$, corresponding to a total probability mass of one when integrating the density over the whole support: $\mathbb{P}(X \in [0, 10^7]) = 1 = \int_0^{10^7} 10^{-7} \, dx$.
```julia
julia> using Distributions

julia> density(rand(Normal(0, 0.1), 10000))  # Density will exceed 1
```
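To make the two points above concrete, here is a minimal numerical sketch that uses only base Julia. The helper names `uniform_pdf`, `normal_pdf`, and `integrate` are made up for illustration and are not part of any package; the densities are the textbook closed forms for a uniform distribution on [0, 1e7] and a normal distribution with standard deviation 0.1.

```julia
# Closed-form densities written out by hand so the sketch needs no packages.
# (Hypothetical helper names, chosen for this illustration only.)
uniform_pdf(x; a = 0.0, b = 1e7) = a <= x <= b ? 1 / (b - a) : 0.0
normal_pdf(x; mu = 0.0, sigma = 0.1) =
    exp(-((x - mu) / sigma)^2 / 2) / (sigma * sqrt(2 * pi))

# Midpoint Riemann sum approximating the integral of f over [a, b].
function integrate(f, a, b; n = 100_000)
    h = (b - a) / n
    return h * sum(f(a + (i - 0.5) * h) for i in 1:n)
end

uniform_height = uniform_pdf(5e6)                    # height of the flat density
normal_peak    = normal_pdf(0.0)                     # peak of the narrow normal density
uniform_mass   = integrate(uniform_pdf, 0.0, 1e7)    # total probability, uniform case
normal_mass    = integrate(normal_pdf, -1.0, 1.0)    # total probability within +/- 10 sigma
```

The uniform density has height 1e-7 everywhere on its support, the narrow normal density peaks well above 1, and in both cases the area under the curve is (numerically) 1. Only the area is a probability, not the height.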


In case of discrete observations, you may want to look at histograms or simply compute frequencies, i.e.,

```julia
julia> fs = frequencies(rand(0:1000, 10000));

julia> bar(collect(keys(fs)), values(fs) ./ sum(values(fs)))
```


Thank you. There is no function frequencies so when I run your suggestion I get an error.

Sorry, it seems I had Flux loaded, where it's a deprecated function.
Here is a full example showing all libraries and using countmap from StatsBase as an alternative:

```julia
julia> using StatsBase

julia> fs = countmap(rand(0:1000, 10000));

julia> using Plots

julia> bar(collect(keys(fs)), values(fs) ./ sum(values(fs)))
```
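With a range as wide as the one in the question (0 to 10^7), almost every value occurs at most once, so per-value frequencies are all tiny. Grouping the values into bins first may be more useful. Below is a rough base-Julia sketch of that idea; `bin_probabilities` is a hypothetical helper written for this post, and the bin count of 50 and the simulated `losses` are arbitrary assumptions.

```julia
# Hypothetical helper: relative frequency of observations in equal-width bins
# covering [lo, hi]. Returns the bin centers and the per-bin probabilities.
function bin_probabilities(data, lo, hi, nbins)
    width = (hi - lo) / nbins
    counts = zeros(Int, nbins)
    for x in data
        # Clamp so that x == lo lands in the first bin and x == hi in the last.
        i = clamp(floor(Int, (x - lo) / width) + 1, 1, nbins)
        counts[i] += 1
    end
    centers = [lo + (i - 0.5) * width for i in 1:nbins]
    return centers, counts ./ length(data)
end

# Simulated stand-in for the real loss data (assumption).
losses = rand(0:10^7, 10_000)
centers, probs = bin_probabilities(losses, 0, 10^7, 50)
```

Each entry of `probs` lies between 0 and 1 and the entries sum to 1, so `bar(centers, probs)` shows probabilities on the y-axis. If Plots is loaded, `histogram(losses, normalize = :probability)` should give a similar picture without the manual binning.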
