I am trying to produce a plot of probability distribution function of some data which is a result of simulations. Using density
function from StatPlots
is showing some weird y-axis numbers. I would expect it to be ranging from 0 to 1 as these are probabilities, but it shows numbers 1e-8 to 1e-7. Here is the code
using StatsPlots
density(rand(0:1e7,10000), xlabel = "Loss", ylabel = "Probability")
A plot of CDF does show the right scaling for probabilities.
ecdfplot(rand(1000:1e7,10000), xlabel = "Loss", ylabel = "
Cumulative Probability")
Is there a way to plot probability distribution function of empirical or simulated data in away that shows the scale going up to 1 for probabilities on y-axis?
The density plot is correct:
- Density considers observations x_1, \ldots, x_n as continuous and estimates their probability density function p(x). Note that the probability density is not a probability, but needs to be integrated over some interval (or more generally measurable set) in order to get a probability. In particular, for a sufficiently peaked distribution the density can exceed 1 (see example below).
\mathbb{P}(X \in [a, b]) = \int_a^b p(x) dx
- Your data has a wide support from 0 to 10^7 and accordingly the (uniform) density will be around 10^{-7} corresponding to a total probability mass of one, i.e., when integrating the density over the whole support: \mathbb{P}(X \in [0, 10^7]) = 1 = \int_0^{10^7} 10^{-7} dx.
julia> using Distributions
julia> density(rand(Normal(0, 0.1), 10000)) # Density will exceed 1
In case of discrete observations, you may want to look at histograms or simply compute frequencies, i.e.,
julia> fs = frequencies(rand(0:1000, 10000));
julia> bar(collect(keys(fs)), values(fs) ./ sum(values(fs)))
Thank you. There is no function frequencies
so when I run your suggestion I get an error.
Sorry, seems I had Flux
loaded where its a deprecated function.
Here is a full example showing all libraries and using countmap
from StatsBase
as an alternative:
julia> using StatsBase
julia> fs = countmap(rand(0:1000, 10000));
julia> using Plots
julia> bar(collect(keys(fs)), values(fs) ./ sum(values(fs)))
1 Like