How to interpret the results from the `cdf()` and `quantile()` functions?

Hey, folks
I’d like some help understanding the cdf and quantile functions.

Let x \in \mathbb{R}^n be a data vector.

```julia
using Random, Distributions
Random.seed!(12345)

x = rand(Uniform(100, 500), 10000)
```

Now, I want to evaluate the cumulative distribution function F_X at each element of x.

```julia
Fx = cdf.(Uniform(100, 500), x)
```

From my understanding of the Distributions.jl documentation for `cdf`, to get F_X(x'), we pass x'. That is what I did: I passed the vector x and broadcast `cdf` over it to compute the CDF at every element of x. Since I know which distribution the data came from, I knew which distribution to pass to `cdf()`.
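To make the broadcast step concrete, here is a minimal sketch of what `cdf(Uniform(a, b), x)` evaluates to. It uses a hypothetical hand-written helper `uniform_cdf` (not part of Distributions.jl) so that it runs with the standard library only, and assumes the textbook uniform CDF, (x - a)/(b - a) on [a, b], 0 below a and 1 above b. Because the data really do come from U[100, 500], the transformed values are themselves spread uniformly over [0, 1].

```julia
using Random, Statistics

# Hypothetical stand-in for cdf(Uniform(a, b), x), for illustration only.
# Inside [a, b] the uniform CDF is (x - a) / (b - a); clamping gives 0 below a and 1 above b.
uniform_cdf(a, b, x) = clamp((x - a) / (b - a), 0.0, 1.0)

Random.seed!(12345)
a, b = 100.0, 500.0
x = a .+ (b - a) .* rand(10_000)    # 10000 draws from U[100, 500] without Distributions.jl

Fx = uniform_cdf.(a, b, x)           # broadcast over every element, like cdf.(d, x)

println(uniform_cdf(a, b, 300.0))    # the midpoint of [100, 500] maps to 0.5
println(extrema(Fx))                 # every value lies in [0, 1]
println(mean(Fx))                    # close to 0.5, since F_X(X) is itself ~ U[0, 1]
```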

Now, I want to apply the quantile function to the CDF values. I applied F_X to the data vector x to obtain a vector Fx \in [0,1]^n; now I want to go from the [0,1] domain back to the “data” domain using the quantile function F_X^{-1}: [0,1] \to \mathbb{R}. So I did
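Going back from [0, 1] to the data domain only needs the distribution, not a new sample. A minimal sketch of the inverse for the uniform case, with hypothetical helpers `uniform_cdf` and `uniform_quantile` standing in for `cdf(Uniform(a, b), x)` and `quantile(Uniform(a, b), p)`; the closed form F_X^{-1}(p) = a + p(b - a) is the standard uniform quantile.

```julia
# Hypothetical stand-ins, for illustration only; Distributions.jl provides the real methods.
uniform_cdf(a, b, x) = clamp((x - a) / (b - a), 0.0, 1.0)

# Inverse of the CDF: maps a probability in [0, 1] back into the data domain [a, b].
function uniform_quantile(a, b, p)
    0 <= p <= 1 || throw(DomainError(p, "p must lie in [0, 1]"))
    return a + p * (b - a)
end

a, b = 100.0, 500.0
x = [123.4, 300.0, 487.9]

p  = uniform_cdf.(a, b, x)          # data domain -> [0, 1]
x2 = uniform_quantile.(a, b, p)     # [0, 1] -> data domain

println(uniform_quantile(a, b, 0.5))   # 300.0, the median of U[100, 500]
println(x2 ≈ x)                        # the round trip recovers x (up to rounding)
```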

```julia
y = rand(Uniform(0,1), 100);
quantile(y, Fx)
# 10000-element Vector{Float64}:
#  0.8007446939133416
#  0.1224036949634858
#  0.33105528548750984
#  0.8317826664840936
#  ⋮
#  0.10683772281267323
#  0.5217773901084406
#  0.7523328558208143
```

where y is the `itr` argument and Fx is the vector `p` of probabilities, as explained here.

This answer was not what I expected. I gave `quantile()` a vector y of 100 elements, so I expected to get back a vector of the same length. Also, Fx was computed from a uniform distribution X \sim U[100, 500], so I expected the output values to lie in that range.
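What the call above computes: with a data vector as its first argument, `quantile(itr, p)` is the Statistics standard-library method, which returns *empirical* quantiles of the sample in `itr`, one for each probability in `p`. A small sketch with random stand-in vectors (not the exact vectors from the post) shows the two properties behind the surprise: the output length follows `p`, and the values stay within the range of the sample `y`.

```julia
using Random, Statistics
Random.seed!(12345)

y  = rand(100)        # a sample of 100 values in [0, 1), like rand(Uniform(0, 1), 100)
Fx = rand(10_000)     # stand-in for the 10000 probabilities produced by cdf.(d, x)

# Empirical quantiles of the data in y, evaluated at every probability in Fx.
# One output per element of Fx, not per element of y.
q = quantile(y, Fx)

println(length(q))    # 10000, the length of Fx
println(extrema(y))
println(extrema(q))   # inside the range of y, nowhere near [100, 500]
```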

If X \sim U[100, 500], I expect F_X^{-1}(0.5) = \mathbb{E}[X] = 300, since the median of a uniform distribution equals its mean. Instead, I got a 10000-element vector with every element equal to 0.5:

```julia
quantile(.5, Fx)
# 10000-element Vector{Float64}:
#  0.5
#  0.5
#  0.5
#  0.5
#  ⋮
#  0.5
#  0.5
#  0.5
```
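Judging from the output above, the scalar 0.5 was treated as a data set containing a single value: Julia numbers are iterable and have length 1. Every quantile of a one-point sample is that point, whatever the probability. A minimal check of that reasoning, written with the explicit one-element vector `[0.5]` as the sample:

```julia
using Statistics

p = [0.1, 0.5, 0.9]

# A Julia number behaves like a one-element collection.
println(length(0.5))          # 1

# Every quantile of a one-point data set is that single point.
println(quantile([0.5], p))   # [0.5, 0.5, 0.5]
```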

This was enough to show me that I don’t fully understand how these functions work, even though I thought I understood them from the documentation. Can anyone point out my mistake or misconception? Thank you for your help.

I’m not sure what the documentation suggests should work, but there’s really just one principle:

  1. Functions take two arguments: (a) distribution and (b) data.

Things therefore look like:

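```julia
using Distributions

d = Uniform(2, 3)      # the distribution
n = 100
x = rand(d, n)         # data drawn from d
p = cdf.(d, x)         # data -> probabilities in [0, 1]
x′ = quantile.(d, p)   # probabilities -> back to the data domain
```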

In your examples, you seem to be using several different distributions for d and to be swapping the order of the distribution and data arguments.
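The argument order also separates the two uses of `quantile`: with a distribution first you get its exact quantiles, while with data first (the Statistics method) you get estimates from the sample. A sketch comparing both for U[100, 500], using a hypothetical helper `uniform_quantile` in place of `quantile(Uniform(a, b), p)` so that it runs without Distributions.jl:

```julia
using Random, Statistics
Random.seed!(12345)

a, b = 100.0, 500.0
# Hypothetical stand-in for quantile(Uniform(a, b), p): distribution parameters first.
uniform_quantile(a, b, p) = a + p * (b - a)

x = a .+ (b - a) .* rand(10_000)   # a sample from U[100, 500]
p = [0.1, 0.5, 0.9]

theoretical = uniform_quantile.(a, b, p)   # exact: about 140, 300 and 460
empirical   = quantile(x, p)               # data first: estimated from the sample

println(theoretical)
println(empirical)
```

With 10,000 draws the empirical values should be close to the theoretical ones, and the 0.5 quantile is the 300 the question expected.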
