Hey, folks
I’d like some help understanding the cdf and quantile functions.
Let x \in \mathbb{R}^n be a data vector.
using Random
using Distributions  # provides Uniform, cdf and quantile
Random.seed!(12345)
x = rand(Uniform(100, 500), 10000)
Now, I want to calculate the cumulative distribution function of x, F_X(x).
Fx = cdf.(Uniform(100, 500), x)
From my reading of the Distributions.jl documentation for cdf, to get F_X(x') we provide x', and that’s what I did. I passed the vector x and broadcast cdf over it to calculate the CDF for every element of x. Since I know which distribution the data came from, I knew which distribution to use in the call to cdf().
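To double-check my understanding of this step, here is a minimal sketch on a tiny vector. The names d, xs, Fxs and closed_form and the four sample values are made up for illustration and are not part of my actual data. It compares the broadcast cdf call against the closed form (x - a) / (b - a), which is the standard CDF of a uniform distribution on [a, b].

```julia
using Distributions

# Hypothetical small example (values chosen for illustration only).
d = Uniform(100, 500)
xs = [100.0, 200.0, 300.0, 500.0]

# Broadcasting cdf over the vector evaluates F_X at each element.
Fxs = cdf.(d, xs)

# Closed form of the U[a, b] CDF on [a, b]: (x - a) / (b - a).
closed_form = (xs .- 100) ./ (500 - 100)

@assert length(Fxs) == length(xs)   # one probability per data point
@assert Fxs ≈ closed_form           # matches the textbook formula
@assert all(0 .<= Fxs .<= 1)        # probabilities lie in [0, 1]
```

So, at least for this step, the output has the same length as the input and lies in [0,1], which is what I expected.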
Now, I want to apply the quantile function to the CDF values. Basically, I had a data vector x, to which I applied F_X to obtain a vector Fx with entries in [0,1]. Then I want to go from the [0,1] domain back to the “data” domain using the quantile function, \Phi^{-1}: [0,1] \to X. So I did
y = rand(Uniform(0,1), 100);
quantile(y, Fx)
%
% 10000-element Vector{Float64}:
% 0.8007446939133416
% 0.1224036949634858
% 0.33105528548750984
% 0.8317826664840936
% ⋮
% 0.10683772281267323
% 0.5217773901084406
% 0.7523328558208143
where y is the itr and Fx is the p vector of probabilities, as explained in the documentation.
This answer was not what I expected. I gave a vector y of 100 elements to quantile(), and I expected to get back a vector of the same length. Likewise, Fx was built from a Uniform distribution, \mathcal{f}: U[100, 500], so I expected the output values to fall in that range.
With U[100, 500], I expect \Phi^{-1}(0.5) \approx \mathbb{E}(\mathcal{f}) = 300. Instead, I got a 10000-element vector with every element equal to 0.5.
quantile(.5, Fx)
% 10000-element Vector{Float64}:
% 0.5
% 0.5
% 0.5
% 0.5
% ⋮
% 0.5
% 0.5
% 0.5
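To make my expectation concrete, these are the standard closed forms for a uniform distribution on [100, 500], written in the notation I used above. This is only the textbook math I had in mind, not something taken from the package documentation.

```latex
F_X(x) = \frac{x - 100}{500 - 100} = \frac{x - 100}{400}, \qquad 100 \le x \le 500

\Phi^{-1}(p) = 100 + 400\,p, \qquad 0 \le p \le 1

\Phi^{-1}(0.5) = 100 + 400 \cdot 0.5 = 300 = \frac{100 + 500}{2} = \mathbb{E}(\mathcal{f})
```

That is why I expected to see 300 for p = 0.5, and values between 100 and 500 in general.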
This has been enough to show me that I don’t fully understand how these functions work, even though I thought I understood them from the documentation. Can anyone point out what my mistake or misconception is? Thank you for your help.