Implementing an empirical r-th q-quantile

rikh · June 28, 2022, 11:16am

I’m trying to implement an algorithm from a paper which makes use of empirical r-th q-quantiles of the marginals X^{(1)}, ..., X^{(p)} for a dataset X with p features (Bénard et al., 2021).

If I understand this correctly, this means that for each feature in the dataset, q-quantiles should be determined.

Would the following be the correct way to determine an “empirical 3-quantile” for some column X^{(u)} of the dataset?

julia> using StatsBase

julia> q = 3;

julia> X⁽ᵘ⁾ = [1, 2, 3];

julia> StatsBase.nquantile(X⁽ᵘ⁾, q)
4-element Vector{Float64}:
 1.0
 1.6666666666666667
 2.3333333333333335
 3.0
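
For context, here is a minimal sketch of what `nquantile` seems to compute in this case. It assumes, going by the StatsBase docstring, that `nquantile(x, n)` is equivalent to `quantile(x, (0:n)/n)` from the `Statistics` standard library. With its default settings, `quantile` interpolates linearly between the sorted values, which would explain why the result contains numbers that are not in the data.

```julia
using Statistics

X = [1, 2, 3]   # same toy column as above
q = 3

# Probabilities 0, 1/q, ..., 1, i.e. q + 1 points including both endpoints.
probs = (0:q) / q

# The default `quantile` interpolates linearly between order statistics.
interpolated = quantile(X, probs)
```

The two interior values, roughly 1.67 and 2.33, are interpolated and are not observed data points.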

rikh · June 28, 2022, 12:28pm

They also give a definition of the r-th q-quantile \hat{q}_{n,r}^{(j)} of \{ X_1^{(j)}, ..., X_n^{(j)} \} for r \in \{1, ..., q - 1\}. It is defined in Equation 4.2 of Bénard et al. (2021) by

\hat{q}_{n,r}^{(j)} = \inf \{ x \in \mathbb{R} \: : \: \frac{1}{n} \sum_{i=1}^n \mathbb{1}_{X_i^{(j)} \le x} \ge \frac{r}{q} \}
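
Read literally, the infimum in Equation 4.2 is attained at a data point: the smallest sorted value whose rank k satisfies k/n ≥ r/q, that is, k = ceil(n r / q). Below is a minimal sketch under that reading. The names `quantile_eq42` and `cutpoints_eq42` are hypothetical and do not come from the paper or from any package.

```julia
# Hypothetical helper: r-th q-quantile of V, read literally from Equation 4.2.
function quantile_eq42(V::AbstractVector, r::Integer, q::Integer)
    @assert 1 ≤ r ≤ q - 1
    n = length(V)
    @assert n ≥ 1
    # Smallest rank k with k / n ≥ r / q; integer ceiling division avoids
    # floating-point rounding in n * r / q.
    k = cld(n * r, q)
    return sort(V)[k]
end

# Hypothetical helper: the q - 1 cut points for r = 1, ..., q - 1.
cutpoints_eq42(V::AbstractVector, q::Integer) =
    [quantile_eq42(V, r, q) for r in 1:(q - 1)]
```

Under these assumptions, `cutpoints_eq42(1:10, 3)` gives `[4, 7]` and `cutpoints_eq42([1, 2, 3], 3)` gives `[1, 2]`. Only observed values are returned, with no interpolation, so the result need not agree with implementations that use a different index rule or that include the minimum and maximum.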

EDIT:

Got it (probably) thanks to Quantiles - Heinrich Hartmann:

function _empirical_quantile(V::AbstractVector, p::Real)
    @assert 0.0 ≤ p ≤ 1.0
    n = length(V)
    # Rank floor(p * (n + 1)), clamped to the valid index range 1:n.
    index = clamp(floor(Int, p * (n + 1)), 1, n)
    return sort(V)[index]
end

function _cutpoints(V::AbstractVector, q::Int)
    quantiles = range(; start=0.0, stop=1.0, length=q)
    return _empirical_quantile.(Ref(V), quantiles)
end

julia> _cutpoints(1:10, 3)
3-element Vector{Int64}:
  1
  5
 10

It would still be great if someone could verify this. I don’t know much about mathematical statistics.
