Median vs 50th Quantile giving different answers

davidzentlermunro · December 7, 2018, 10:48am

I’m getting different answers for median(v,w::AnalyticWeights) and quantile(v,w::AnalyticWeights,0.5), and I’m not sure why. Any ideas?

andreasnoack · December 7, 2018, 11:01am

Please provide an example

davidzentlermunro · December 7, 2018, 11:38am

using StatsBase; using Distributions
v=[1; 4; 3; 2; 2.5; 7];w=[0.1;0.3;0.05;0.05;0.2;0.3]
median(v,weights(w)::AbstractWeights)
quantile(v,weights(w)::AbstractWeights,0.5)

median returns 4.0 and quantile returns 3.5.

andreasnoack · December 7, 2018, 12:37pm

Sometimes a quantile isn’t uniquely defined (this is often resolved by taking the average of the endpoints of the interval of quantile points). However, it only makes sense to use a definition that ensures that the median and the 0.5 quantile are identical.
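To illustrate the non-uniqueness, here is a minimal sketch on an unweighted sample with an even number of points, using only the Statistics standard library. The sample `s` and the helper `is_median` are made up for illustration and are not part of any package.

```julia
using Statistics  # standard library: median, quantile

s = [1, 2, 3, 4]

# Hypothetical helper: m is a median of s if at least half of the
# points are <= m and at least half are >= m.
function is_median(s, m)
    half = length(s) / 2
    return count(s .<= m) >= half && count(s .>= m) >= half
end

# Every point of the interval [2, 3] passes the check; values outside fail.
candidates = [m for m in 1.5:0.5:3.5 if is_median(s, m)]
println(candidates)

# The usual convention picks the midpoint of that interval, and the
# median agrees with the 0.5 quantile.
mid = median(s)
q = quantile(s, 0.5)
println((mid, q))
```

With equal weights the two functions agree, which is the behaviour one would also expect in the weighted case.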

In this case, it seems that things are worse. To me, it seems that the result of quantile is just wrong. The 0.5 quantile and the median, say $m$, are the same thing and should satisfy $P(X \geqslant m) \geqslant \frac{1}{2}$ and $P(X \leqslant m) \geqslant \frac{1}{2}$. For your inputs, I get

julia> x = [1; 4; 3; 2; 2.5; 7];

julia> w = [0.1;0.3;0.05;0.05;0.2;0.3];

julia> sum(w[x .<= 3.5])
0.4

so 3.5 is not a median. To see that 4 is the unique median, you can create the following table and see that the row with $x=4$ is the only one in which both probabilities are at least $\frac{1}{2}$.

julia> p = sortperm(x);

julia> table(cumsum(w[p]), reverse(cumsum(reverse(w[p]))), x[p], names = [Symbol("P(X<=x)"), Symbol("P(X>=x)"), :x])
Table with 6 rows, 3 columns:
P(X<=x)  P(X>=x)  x
─────────────────────
0.1      1.0      1.0
0.15     0.9      2.0
0.35     0.85     2.5
0.4      0.65     3.0
0.7      0.6      4.0
1.0      0.3      7.0
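
The same check can be written without a table package. This is a minimal sketch in plain Julia; `is_weighted_median` is a hypothetical helper, not a StatsBase function, and it divides by `sum(w)` on the assumption that the weights need not sum to one.

```julia
# Hypothetical helper: m is a weighted median if at least half of the
# total weight lies on points <= m and at least half on points >= m.
function is_weighted_median(x, w, m)
    half = sum(w) / 2
    below = sum(w[x .<= m])   # weight on points <= m
    above = sum(w[x .>= m])   # weight on points >= m
    return below >= half && above >= half
end

x = [1, 4, 3, 2, 2.5, 7]
w = [0.1, 0.3, 0.05, 0.05, 0.2, 0.3]

# Test every data point; only those satisfying both inequalities are kept.
medians = [m for m in sort(x) if is_weighted_median(x, w, m)]
println(medians)

# 3.5 fails because only 0.4 of the weight lies at or below it.
println(is_weighted_median(x, w, 3.5))
```

Only data points are tested here, which is enough for this example because no cumulative weight equals exactly one half, so the median cannot be an interval.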

nalimilan · December 7, 2018, 12:48pm

See also https://github.com/JuliaStats/StatsBase.jl/pull/316 and discussion at https://github.com/JuliaStats/StatsBase.jl/issues/313. Matthieu Gomez is the person to contact about this, but he isn’t on Discourse AFAICT.

Note that quantile(v, fweights(w)) gives yet another answer (7.0).

davidzentlermunro · December 7, 2018, 2:28pm

It’s not clear why 3 should be considered a central element here, though, given the weight vector.

nalimilan · February 18, 2019, 1:22pm

The inconsistency has been fixed by having median call quantile(x, w, 0.5). See this issue and the associated PR.
