Median vs 0.5 quantile giving different answers

stats

#1

I’m getting different answers for median(v,w::AnalyticWeights) and quantile(v,w::AnalyticWeights,0.5) and not sure why. Any ideas?


#2

Please provide an example.


#3

using StatsBase
v = [1, 4, 3, 2, 2.5, 7]
w = [0.1, 0.3, 0.05, 0.05, 0.2, 0.3]
median(v, weights(w))          # weighted median
quantile(v, weights(w), 0.5)   # weighted 0.5 quantile

median returns 4.0 and quantile returns 3.5.


#4

Sometimes a quantile isn’t uniquely defined (this is often resolved by taking the average of the endpoints of the interval of quantile points). However, it only makes sense to use a definition that ensures that the median and the 0.5 quantile are identical.
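As a small illustration of the non-uniqueness, here is a sketch with four equally weighted points, using only the Statistics standard library. The helper name `satisfies` is made up for this example.

```julia
using Statistics

x = [1.0, 2.0, 3.0, 4.0]   # equal weights, even count

# Hypothetical helper: does m have at least half the mass on each side?
satisfies(m) = mean(x .<= m) >= 0.5 && mean(x .>= m) >= 0.5

satisfies(2.0), satisfies(2.4), satisfies(3.0)   # all true: every m in [2, 3] qualifies
satisfies(1.9), satisfies(3.1)                   # both false
median(x)                                        # 2.5, the midpoint of [2, 3]
```

Every point in the interval [2, 3] is a valid median here, and the usual convention picks the midpoint.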

In this case, things seem worse: to me, the result of quantile is simply wrong. The 0.5 quantile and the median, say $m$, are the same thing and should satisfy $P(X \geqslant m) \geqslant \frac{1}{2}$ and $P(X \leqslant m) \geqslant \frac{1}{2}$. For your inputs, I get

julia> x = [1; 4; 3; 2; 2.5; 7];

julia> w = [0.1;0.3;0.05;0.05;0.2;0.3];

julia> sum(w[x .<= 3.5])
0.4

so 3.5 is not a median. To see that 4 is the unique median, you can create the following table and see that the row with x=4 is the only one in which both probabilities are at least $\frac{1}{2}$.

julia> p = sortperm(x);

julia> table(cumsum(w[p]), reverse(cumsum(reverse(w[p]))), x[p], names = [Symbol("P(X<=x)"), Symbol("P(X>=x)"), :x])
Table with 6 rows, 3 columns:
P(X<=x)  P(X>=x)  x
─────────────────────
0.1      1.0      1.0
0.15     0.9      2.0
0.35     0.85     2.5
0.4      0.65     3.0
0.7      0.6      4.0
1.0      0.3      7.0
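The same check can be written without any table package. This is a minimal sketch in Base Julia; `is_weighted_median` is a hypothetical helper written for this post, not a StatsBase function.

```julia
# Hypothetical helper: test the two-sided median condition for a candidate m.
function is_weighted_median(x, w, m)
    p = w ./ sum(w)            # normalize the weights to probabilities
    below = sum(p[x .<= m])    # P(X <= m)
    above = sum(p[x .>= m])    # P(X >= m)
    return below >= 0.5 && above >= 0.5
end

x = [1, 4, 3, 2, 2.5, 7]
w = [0.1, 0.3, 0.05, 0.05, 0.2, 0.3]

is_weighted_median(x, w, 4.0)                 # true
is_weighted_median(x, w, 3.5)                 # false, since P(X <= 3.5) is about 0.4
filter(m -> is_weighted_median(x, w, m), x)   # only 4.0 passes
```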

#5

See also https://github.com/JuliaStats/StatsBase.jl/pull/316 and discussion at https://github.com/JuliaStats/StatsBase.jl/issues/313. Matthieu Gomez is the person to contact about this, but he isn’t on Discourse AFAICT.

Note that quantile(v, fweights(w)) gives yet another (incorrect) answer (7.0).


#6

Simply put, when the number of elements is even, the median is taken to be the mean of the two central elements.
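For reference, the even-count rule can be tried on the unweighted data from post #3. This is only a sketch using the Statistics standard library; it does not show what StatsBase does internally when weights are supplied.

```julia
using Statistics

v = [1, 4, 3, 2, 2.5, 7]

sort(v)        # [1.0, 2.0, 2.5, 3.0, 4.0, 7.0]
median(v)      # 2.75, the mean of the two central elements 2.5 and 3.0
(3 + 4) / 2    # 3.5, the value quantile returned in post #3
```

Ignoring the weights, the rule gives 2.75; the reported 3.5 is instead the midpoint of 3 and 4.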


#7

It’s not clear why 3 should be considered a central element here, though, given the weight vector.