I'm getting different answers for median(v,w::AnalyticWeights) and quantile(v,w::AnalyticWeights,0.5) and I'm not sure why. Any ideas?
Please provide an example
using StatsBase; using Distributions
v=[1; 4; 3; 2; 2.5; 7];w=[0.1;0.3;0.05;0.05;0.2;0.3]
median(v,weights(w)::AbstractWeights)
quantile(v,weights(w)::AbstractWeights,0.5)
median returns 4.0 and quantile returns 3.5.
Sometimes a quantile isn't uniquely defined (often solved by taking the average of the endpoints of the interval of quantile points). However, it only makes sense to use a definition that ensures that the median and the 0.5 quantile are identical.
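To illustrate the non-unique case with something simpler than the weighted example, here is a minimal unweighted sketch using only the Statistics standard library. The vector below is made up for illustration: every value between 2 and 3 splits it in half, and the default convention picks the midpoint for both functions.

```julia
using Statistics

# Illustrative data (not from the original question): an even number of
# equally weighted points, so any m in [2, 3] qualifies as a median.
u = [1, 2, 3, 4]

m = median(u)         # averages the two middle values
q = quantile(u, 0.5)  # default interpolation lands on the same point

println((m, q))       # (2.5, 2.5)
```

In the unweighted case the two functions agree, which is the behaviour one would expect from the weighted methods as well.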
In this case, it seems that things are worse. To me, it seems that the result of quantile is just wrong. The 0.5 quantile and the median, say $m$, are the same thing and should satisfy $P(X \geqslant m) \geqslant \frac{1}{2}$ and $P(X \leqslant m) \geqslant \frac{1}{2}$. For your inputs, I get
julia> x = [1; 4; 3; 2; 2.5; 7];
julia> w = [0.1;0.3;0.05;0.05;0.2;0.3];
julia> sum(w[x .<= 3.5])
0.4
so 3.5 is not a median. To see that 4 is the unique median, you can create the following table and see that the row with x=4 is the only one in which both probabilities are at least $\frac{1}{2}$.
julia> p = sortperm(x);
julia> table(cumsum(w[p]), reverse(cumsum(reverse(w[p]))), x[p], names = [Symbol("P(X<=x)"), Symbol("P(X>=x)"), :x])
Table with 6 rows, 3 columns:
P(X<=x)  P(X>=x)  x
─────────────────────
0.1      1.0      1.0
0.15     0.9      2.0
0.35     0.85     2.5
0.4      0.65     3.0
0.7      0.6      4.0
1.0      0.3      7.0
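The same check can be done without a table package. The sketch below uses a hypothetical helper, median_candidates, which is not part of StatsBase; it simply returns every data point for which both probabilities are at least one half, assuming the weights are non-negative.

```julia
# Hypothetical helper (not a StatsBase function): return every data point m
# with P(X <= m) >= 1/2 and P(X >= m) >= 1/2 under the normalized weights.
function median_candidates(x, w)
    total = sum(w)  # normalize in case the weights do not sum to 1
    is_median(m) = sum(w[x .<= m]) / total >= 0.5 &&
                   sum(w[x .>= m]) / total >= 0.5
    return filter(is_median, sort(unique(x)))
end

x = [1; 4; 3; 2; 2.5; 7]
w = [0.1; 0.3; 0.05; 0.05; 0.2; 0.3]

candidates = median_candidates(x, w)
println(candidates)  # [4.0]
```

Note that it only tests the data points themselves, and that an exact tie at 0.5 could be affected by floating-point rounding in the weight sums; neither matters for these inputs.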
See also https://github.com/JuliaStats/StatsBase.jl/pull/316 and discussion at https://github.com/JuliaStats/StatsBase.jl/issues/313. Matthieu Gomez is the person to contact about this, but he isn't on Discourse AFAICT.
Note that quantile(v, fweights(w)) gives yet another answer (7.0).
It's not clear why 3 should be considered a central element here though given the weight vector.
The inconsistency has been fixed by having median call quantile(x, w, 0.5). See this issue and the associated PR.
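A quick way to check whether an installed StatsBase includes this change is to compare the two calls on the inputs from the question. This is only a sketch: it assumes a StatsBase version with the fix, and the common value it prints depends on the weighted-quantile definition that version uses, so no particular number is claimed here.

```julia
using Statistics, StatsBase

v = [1; 4; 3; 2; 2.5; 7]
w = [0.1; 0.3; 0.05; 0.05; 0.2; 0.3]

m = median(v, weights(w))
q = quantile(v, weights(w), 0.5)

# With the fix, median is defined in terms of quantile, so these should agree.
println(m == q ? "consistent: $m" : "inconsistent: median=$m, quantile=$q")
```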