Median doesn't behave like mean on a vector of vectors

julia> using Statistics

julia> y = [[i for i in 4*(j-1)+1:4*j] for j in 1:3]
3-element Vector{Vector{Int64}}:
 [1, 2, 3, 4]
 [5, 6, 7, 8]
 [9, 10, 11, 12]

julia> mean(y)
4-element Vector{Float64}:
 5.0
 6.0
 7.0
 8.0

julia> median(y)
6.5

julia> inds = axes(y, 1); mid = div(first(inds) + last(inds), 2)  # midpoint as computed in median!
2

julia> partialsort!(y, 2)
4-element Vector{Int64}:
 5
 6
 7
 8

julia> middle(partialsort!(y,mid))
6.5

It appears that median takes the middle of the median element when there is one, that is, when the number of elements is odd.
If there is no single median element (an even number of elements), it throws an error:

julia> yy = [[i for i in 4*(j-1)+1:4*j] for j in 1:4]
4-element Vector{Vector{Int64}}:
 [1, 2, 3, 4]
 [5, 6, 7, 8]
 [9, 10, 11, 12]
 [13, 14, 15, 16]

julia> median(yy)
ERROR: MethodError: no method matching middle(::Vector{Int64}, ::Vector{Int64})
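
For reference, the even-length branch can be reproduced by hand. This is only a sketch of what the error above suggests is happening; the copy and the try/catch are additions for illustration, and the outcome of the last step may depend on the Statistics version in use.

```julia
using Statistics

yy = [[i for i in 4*(j-1)+1:4*j] for j in 1:4]

# Even number of elements: the two middle elements are selected
# (copy so that yy itself is not reordered).
m = partialsort!(copy(yy), 2:3)
println(m[1], " ", m[2])

# The next step is middle(m[1], m[2]); with two vector arguments this is
# the call reported as a MethodError above.
outcome = try
    middle(m[1], m[2])
catch err
    err
end
println(outcome isa MethodError ? "MethodError, as above" : outcome)
```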

Indeed. I don’t think this is the expected behavior.

julia> y = [[i for i in 4*(j-1)+1:4*j] for j in 1:3]
3-element Vector{Vector{Int64}}:
 [1, 2, 3, 4]
 [5, 6, 7, 8]
 [9, 10, 11, 12]

julia> y[2][4] = 99
99

julia> median(y)
52.0

I guess the main issue (if it is one; I don’t know whether this behavior is intentional) comes from > “working” on vectors:

julia> a = [1, 99, 99]; b = [2, 1, 1];

julia> a > b
false

It just compares the first elements of a and b, and moves on to the next pair only in case of equality. I understand why testing a == b makes sense when they are collections, but the meaning of a > b for collections hardly makes sense to me.
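
To make that ordering concrete, here is a small sketch; the vectors other than a and b are made up for illustration.

```julia
a = [1, 99, 99]
b = [2, 1, 1]

# The first elements differ (1 < 2), so the remaining elements are never examined.
println(a < b)                    # true
# Ties on the first element are broken by the next differing one.
println([1, 2, 3] < [1, 2, 4])    # true
# A strict prefix sorts before the longer vector.
println([1, 2] < [1, 2, 0])       # true

# sort uses the same ordering, which is how a "middle" vector gets picked.
sorted = sort([[9, 10, 11, 12], [5, 6, 7, 99], [1, 2, 3, 4]])
println(sorted[2])                # [5, 6, 7, 99]
```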


Hmm... it seems to me that the behavior of ‘>’ on a pair of vectors is the “normal” one for lexicographic comparison.
Rather, I would draw attention to the median! method, which, for a vector with an odd number of elements, computes the middle of the median element:

function median!(v::AbstractVector)
    isempty(v) && throw(ArgumentError("median of an empty array is undefined, $(repr(v))"))
    eltype(v)>:Missing && any(ismissing, v) && return missing
    any(x -> x isa Number && isnan(x), v) && return convert(eltype(v), NaN)
    inds = axes(v, 1)
    n = length(inds)
    mid = div(first(inds)+last(inds),2)
    if isodd(n)
        return middle(partialsort!(v,mid))
    else
        m = partialsort!(v, mid:mid+1)
        return middle(m[1], m[2])
    end
end

If the elements are themselves vectors, then middle behaves as documented:

middle(a::AbstractArray)
Compute the middle of an array a, which consists of finding its extrema and then computing their mean.

In your example, the median element is [5, 6, 7, 99], whose extrema are 5 and 99, so the result is (5 + 99)/2 = 52.0.
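
If the goal is a median that behaves like mean(y), i.e. elementwise across the inner vectors, one possible workaround is to stack the vectors into a matrix and take the median along the second dimension. This is only a sketch: elementwise_median is a hypothetical helper name, not part of Statistics, and it assumes all inner vectors have the same length.

```julia
using Statistics

# Hypothetical helper: put the inner vectors side by side as columns,
# then take the median of each row.
elementwise_median(vs) = vec(median(reduce(hcat, vs), dims=2))

y = [[i for i in 4*(j-1)+1:4*j] for j in 1:3]
println(elementwise_median(y))    # [5.0, 6.0, 7.0, 8.0]

# An even number of inner vectors is handled as well.
yy = [[i for i in 4*(j-1)+1:4*j] for j in 1:4]
println(elementwise_median(yy))   # [7.0, 8.0, 9.0, 10.0]
```

For y this matches the shape of mean(y), and for yy it avoids the middle(::Vector, ::Vector) call entirely.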