Median doesn't behave like mean on a vector of vectors

julia> using Statistics

julia> y = [[i for i in 4*(j-1)+1:4*j] for j in 1:3]
3-element Vector{Vector{Int64}}:
 [1, 2, 3, 4]
 [5, 6, 7, 8]
 [9, 10, 11, 12]

julia> mean(y)
4-element Vector{Float64}:
 5.0
 6.0
 7.0
 8.0

julia> median(y)
6.5

# mid = div(first(inds)+last(inds),2) == 2

julia> partialsort!(y, 2)
4-element Vector{Int64}:
 5
 6
 7
 8

julia> middle(partialsort!(y, 2))
6.5

It appears that the median code computes the middle of the median vector when there is a single median vector (an odd number of elements). When there isn't one (an even number of elements), it throws an error:

julia> yy = [[i for i in 4*(j-1)+1:4*j] for j in 1:4]     
4-element Vector{Vector{Int64}}:
 [1, 2, 3, 4]
 [5, 6, 7, 8]
 [9, 10, 11, 12]
 [13, 14, 15, 16]

julia> median(yy)
ERROR: MethodError: no method matching middle(::Vector{Int64}, ::Vector{Int64})
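If the goal is a per-coordinate median, i.e. the analogue of what mean returns for a vector of vectors, one possible workaround is to stack the vectors into a matrix and take the median along each row. This is only a sketch, not something proposed in the thread: coord_median is a hypothetical helper name, and it assumes all inner vectors have the same length.

```julia
using Statistics

# Hypothetical helper (not part of Statistics): per-coordinate median of
# equally long vectors. reduce(hcat, vs) puts each vector in a column, so
# median(...; dims=2) takes one median per row, i.e. per coordinate.
coord_median(vs) = vec(median(reduce(hcat, vs); dims=2))

y  = [[i for i in 4*(j-1)+1:4*j] for j in 1:3]
yy = [[i for i in 4*(j-1)+1:4*j] for j in 1:4]

coord_median(y)   # [5.0, 6.0, 7.0, 8.0], which matches mean(y) for this data
coord_median(yy)  # [7.0, 8.0, 9.0, 10.0]; the even case works because each row is numeric
```

Because each row is a plain vector of numbers, the even-length case goes through the ordinary numeric median and never hands vectors to middle.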

Indeed. I don't think this is expected behavior.

julia> y = [[i for i in 4*(j-1)+1:4*j] for j in 1:3]
3-element Vector{Vector{Int64}}:
 [1, 2, 3, 4]
 [5, 6, 7, 8]
 [9, 10, 11, 12]

julia> y[2][4] = 99
99

julia> median(y)
52.0

I guess the main issue (if it is one; I don't know whether this behavior is intentional) comes from > “working” on vectors:

julia> a = [1, 99, 99]; b = [2, 1, 1];

julia> a > b
false

It just compares the first elements of a and b, moving on to the next pair only in case of equality. I understand why this makes sense for testing a == b when they are collections, but the meaning of a > b for collections hardly makes sense to me.
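A short sketch of the comparison rule being described: vectors are ordered by isless, which finds the first position where the elements differ and, if one vector is a prefix of the other, puts the shorter one first. The extra vectors below are made up for illustration.

```julia
# Vectors are ordered lexicographically: the first position where the
# elements differ decides, and the remaining elements are never inspected.
a = [1, 99, 99]
b = [2, 1, 1]

a > b                # false: 1 < 2 at position 1, the 99s play no role
b > a                # true, for the same reason
[1, 2] < [1, 3]      # true: tie at position 1, decided at position 2
[1, 2] < [1, 2, 0]   # true: a proper prefix sorts first

# sort and partialsort! rely on the same isless, so a vector of
# vectors is ordered this way too.
sort([[9, 0], [1, 99], [1, 2]])   # [[1, 2], [1, 99], [9, 0]]
```

This is the ordering partialsort! uses inside median!, which is why the median "vector" is well defined even though the result is then collapsed by middle.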


Mmmhh… it seems to me that the behavior of > on a pair of vectors is the normal one for a lexicographic comparison.
Rather, I would draw attention to the median! method which, for a vector with an odd number of elements, returns the middle of the median element:

function median!(v::AbstractVector)
    isempty(v) && throw(ArgumentError("median of an empty array is undefined, $(repr(v))"))
    eltype(v)>:Missing && any(ismissing, v) && return missing
    any(x -> x isa Number && isnan(x), v) && return convert(eltype(v), NaN)
    inds = axes(v, 1)
    n = length(inds)
    mid = div(first(inds)+last(inds),2)
    if isodd(n)
        return middle(partialsort!(v,mid))
    else
        m = partialsort!(v, mid:mid+1)
        return middle(m[1], m[2])
    end
end
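To make the two branches of that listing concrete, here is a sketch that replays them by hand on the thread's y (3 vectors) and yy (4 vectors) instead of calling median itself. It mirrors the listing as quoted; the Statistics version you have installed may differ. The final two-argument middle call is deliberately left out, since that is the call that raised the MethodError reported earlier.

```julia
using Statistics

# Data from the thread: 3 vectors (odd) and 4 vectors (even).
y  = [[i for i in 4*(j-1)+1:4*j] for j in 1:3]
yy = [[i for i in 4*(j-1)+1:4*j] for j in 1:4]

# Odd branch, replayed on a copy so y itself is not reordered.
v = copy(y)
inds = axes(v, 1)
mid = div(first(inds) + last(inds), 2)   # div(1 + 3, 2) == 2
med_vec = partialsort!(v, mid)           # one vector: [5, 6, 7, 8] (lexicographic order)
odd_result = middle(med_vec)             # middle of an array: (5 + 8) / 2 == 6.5

# Even branch, replayed the same way.
w = copy(yy)
inds2 = axes(w, 1)
mid2 = div(first(inds2) + last(inds2), 2)  # div(1 + 4, 2) == 2
m = partialsort!(w, mid2:mid2+1)           # two vectors: [5, 6, 7, 8] and [9, 10, 11, 12]
# The listing would now call middle(m[1], m[2]) on two vectors; that is
# the call that produced the MethodError in the thread.
```

So the odd case only "works" because middle has a one-argument method for arrays, while the even case needs a two-argument method that only exists for numbers.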

If the elements are themselves vectors, then middle behaves as per its contract:

Compute the middle of an array a, which consists of finding its extrema and then computing their mean.

In your example: (5 + 99)/2 = 52.0
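The contract can be checked directly on the median element from the modified example (y[2] after setting y[2][4] = 99). A minimal sketch:

```julia
using Statistics

# middle(a) for an array: take the extrema, then their mean.
v = [5, 6, 7, 99]          # the median element after y[2][4] = 99
lo, hi = extrema(v)        # (5, 99)
by_hand = (lo + hi) / 2    # 52.0
via_middle = middle(v)     # 52.0: the whole median vector collapses to one number
```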