# Functions for median of ordinal data

If you have an array of `Date()` objects and you want to find the middle one, what do you do?

• If you have an odd number of objects, just take the middle one.
• If you have an even number, you would normally take the mean of the middle two. But mathematical operations aren’t defined for dates, nor should they be, so you can’t do that.

Returning an array of the middle two also sounds like odd behavior, since it leads to type instability. At the same time, it seems reasonable to want the middle(ish) value of an array of dates.

Are there any standard practices for dealing with this?

This appears to be a conceptual question, not something specific to Julia or even programming. Nevertheless, sample quantiles can be defined for ordinal data: you just don’t interpolate, and you need to break ties, e.g. by rounding the index up.
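As a minimal sketch of that idea (the name `ordinal_quantile` is hypothetical, not an existing Base or Statistics function), one could sort the values and pick the element at the rounded-up index, so that only an ordering is required and no arithmetic on the values themselves:

```julia
using Dates

# Hypothetical helper: p-th sample quantile without interpolation.
# It only needs `isless` (for sorting), so it works for any ordered type.
function ordinal_quantile(v::AbstractVector, p::Real)
    0 <= p <= 1 || throw(ArgumentError("p must be in [0, 1]"))
    isempty(v) && throw(ArgumentError("collection must be non-empty"))
    s = sort(v)
    # Round the index up; clamp so that p = 0 maps to the first element.
    i = clamp(ceil(Int, p * length(s)), 1, length(s))
    return s[i]
end

dates = [Date(2001), Date(2002), Date(2003), Date(2004), Date(2005), Date(2006)]
ordinal_quantile(dates, 0.5)   # picks an existing element, never a midpoint
```

With an even number of elements this returns the lower of the two middle values for `p = 0.5`; other tie-breaking rules are equally defensible, which is why the choice would need to be documented.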

Yes, it is conceptual, but it’s related to how my rewrite of DataFrames’ `describe` would work with quantiles.

I suppose the Julian flavor of this question is whether there is a standard for doing this within Julia packages, particularly with regard to the `Date` type.

We want the summary statistics to be as flexible as possible, so we use `try...catch` to see if we can get an output for non-numeric types. Say, for example, you have `MyType` defined and `MyType(1) < MyType(2)`; then we can still tell you the minimum of that column, returning `nothing` if it’s not defined.
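A rough sketch of that pattern, assuming a toy `MyType` with only an ordering defined (the helper name `try_stat` is made up for illustration and is not what DataFrames actually uses):

```julia
# Hypothetical type with only an ordering defined, as in the example above.
struct MyType
    x::Int
end
Base.isless(a::MyType, b::MyType) = isless(a.x, b.x)

# Hypothetical helper: apply `f` to a column, returning `nothing`
# when `f` is not defined for the element type.
function try_stat(f, col)
    try
        return f(col)
    catch
        return nothing
    end
end

col = [MyType(3), MyType(1), MyType(2)]
try_stat(minimum, col)   # works, because `isless` is defined
try_stat(sum, col)       # `+` is not defined for MyType, so this gives `nothing`
```

Note that a bare `catch` also swallows unrelated errors, so a real implementation might want to catch only `MethodError`.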

For returning the 25th, 50th, and 75th percentiles, we are using the `quantile` function, which requires arithmetic operations. So `MyType` wouldn’t work.

The obvious answer is to say, “hey, if you want us to give you something other than `nothing` in return, write a method for `quantile` that gives you what you want.” That’s a fine solution, I think, but maybe there is a more general function for all types of ordinal data that I don’t know about, and that is commonly used in Julia packages.

I am not sure about the obvious answer, but a solution could involve traits, which carry information about whether values of a type are cardinal (everything `<: Real`), ordinal (e.g. `Date`), or nominal (the default). Then reporting would just use this information: e.g. interpolate for cardinal values, use uninterpolated quantiles for ordinal, and maybe just show the 5 most common values for nominal.
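To make the trait idea concrete, here is one possible sketch. All of the names (`MeasurementScale`, `Cardinal`, `Ordinal`, `Nominal`, `scale`, `summarize`) are invented for illustration and do not exist in Base or DataFrames; it assumes a Julia version where `quantile` lives in the `Statistics` standard library.

```julia
using Dates, Statistics

# Hypothetical trait types describing the measurement scale of a column.
abstract type MeasurementScale end
struct Cardinal <: MeasurementScale end
struct Ordinal  <: MeasurementScale end
struct Nominal  <: MeasurementScale end

scale(::Type) = Nominal()               # default: no ordering assumed
scale(::Type{<:Real}) = Cardinal()      # arithmetic is meaningful
scale(::Type{<:TimeType}) = Ordinal()   # Date, DateTime, Time: ordered only

# Dispatch on the trait of the element type.
summarize(col::AbstractVector) = summarize(scale(eltype(col)), col)

# Cardinal: ordinary interpolated quantiles.
summarize(::Cardinal, col) = quantile(col, [0.25, 0.5, 0.75])

# Ordinal: uninterpolated quantiles, index rounded up.
function summarize(::Ordinal, col)
    s = sort(col)
    n = length(s)
    return [s[clamp(ceil(Int, p * n), 1, n)] for p in (0.25, 0.5, 0.75)]
end

# Nominal: up to 5 most common values.
function summarize(::Nominal, col)
    counts = Dict{eltype(col), Int}()
    for x in col
        counts[x] = get(counts, x, 0) + 1
    end
    ranked = sort(collect(counts); by = last, rev = true)
    return first.(ranked[1:min(5, length(ranked))])
end
```

A package author could then opt a custom type into ordinal treatment by adding a single method, e.g. `scale(::Type{MyType}) = Ordinal()`, without touching the reporting code.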

I think I would just call `sort` and use something like `floor(length(col) * .25)` as the index, then have that in the documentation.

However, you don’t want to do anything too expensive with, say, strings, and cause `describe` to be slow. Probably best to finish up the pull request now and then see what people think.

I’m curious how other software handles this.


In R

```r
library(lubridate)
# Note that we have an even number of dates, so the median
# is not obviously defined.
dates = ymd("20010101", "20020101", "20030101", "20040101", "20050101", "20060101")
median(dates)
#> 2003-07-02
# Returns the midpoint. Also, note that it does not drop down into
# seconds, but will round down instead of doing that.
quantile(dates)
#> Error # you need a special option
quantile(dates, type = 1)  # Whatever that means
#> 0%
#> 2001-01-01
#> 25%
#> 2002-01-01
#> 50%
#> 2003-01-01
#> 75%
#> 2004-01-01
#> 100%
#> 2005-01-01
# Clearly it rounds down.
```

As far as I can tell, Python throws an error for anything, be it pandas’s date format or `datetime`’s date format. Though I could have sworn I had something working the other day… If anyone works with dates regularly in Python, feel free to pitch in.


Given that `quantile` might be moved into stdlib soon, maybe now we could make a push to add an option like what R has.

Then in DataFrames we could use `try...catch` twice: once to see if the user-defined object works with the normal `quantile` and `median`, and a second time to see if a special `ordinal` option works, then return whatever that gives.
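The two-stage fallback could look roughly like the following. Since no `ordinal` option exists in `quantile` at the time of writing, the sketch substitutes a hypothetical `ordinal_quantile` helper for it; `describe_quantile` and `MyType` are likewise invented names, and the code assumes `quantile` is available from the `Statistics` standard library.

```julia
using Statistics

# Hypothetical stand-in for an "ordinal" quantile option (no such option exists yet).
function ordinal_quantile(col, p)
    s = sort(col)
    return s[clamp(ceil(Int, p * length(s)), 1, length(s))]
end

# Hypothetical helper: try the standard quantile first, then the ordinal
# fallback, and give up with `nothing` if neither is defined.
function describe_quantile(col, p)
    try
        return quantile(col, p)
    catch
    end
    try
        return ordinal_quantile(col, p)
    catch
        return nothing
    end
end

# Toy ordered type without arithmetic, to exercise the second stage.
struct MyType
    x::Int
end
Base.isless(a::MyType, b::MyType) = isless(a.x, b.x)

describe_quantile([1, 2, 3, 4, 5], 0.5)                 # standard, interpolated path
describe_quantile([MyType(4), MyType(1), MyType(3), MyType(2)], 0.5)  # ordinal fallback
```

One cost of this design is that a column may be sorted twice when the first attempt fails, which matters for the performance concern raised earlier in the thread.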

But then add another method for strings… since the user probably isn’t interested in the minimum and maximum string.

Opened up an issue here! I think it makes sense for it to live in `Base` because the current `quantile` function is pretty complicated and an ordinal version would only change the very last step. But I’m sure the developers hear that a lot!