StatsBase.summarystats not defined for Rational numbers

Hi,

I have code that calls StatsBase.summarystats on a Vector{T} where T <: Real.

Sometimes T is Rational, and summarystats fails because it is not defined for Rational.

Is there an alternative I can use?

My current workaround is to use this block:

if eltype(data) <: Rational
    summstats = summarystats(float.(data))
    q1 = rationalize(summstats.q25)
    q3 = rationalize(summstats.q75)
else
    summstats = summarystats(data)
    q1 = summstats.q25
    q3 = summstats.q75
end

But I was wondering whether there is a more Julian way to do this without the if/else branching.

Thanks in advance.

I guess you could locally define the method if that’s what you want it to do:

julia> import StatsBase: summarystats

julia> summarystats(x::Vector{<:Rational}) = summarystats(float.(x))
summarystats (generic function with 2 methods)

julia> data = Rational.(rand(10))
10-element Vector{Rational{Int64}}:
 2128900019238815//2251799813685248
   49740005378731//1125899906842624
  334714768863577//1125899906842624
 1998648540630853//4503599627370496
 1894366034501649//4503599627370496
 4093457228645821//4503599627370496
  186443106130015//281474976710656
 2069399109862347//2251799813685248
 1119960256153467//1125899906842624
  614936132122229//2251799813685248

julia> summarystats(data)
Summary Stats:
Length:         10
Missing Count:  0
Mean:           0.590943
Minimum:        0.044178
1st Quartile:   0.328123
Median:         0.553084
3rd Quartile:   0.916481
Maximum:        0.994725

But this is type piracy (adding a method to a function you don’t own for types you don’t own), so proceed with caution and don’t do this in library code.

I haven’t thought through the potential complications of defining this method in general, but it might be worth opening an issue to ask the maintainers whether it should be added.
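One way to sidestep the piracy and also drop the if/else is to put the dispatch on a function you own. This is only a minimal sketch: `quartiles` is a hypothetical helper name (not part of StatsBase), and it relies on the `q25`/`q75` fields used earlier in the thread.

```julia
using StatsBase

# `quartiles` is a hypothetical helper defined in your own code, so adding
# methods to it is not type piracy.

# Generic fallback: call summarystats on the data as-is.
function quartiles(data::AbstractVector{<:Real})
    s = summarystats(data)
    return s.q25, s.q75
end

# Rational input: compute in floating point, then convert back.
# Note that rationalize only approximates the float within a tolerance,
# so the result is not guaranteed to be the exact rational quartile.
function quartiles(data::AbstractVector{<:Rational})
    s = summarystats(float.(data))
    return rationalize(s.q25), rationalize(s.q75)
end

q1, q3 = quartiles([1//2, 1//3, 3//4, 2//5, 5//6])
```

The more specific Rational method is picked automatically by dispatch, so the calling code stays the same for every element type.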

The quantile function seems to work fine with rationals, so you can implement your block simply as q1, q3 = quantile(data, [1//4, 3//4]).

julia> quantile(rand(50),[1//4,3//4])
2-element Array{Float64,1}:
 0.2188261621212782
 0.8143315574024831

julia> quantile(rand(1:20,50).//rand(1:20,50), [1//4,3//4])
2-element Array{Rational{Int64},1}:
 11//24
 23//12

Opening an issue with StatsBase might be worthwhile anyway. I see no reason why summarystats shouldn’t support rationals too.

It turns out that this only works if the percentile values are also Rational. If my data is Rational but the percentile values are Float64, then the result is also Float64. Still, this is a good idea.
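To illustrate the point about the type of the probabilities, here is a small sketch using `quantile` from the Statistics standard library. The data vector is made up for the example, and converting Float64 probabilities with `rationalize` is just one possible way to keep the result Rational.

```julia
using Statistics

# Made-up example data with small denominators.
data = [1//2, 1//3, 3//4, 2//5, 5//6]

# Rational probabilities keep the result Rational.
q_rat = quantile(data, [1//4, 3//4])

# Float64 probabilities promote the result to Float64.
q_flt = quantile(data, [0.25, 0.75])

# If the probabilities arrive as floats, converting them first keeps the
# result Rational (rationalize approximates within a tolerance).
q_conv = quantile(data, rationalize.([0.25, 0.75]))

q1, q3 = q_rat
```

With rationals that have very large numerators and denominators, such as those produced by `Rational.(rand(10))`, the intermediate arithmetic on `Rational{Int64}` may overflow, so this is best suited to data with modest denominators.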