Rank with ties

Suppose I have a vector containing the following integers: [1 1 1 2 3 5 5 9]

Now I wish to create a vector that ranks the integers in the following way:

rank = [1 1 1 2 3 4 4 5]

The rank essentially answers the question for each integer “number of unique integers larger + 1” but implementing that is causing some problems. Maybe there is a function already?

This is my attempt, but how do I broadcast it to the vector at once?

length(unique(A[A .< A[8]])) + 1

See

function sampleRanks: GitHub - oheil/NormalizeQuantiles.jl: NormalizeQuantiles.jl implements quantile normalization

An elementary way would be

x = [1,1,1,2,3,5,5,9]
xu = unique(x)
ranks = [ count( xu .< z ) + 1 for z in x ]

(Of course, @oheil’s package does more than that, i.e. it also takes care of edge cases and supports matrix inputs etc.)

Thats why it’s typically slower…

StatsBase.denserank seems to be exactly what you need.

3 Likes

@SteffenPL’s function (and any implementation based on count) will have worst-case O(n^2) complexity (when all the entries of x are different).

But by sorting x out the outset, it is possible to do this in O(n \log n)-time using something like this:

function denserank(x::Vector)
    p = sortperm(x)             # to recover sort later
    permute!(x, p)
    res = ones(Int, length(x))
    currentrank = 1
    for i in 2:length(x)
        if x[i-1] < x[i]
            currentrank += 1
        end
        res[i] = currentrank
    end
    invpermute!(res, p)
end
julia> x = [1, 6, 3, 3, 7, 1, 1]; denserank(x)
7-element Vector{Int64}:
 1
 3
 2
 2
 4
 1
 1

This is the approach used by StatsBase.denserank, albeit as the source code here reveals, there are a few layers of abstraction in between so it can support more than just Vector input.

In summary, you should either use StatsBase.denserank, or if you roll your own function, base it on a sort algorithm instead of than count.

1 Like