Suppose I have a vector containing the following integers: [1 1 1 2 3 5 5 9]
Now I wish to create a vector that ranks the integers in the following way:
rank = [1 1 1 2 3 4 4 5]
The rank essentially answers the question for each integer “number of unique integers larger + 1” but implementing that is causing some problems. Maybe there is a function already?
This is my attempt, but how do I broadcast it to the vector at once?
length(unique(A[A .< A])) + 1
sampleRanks: GitHub - oheil/NormalizeQuantiles.jl: NormalizeQuantiles.jl implements quantile normalization
An elementary way would be
x = [1,1,1,2,3,5,5,9]
xu = unique(x)
ranks = [ count( xu .< z ) + 1 for z in x ]
(Of course, @oheil’s package does more than that, i.e. it also takes care of edge cases and supports matrix inputs etc.)
Thats why it’s typically slower…
StatsBase.denserank seems to be exactly what you need.
@SteffenPL’s function (and any implementation based on
count) will have worst-case O(n^2) complexity (when all the entries of
x are different).
But by sorting
x out the outset, it is possible to do this in O(n \log n)-time using something like this:
p = sortperm(x) # to recover sort later
res = ones(Int, length(x))
currentrank = 1
for i in 2:length(x)
if x[i-1] < x[i]
currentrank += 1
res[i] = currentrank
julia> x = [1, 6, 3, 3, 7, 1, 1]; denserank(x)
This is the approach used by
StatsBase.denserank, albeit as the source code here reveals, there are a few layers of abstraction in between so it can support more than just
In summary, you should either use
StatsBase.denserank, or if you roll your own function, base it on a sort algorithm instead of than