How to find rankings of each tuple in the list of tuples?

Phuntsho · December 8, 2022, 2:44pm

I have a list of tuples like this:

v = [(i,rand(1:15)) for i=1:30]

Now I want to rank each tuple by its second value, which I managed to do by combining several functions:

vcat([map(k->(k[1],i),filter(x->x[2]==j,v)) for (i,j) in enumerate(sort(unique(last.(v))))]...)

Is there a simpler function that gives the same result? Thanks for your help, as always.
Note that the following is not what I want:

sort(v, by = x->x[2])
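To see what the one-liner above computes, here it is unpacked into named steps on a small fixed input. The input (borrowed from the example later in the thread) and the intermediate names are our own choices. Each score is replaced by the position of that score among the sorted distinct scores, and the result comes back grouped by rank rather than in the original order, which is why it is not the same thing as sort(v, by = last).

```julia
v = [(1, 15), (2, 13), (4, 13), (5, 6)]

# Step 1: distinct scores in ascending order -> [6, 13, 15]
scores = sort(unique(last.(v)))

# Step 2: for each (rank i, score j), keep the tuples whose score is j and
# relabel them as (id, i); filter preserves the original order within a group
groups = [map(k -> (k[1], i), filter(x -> x[2] == j, v)) for (i, j) in enumerate(scores)]

# Step 3: concatenate the groups -> [(5, 1), (2, 2), (4, 2), (1, 3)]
ranked = reduce(vcat, groups)

# For comparison, sorting keeps the raw scores -> [(5, 6), (2, 13), (4, 13), (1, 15)]
sorted_v = sort(v, by = last)
```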

nilshg · December 8, 2022, 4:45pm

Apologies, but I don’t understand your question. To me, “rank each tuple according to the second value” means (for a length-2 tuple) sort(v, by = last).

Your longer expression constructs new tuples, and its result matches sort(v, by = last) only around 1/9 of the time:

julia> function compare_sorts(n)
           res = 0
           for _ ∈ 1:n
               v = [(i,rand(1:15)) for i=1:30]
               sort1 = vcat([map(k->(k[1],i),filter(x->x[2]==j,v)) for (i,j) in enumerate(sort(unique(last.(v))))]...)
               sort2 = sort(v, by = last)
               res += sort1 == sort2
           end
           return res/n
       end
compare_sorts (generic function with 1 method)

julia> compare_sorts(10_000)
0.1108

With more samples this converges to roughly 0.111, so I’m sure the expected value can be derived analytically for your example values of 30 tuples with random integers from 1:15.

The issue is enumerate, which will produce mismatched i and j values at some point unless the distinct second values of your 30 tuples are exactly the consecutive numbers 1, 2, 3, … up to their maximum. This being the Julia Discourse, there’s a good chance someone will be along shortly to show analytically what the probability of this happening is (it looks to be about 1/9); peasants like me will just brute-force it:

julia> z = [rand(1:15, 30) for _ ∈ 1:1_000_000];

julia> sum(count(length(unique(i)) == x && maximum(i) == x for i ∈ z) for x ∈ 11:15)/1_000_000
0.110809
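The brute-force estimate can be checked against an exact count. This sketch is our own, not from the thread: by inclusion-exclusion, the number of length-n sequences over 1:m that use every value in 1:m is the sum over j of (-1)^j C(m, j) (m - j)^n. Summing that over m = 1:k and dividing by k^n gives the probability that the distinct values are exactly 1:m for some m. BigInt keeps the large powers exact. For n = 30 and k = 15 the result should land close to the simulated 0.1108, i.e. near, but not necessarily exactly, 1/9.

```julia
# Number of length-n sequences over 1:m that use every value in 1:m
# (surjections onto an m-element set), via inclusion-exclusion.
surjections(n, m) = sum((-1)^j * binomial(m, j) * big(m - j)^n for j in 0:m)

# Probability that n uniform draws from 1:k have distinct values exactly 1:m
# for some m, i.e. that enumerate's index always equals the score.
p_consecutive(n, k) = sum(surjections(n, m) for m in 1:k) // big(k)^n

p = Float64(p_consecutive(30, 15))
```

Unlike the simulation above, which only sums over maxima 11:15, this sums over every m, but the extra terms are negligible for 30 draws.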

Is this intended behaviour? If so, your function is probably fine. I might have written it like this:

reduce(vcat, (k->(k[1],i)).(filter(x->x[2]==j,v)) for (i,j) in enumerate(sort(unique(last.(v)))))

I would have expected this to be faster, but for some reason my benchmarks show 10 extra allocations and about 10% worse performance compared to your version.
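Before comparing timings, it is worth confirming that the two spellings return identical vectors. A minimal check, with an arbitrarily chosen seed for reproducibility; it says nothing about speed:

```julia
using Random

Random.seed!(42)                      # arbitrary seed, only for reproducibility
v = [(i, rand(1:15)) for i = 1:30]

# Original spelling: splat an array of per-score groups into vcat
a = vcat([map(k -> (k[1], i), filter(x -> x[2] == j, v)) for (i, j) in enumerate(sort(unique(last.(v))))]...)

# Alternative spelling: reduce vcat over a generator, relabelling with a broadcast closure
b = reduce(vcat, ((k -> (k[1], i)).(filter(x -> x[2] == j, v)) for (i, j) in enumerate(sort(unique(last.(v))))))

a == b
```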

Phuntsho · December 8, 2022, 5:10pm

Thank you very much for the detailed reply. Yes, sort(v, by = last) sorts the tuple list by the last value, and in that sense it ranks the elements, but instead of returning the sorted v, I want to transform v so that the ranking is expressed as consecutive positive integers. For instance, suppose each tuple in v holds a student number and that student’s exam score (note that more than one student can have the same score). Instead of ranking students by their actual score, I want to rank them by position, i.e. whoever scores lowest gets position 1, and so on. For example, v = [(1,15),(2,13),(4,13),(5,6)] should become v = [(1,3),(2,2),(4,2),(5,1)]. I hope my question is clear. Thanks once again.
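The requirement described here is usually called dense ranking: tied scores share a rank and the next distinct score gets the next integer, with no gaps. A minimal package-free sketch, assuming the tuples are (student, score) pairs as in the example; dense_rank_tuples is a hypothetical helper name, and the output keeps the original order of v:

```julia
# Hypothetical helper: replace each score by its dense rank
# (lowest score -> 1, tied scores share a rank, no gaps between ranks)
function dense_rank_tuples(v)
    scores = sort(unique(last.(v)))                         # distinct scores, ascending
    rank_of = Dict(s => r for (r, s) in enumerate(scores))  # score -> position among distinct scores
    return [(id, rank_of[s]) for (id, s) in v]              # original order of v is kept
end

dense_rank_tuples([(1, 15), (2, 13), (4, 13), (5, 6)])
```

The cost is dominated by the sort of the distinct scores; each tuple then needs one dictionary lookup.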

Dan · December 8, 2022, 5:19pm

julia> using StatsBase

julia> collect(zip(first.(v),denserank(last.(v))))
4-element Vector{Tuple{Int64, Int64}}:
 (1, 3)
 (2, 2)
 (4, 2)
 (5, 1)

For more information about denserank, the usual applies: press ? in the REPL to enter help mode and type denserank.

nilshg · December 8, 2022, 5:22pm

Ok, in this case I would do this:

julia> using DataFrames, StatsBase

julia> df = DataFrame(student = first.(v), grade = last.(v))
30×2 DataFrame
 Row │ student  grade
     │ Int64    Int64
─────┼────────────────
   1 │       1     11
   2 │       2      4
   3 │       3     12
   4 │       4      8
   5 │       5      2
   6 │       6      3
   7 │       7     14
   8 │       8      2
   9 │       9      7
  10 │      10      2
  11 │      11     10
  12 │      12     10
  13 │      13     10
  14 │      14     11
  15 │      15      2
  16 │      16     15
  17 │      17     12
  18 │      18      5
  19 │      19     15
  20 │      20      7
  21 │      21     11
  22 │      22      8
  23 │      23     11
  24 │      24     12
  25 │      25     13
  26 │      26      2
  27 │      27      1
  28 │      28      1
  29 │      29     13
  30 │      30      2

julia> df.rank = denserank(df.grade); sort!(df, :rank)
30×3 DataFrame
 Row │ student  grade  rank
     │ Int64    Int64  Int64
─────┼───────────────────────
   1 │      27      1      1
   2 │      28      1      1
   3 │       5      2      2
   4 │       8      2      2
   5 │      10      2      2
   6 │      15      2      2
   7 │      26      2      2
   8 │      30      2      2
   9 │       6      3      3
  10 │       2      4      4
  11 │      18      5      5
  12 │       9      7      6
  13 │      20      7      6
  14 │       4      8      7
  15 │      22      8      7
  16 │      11     10      8
  17 │      12     10      8
  18 │      13     10      8
  19 │       1     11      9
  20 │      14     11      9
  21 │      21     11      9
  22 │      23     11      9
  23 │       3     12     10
  24 │      17     12     10
  25 │      24     12     10
  26 │      25     13     11
  27 │      29     13     11
  28 │       7     14     12
  29 │      16     15     13
  30 │      19     15     13

You might of course have a good reason to work with vectors of tuples, but given your data and what you’re doing with it, they seem like a suboptimal data structure to me.

Phuntsho · December 8, 2022, 5:25pm

Thank you very much, I love it.

Phuntsho · December 8, 2022, 5:27pm

Thank you very much. I really appreciate your help. I liked your take on dataframe. Thanks again.

rocco_sprmnt21 · December 9, 2022, 8:41am

If you don’t want to depend on any package:

l = length(v)
u = unique(last.(v))   # distinct scores
# rank = number of distinct scores <= this score (the dense rank)
map(i->(v[i][1],length(u)-count(>(v[i][2]), u)), 1:l)
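The count in this answer works because the dense rank of a score is the number of distinct scores less than or equal to it. That number can also be read off a sorted list of distinct scores with binary search, which avoids scanning every distinct score for each tuple. A minimal sketch of this variation (our own rewording, still package-free):

```julia
v = [(1, 15), (2, 13), (4, 13), (5, 6)]

u = sort(unique(last.(v)))   # distinct scores, ascending
# searchsortedlast(u, s) is the index of the last element of u that is <= s,
# i.e. the number of distinct scores <= s, which is the dense rank of s
ranked = [(id, searchsortedlast(u, s)) for (id, s) in v]
```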
