How to find the rank of each tuple in a list of tuples?

I have a list of tuples like the following:

v = [(i,rand(1:15)) for i=1:30]

Now I want to rank each tuple according to its second value, which I was able to achieve by combining several functions, as shown:

vcat([map(k->(k[1],i),filter(x->x[2]==j,v)) for (i,j) in enumerate(sort(unique(last.(v))))]...)

Is there any simpler function that can give the same result? Thanks for your help as always.
Note: the following is not what I want.

sort(v, by = x->x[2])

Apologies, but I don’t understand your question. “rank each tuple according to the second value” to me means (for a length-2 tuple) sort(v, by = last).

Your longer expression constructs new tuples, and only returns the same tuples as in the original list around 1/9 of the time:

julia> function compare_sorts(n)
           res = 0
           for _ ∈ 1:n
               v = [(i,rand(1:15)) for i=1:30]
               sort1 = vcat([map(k->(k[1],i),filter(x->x[2]==j,v)) for (i,j) in enumerate(sort(unique(last.(v))))]...)
               sort2 = sort(v, by = last)
               res += sort1 == sort2
           end
           return res/n
       end
compare_sorts (generic function with 1 method)

julia> compare_sorts(10_000)
0.1108

With more samples this converges to 0.1111…, so I’m sure it can be shown analytically that this is the expected value for your chosen example of 30 tuples with random integers from 1:15.
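As a rough sketch of the analytic route: the two results agree exactly when the distinct second values are 1, 2, …, x for some x, so the probability can be obtained by counting surjections with inclusion-exclusion. The helper names `surjections` and `exact_prob` are made up for this illustration, and the counting argument is my own assumption about what needs to be shown, not something taken from the posts above.

```julia
# Number of surjections from an n-element set onto an x-element set,
# via inclusion-exclusion. BigInt avoids overflow for n = 30.
surjections(n, x) = sum((-1)^k * binomial(big(x), k) * big(x - k)^n for k in 0:x)

# Probability that n uniform draws from 1:m take exactly the values 1:x
# for some x in 1:m (hypothetical helper, for illustration only).
exact_prob(n, m) = sum(surjections(n, x) for x in 1:m) // big(m)^n

# Exact rational value converted to Float64 for the example in this thread.
p = Float64(exact_prob(30, 15))
println(p)
```

The value printed here can be compared directly with the simulated frequencies in this thread.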

The issue is enumerate, which will produce different i and j values at some point unless the second values of your 30 tuples include all consecutive integers from 1 upward. This being the Julia Discourse, there’s a good chance someone will be along shortly to show that the probability of this happening is indeed 1/9; peasants like me will just brute-force it:

julia> z = [rand(1:15, 30) for _ ∈ 1:1_000_000];

julia> sum(count(length(unique(i)) == x && maximum(i) == x for i ∈ z) for x ∈ 11:15)/1_000_000
0.110809
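To make the role of enumerate concrete, here is a small illustration on a made-up four-element vector (the data is an assumption chosen for the example, not taken from the random runs above). It shows how i (the position) and j (the value) diverge as soon as the distinct values are not 1, 2, 3, …

```julia
# Illustrative data: distinct second values are 6, 13, 15 (not 1, 2, 3).
v = [(1, 15), (2, 13), (4, 13), (5, 6)]

# i is the position among the sorted distinct values, j is the value itself.
pairs_ij = collect(enumerate(sort(unique(last.(v)))))

# The long expression replaces each second value j with its position i ...
relabelled = vcat([map(k -> (k[1], i), filter(x -> x[2] == j, v)) for (i, j) in pairs_ij]...)

# ... whereas sort keeps the original tuples and only reorders them.
sorted = sort(v, by = last)

println(pairs_ij)
println(relabelled)
println(sorted)
```

The two results share the same ordering of first elements, but the tuples themselves are only equal when every i equals its j.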

Is this intended behaviour? If so, your function is probably fine, though I might have written it like this:

reduce(vcat, (k->(k[1],i)).(filter(x->x[2]==j,v)) for (i,j) in enumerate(sort(unique(last.(v)))))

I would have expected this to be faster, but for some reason in my benchmarking I see 10 extra allocations and about 10% worse performance than your version, so :person_shrugging:


Thank you very much for your detailed reply. Yes, sort(v, by = last) sorts the list of tuples by the last value, and that does rank the elements. For my requirement, though, instead of returning the original v in sorted order, I want to modify v so that each second value is replaced by its rank as a positive integer. For instance, suppose each tuple in v holds a student number and the corresponding exam score (note that more than one student can have the same score). Instead of ranking students by their actual exam score, I want to rank them by position, i.e. whoever scores the lowest gets position 1, and so on. For example, v=[(1,15),(2,13),(4,13), (5,6)] should become v=[(1,3),(2,2),(4,2),(5,1)]. I hope my question is clear. Thanks once again.

julia> using StatsBase

julia> collect(zip(first.(v),denserank(last.(v))))
4-element Vector{Tuple{Int64, Int64}}:
 (1, 3)
 (2, 2)
 (4, 2)
 (5, 1)

For more information about denserank, do the usual: press ? at the REPL to enter help mode and type denserank.


OK, in that case I would do this:

julia> using DataFrames, StatsBase

julia> df = DataFrame(student = first.(v), grade = last.(v))
30×2 DataFrame
 Row │ student  grade
     │ Int64    Int64
─────┼────────────────
   1 │       1     11
   2 │       2      4
   3 │       3     12
   4 │       4      8
   5 │       5      2
   6 │       6      3
   7 │       7     14
   8 │       8      2
   9 │       9      7
  10 │      10      2
  11 │      11     10
  12 │      12     10
  13 │      13     10
  14 │      14     11
  15 │      15      2
  16 │      16     15
  17 │      17     12
  18 │      18      5
  19 │      19     15
  20 │      20      7
  21 │      21     11
  22 │      22      8
  23 │      23     11
  24 │      24     12
  25 │      25     13
  26 │      26      2
  27 │      27      1
  28 │      28      1
  29 │      29     13
  30 │      30      2

julia> df.rank = denserank(df.grade); sort!(df, :rank)
30×3 DataFrame
 Row │ student  grade  rank
     │ Int64    Int64  Int64
─────┼───────────────────────
   1 │      27      1      1
   2 │      28      1      1
   3 │       5      2      2
   4 │       8      2      2
   5 │      10      2      2
   6 │      15      2      2
   7 │      26      2      2
   8 │      30      2      2
   9 │       6      3      3
  10 │       2      4      4
  11 │      18      5      5
  12 │       9      7      6
  13 │      20      7      6
  14 │       4      8      7
  15 │      22      8      7
  16 │      11     10      8
  17 │      12     10      8
  18 │      13     10      8
  19 │       1     11      9
  20 │      14     11      9
  21 │      21     11      9
  22 │      23     11      9
  23 │       3     12     10
  24 │      17     12     10
  25 │      24     12     10
  26 │      25     13     11
  27 │      29     13     11
  28 │       7     14     12
  29 │      16     15     13
  30 │      19     15     13

You might of course have a good reason to work with vectors of tuples, but given your data and what you’re doing with it they seem a suboptimal data structure to me.


Thank you very much, I love it.

Thank you very much. I really appreciate your help. I liked your take on dataframe. Thanks again.

If you don’t want to depend on any package:

l = length(v)
u = unique(last.(v))
map(i->(v[i][1],length(u)-count(>(v[i][2]), u)), 1:l)
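Another package-free variant, offered only as a sketch: build a lookup from each distinct score to its position among the sorted distinct scores, then relabel every tuple in one pass. The names `rank_of` and `ranked` are made up for this example, and the sample v is the four-element one from earlier in the thread. This avoids scanning all unique values for every element, which may matter for long lists, though I have not benchmarked it.

```julia
# Sample data from the thread: (student number, exam score).
v = [(1, 15), (2, 13), (4, 13), (5, 6)]

# Map each distinct score to its position among the sorted distinct scores,
# so the lowest score gets rank 1 and tied scores share a rank.
rank_of = Dict(score => r for (r, score) in enumerate(sort(unique(last.(v)))))

# Replace each score with its rank, keeping the original order of v.
ranked = [(id, rank_of[score]) for (id, score) in v]

println(ranked)
```

Unlike the sorted DataFrame output, this keeps the tuples in their original order.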