# How to find rankings of each tuple in the list of tuples?

I have a list of tuples, something like this:

``````v = [(i,rand(1:15)) for i=1:30]
``````

Now I want to rank each tuple according to its second value, which I was able to do by chaining several functions:

``````vcat([map(k->(k[1],i),filter(x->x[2]==j,v)) for (i,j) in enumerate(sort(unique(last.(v))))]...)
``````

Is there a simpler function that gives the same result? Thanks for your help as always.
Note that the following is not what I want:

``````sort(v, by = x->x[2])
``````

Apologies but I don't understand your question. "Rank each tuple according to the second value" to me means (for a length-2 tuple) `sort(v, by = last)`.

Your longer expression constructs new tuples, and only returns the same tuples as in the original list around 1/9 of the time:

``````julia> function compare_sorts(n)
           res = 0
           for _ ∈ 1:n
               v = [(i,rand(1:15)) for i=1:30]
               sort1 = vcat([map(k->(k[1],i),filter(x->x[2]==j,v)) for (i,j) in enumerate(sort(unique(last.(v))))]...)
               sort2 = sort(v, by = last)
               res += sort1 == sort2
           end
           return res/n
       end
compare_sorts (generic function with 1 method)

julia> compare_sorts(10_000)
0.1108
``````

With more samples this converges to 0.1111…, so I'm sure this expected value can be shown analytically for your chosen example of 30 tuples with random integers from 1:15.

The issue is `enumerate`, which will produce differing `i` and `j` values at some point unless the second values of your 30 tuples cover all consecutive integers from 1 up to their maximum. This being the Julia Discourse, there's a good chance someone will be along shortly to show that the probability of this happening is indeed 1/9; peasants like me will just brute force it:

``````julia> z = [rand(1:15, 30) for _ ∈ 1:1_000_000];

julia> sum(count(length(unique(i)) == x && maximum(i) == x for i ∈ z) for x ∈ 11:15)/1_000_000
0.110809
``````
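For anyone who would rather compute than simulate, here is a minimal sketch of the exact calculation. It assumes the event of interest is "the distinct second values are exactly `1:x` for some `x`", and counts such outcomes as surjections via inclusion-exclusion. The names `surjections` and `p_consecutive` are introduced here for illustration and are not from the thread.

```julia
# Number of surjections from an n-element set onto an x-element set,
# by inclusion-exclusion. BigInt avoids integer overflow for n = 30.
surjections(n, x) = sum((-1)^k * binomial(big(x), k) * big(x - k)^n for k in 0:x)

# Probability that n independent uniform draws from 1:m take exactly the
# values 1:x for some x in 1:m (the case where `enumerate` changes nothing).
p_consecutive(n, m) = sum(surjections(n, x) for x in 1:m) // big(m)^n

# Exact rational result, converted to Float64 for comparison with the simulation.
Float64(p_consecutive(30, 15))
```

Unlike the brute-force line, this sums over every `x` in `1:15` rather than only `11:15`; the smaller values of `x` should contribute very little for 30 draws.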

Is this intended behaviour? If so, your function is probably fine. I might have written it like this:

``````reduce(vcat, (k->(k[1],i)).(filter(x->x[2]==j,v)) for (i,j) in enumerate(sort(unique(last.(v)))))
``````

I would have expected this to be faster, but for some reason my benchmarks show an extra 10 allocations and about 10% worse performance compared to your version.


Thank you very much for your detailed answer. Yes, `sort(v, by = last)` sorts the list by the last value, and that does rank the elements. For my requirement, though, instead of returning the sorted original `v`, I want to modify `v` so that each second value is replaced by its rank as a positive integer. For instance, suppose each tuple in `v` holds a student number and that student's exam score (more than one student can have the same score). Instead of ranking students by their actual score, I want to rank them by position, i.e., whoever scores lowest gets position 1, and so on. For example, `v=[(1,15),(2,13),(4,13), (5,6)]` should become `v=[(1,3),(2,2),(4,2),(5,1)]`. I hope my question is clear. Thanks once again.
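As a quick check on this small example, the one-liner from the original question can be compared with the desired result. This is only a sketch; `ranked` and `expected` are names introduced here for illustration.

```julia
v = [(1, 15), (2, 13), (4, 13), (5, 6)]

# The expression from the question: walk the distinct scores in ascending
# order and pair each matching student with the position of that score.
ranked = vcat([map(k->(k[1],i),filter(x->x[2]==j,v)) for (i,j) in enumerate(sort(unique(last.(v))))]...)

# The result described in the example above.
expected = [(1, 3), (2, 2), (4, 2), (5, 1)]

# Compare after sorting both, so that the order of the pairs does not matter.
sort(ranked) == sort(expected)
```

The one-liner returns its pairs grouped by rank, whereas the example keeps the students in their original order, which is why both sides are sorted before comparing.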

``````julia> using StatsBase

julia> collect(zip(first.(v),denserank(last.(v))))
4-element Vector{Tuple{Int64, Int64}}:
(1, 3)
(2, 2)
(4, 2)
(5, 1)
``````

For more information about `denserank`, do the usual: press `?` at the REPL to enter help mode and type `denserank`.


Ok, in this case I would do this:

``````julia> using DataFrames, StatsBase

julia> df = DataFrame(student = first.(v), grade = last.(v))
30Γ2 DataFrame
β Int64    Int64
ββββββΌββββββββββββββββ
1 β       1     11
2 β       2      4
3 β       3     12
4 β       4      8
5 β       5      2
6 β       6      3
7 β       7     14
8 β       8      2
9 β       9      7
10 β      10      2
11 β      11     10
12 β      12     10
13 β      13     10
14 β      14     11
15 β      15      2
16 β      16     15
17 β      17     12
18 β      18      5
19 β      19     15
20 β      20      7
21 β      21     11
22 β      22      8
23 β      23     11
24 β      24     12
25 β      25     13
26 β      26      2
27 β      27      1
28 β      28      1
29 β      29     13
30 β      30      2

julia> df.rank = denserank(df.grade); sort!(df, :rank)
30Γ3 DataFrame
β Int64    Int64  Int64
ββββββΌβββββββββββββββββββββββ
1 β      27      1      1
2 β      28      1      1
3 β       5      2      2
4 β       8      2      2
5 β      10      2      2
6 β      15      2      2
7 β      26      2      2
8 β      30      2      2
9 β       6      3      3
10 β       2      4      4
11 β      18      5      5
12 β       9      7      6
13 β      20      7      6
14 β       4      8      7
15 β      22      8      7
16 β      11     10      8
17 β      12     10      8
18 β      13     10      8
19 β       1     11      9
20 β      14     11      9
21 β      21     11      9
22 β      23     11      9
23 β       3     12     10
24 β      17     12     10
25 β      24     12     10
26 β      25     13     11
27 β      29     13     11
28 β       7     14     12
29 β      16     15     13
30 β      19     15     13
``````

You might of course have a good reason to work with vectors of tuples, but given your data and what you're doing with it, they seem a suboptimal data structure to me.


Thank you very much, I love it.

Thank you very much. I really appreciate your help. I liked your take on dataframe. Thanks again.

If you don't want to depend on any package:

``````l = length(v)
u = unique(last.(v))
map(i->(v[i][1],length(u)-count(>(v[i][2]), u)), 1:l)
``````
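Another package-free variant, offered as a sketch rather than a recommendation: sort the distinct scores once, then look up each score's position with a binary search. It uses the small example from earlier in the thread, and `ranks` and `ranked` are names introduced here for illustration.

```julia
v = [(1, 15), (2, 13), (4, 13), (5, 6)]

# Sorted distinct scores; a score's position in `u` is its dense rank.
u = sort(unique(last.(v)))

# Binary-search every score in `u`. `Ref(u)` makes broadcasting treat `u`
# as a single value instead of iterating over its elements.
ranks = searchsortedfirst.(Ref(u), last.(v))

# Pair each student number with its rank, keeping the original order.
ranked = collect(zip(first.(v), ranks))
```

Because `u` is sorted and every score is guaranteed to appear in it, `searchsortedfirst` returns the exact index of each score, so no counting pass over `u` is needed per element.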