Apologies but I donβt understand your question. βrank each tuple according to the second valueβ to me means (for a length-2 tuple) sort(v, by = last)
.
Your longer expression constructs new tuples, and only returns the same tuples as in the original list around 1/9 of the time:
julia> function compare_sorts(n)
res = 0
for _ β 1:n
v = [(i,rand(1:15)) for i=1:30]
sort1 = vcat([map(k->(k[1],i),filter(x->x[2]==j,v)) for (i,j) in enumerate(sort(unique(last.(v))))]...)
sort2 = sort(v, by = last)
res += sort1 == sort2
end
return res/n
end
compare_sorts (generic function with 1 method)
julia> compare_sorts(10_000)
0.1108
with more samples this converges to 0.1111β¦ so Iβm sure it can be shown analytically that the expected value given your chosen example values of 30 tuples with random integers from 1:15.
The issue is enumerate
which will create different i
and j
values at some point unless the second values of your 30 tuples include all consecutive numbers from 1 up. This being the Julia Discourse, thereβs a good chance someone will be along shortly to show that the probability of this happening is indeed 1/9, peasants like me will just brute force it:
julia> z = [rand(1:15, 30) for _ β 1:1_000_000];
julia> sum(count(length(unique(i)) == x && maximum(i) == x for i β z) for x β 11:15)/1_000_000
0.110809
Is this intended behaviour? If so your function is probably fine, I might have written it like that:
reduce(vcat, (k->(k[1],i)).(filter(x->x[2]==j,v)) for (i,j) in enumerate(sort(unique(last.(v)))))
I would have expected this to be faster, but for some reason in my benchmarking I see an extra 10 allocations and about 10% worse performance compared to your version so