Sorting a Table / DataFrame

I have a table like this:

* A B C
A 1 2 3
B 2 3 4
C 3 4 5

And I want to sort descending both on columns and rows after the sum something like this:

*  C B A  S
C  5 4 3 12
B  4 3 2  9
A  3 2 1  6
S 12 9 6  *

How can I do that the simplest?

thx.

1 Like

As far as i know DataFrames does not have row names. It indexes rows from 1:n.

3 Likes

I assume you do not want the "*" and "S" columns and do not want the last row in the output. In this case do:

julia> df = DataFrame("A" => 1:3, "B" => 2:4, "C" => 3:5)
3Γ—3 DataFrame
 Row β”‚ A      B      C
     β”‚ Int64  Int64  Int64
─────┼─────────────────────
   1 β”‚     1      2      3
   2 β”‚     2      3      4
   3 β”‚     3      4      5

julia> df[sortperm(sum.(eachrow(df)), rev=true), sortperm(sum.(eachcol(df)), rev=true)]
3Γ—3 DataFrame
 Row β”‚ C      B      A
     β”‚ Int64  Int64  Int64
─────┼─────────────────────
   1 β”‚     5      4      3
   2 β”‚     4      3      2
   3 β”‚     3      2      1
2 Likes
julia>  sort(sort(m,dims=1, by=sum, rev=true),dims=2,rev=true)
3Γ—3 Matrix{Int64}:
 5  4  3
 4  3  2
 3  2  1
using LinearAlgebra
rot180(sort(sort(m,dims=1, by=sum),dims=2))

1 Like

Thank you!

Something like this but I need the column label too and the sum if is possible too.

That’s not the main problem I want to know after double sorting a label or index.

julia> using DataFrames

julia> M=[1 2 3 ; 2 3 4 ; 3 4 5]
3Γ—3 Matrix{Int64}:
 1  2  3
 2  3  4
 3  4  5

julia> r=sum(M,dims=1)
1Γ—3 Matrix{Int64}:
 6  9  12

julia> l=sum(M,dims=2)
3Γ—1 Matrix{Int64}:
  6
  9
 12

I need to sort after the sum and to have the labels.

All is doable, can you confirm that this is the exact way how your input data frame looks like and how your ouput should look? (then I can give you information how to transform input into output)

julia> input = DataFrame("*" => ["A", "B", "C"], "A" => 1:3, "B" => 2:4, "C" => 3:5)
3Γ—4 DataFrame
 Row β”‚ *       A      B      C
     β”‚ String  Int64  Int64  Int64
─────┼─────────────────────────────
   1 β”‚ A           1      2      3
   2 β”‚ B           2      3      4
   3 β”‚ C           3      4      5

julia> output = DataFrame("*" => ["C", "B", "A", "S"], "C" => [5:-1:3; 12], "B" => [4:-1:2; 9], "A" => [3:-1:1; 6], "S" => [12, 9, 6, "*"])
4Γ—5 DataFrame
 Row β”‚ *       C      B      A      S
     β”‚ String  Int64  Int64  Int64  Any
─────┼──────────────────────────────────
   1 β”‚ C           5      4      3  12
   2 β”‚ B           4      3      2  9
   3 β”‚ A           3      2      1  6
   4 β”‚ S          12      9      6  *
1 Like

Yes the my problem has a table of over 1000x1000 with labels on raws and columns and I want to have in the top left corner the values that have the highest sums of absolute values. The initial table is matrix in Julia.

Here are the steps to do (I designed them so that they should be hopefully easy to grasp - I did not want to squeeze too much into one step):

julia> input = DataFrame("*" => ["A", "B", "C"], "A" => 1:3, "B" => 2:4, "C" => 3:5)
3Γ—4 DataFrame
 Row β”‚ *       A      B      C
     β”‚ String  Int64  Int64  Int64
─────┼─────────────────────────────
   1 β”‚ A           1      2      3
   2 β”‚ B           2      3      4
   3 β”‚ C           3      4      5

julia> output = select(input, Not("*"))
3Γ—3 DataFrame
 Row β”‚ A      B      C
     β”‚ Int64  Int64  Int64
─────┼─────────────────────
   1 β”‚     1      2      3
   2 β”‚     2      3      4
   3 β”‚     3      4      5

julia> select!(output, sortperm(sum.(eachcol(output)), rev=true))
3Γ—3 DataFrame
 Row β”‚ C      B      A
     β”‚ Int64  Int64  Int64
─────┼─────────────────────
   1 β”‚     3      2      1
   2 β”‚     4      3      2
   3 β”‚     5      4      3

julia> output.S = sum(eachcol(output))
3-element Vector{Int64}:
  6
  9
 12

julia> insertcols!(output, 1, "*" => input."*")
3Γ—5 DataFrame
 Row β”‚ *       C      B      A      S
     β”‚ String  Int64  Int64  Int64  Int64
─────┼────────────────────────────────────
   1 β”‚ A           3      2      1      6
   2 β”‚ B           4      3      2      9
   3 β”‚ C           5      4      3     12

julia> sort!(output, "S", rev=true)
3Γ—5 DataFrame
 Row β”‚ *       C      B      A      S
     β”‚ String  Int64  Int64  Int64  Int64
─────┼────────────────────────────────────
   1 β”‚ C           5      4      3     12
   2 β”‚ B           4      3      2      9
   3 β”‚ A           3      2      1      6

julia> push!(output, ["S"; sum.(eachcol(output[:, Not(1, end)])); "*"], promote=true)
4Γ—5 DataFrame
 Row β”‚ *       C      B      A      S
     β”‚ String  Int64  Int64  Int64  Any
─────┼──────────────────────────────────
   1 β”‚ C           5      4      3  12
   2 β”‚ B           4      3      2  9
   3 β”‚ A           3      2      1  6
   4 β”‚ S          12      9      6  *

Probably the trickiest part is the last step with sum.(eachcol(output[:, Not(1, end)])) where you want to sum everything except first and last column, which are special and you need to manually assign values to them.

5 Likes

Many thx Bogumil!

I did it like this using matrixes and I receive back the indexes of the rows and columns to aggregate future information.

function toIxes(m)
    mAbs = abs.(m)
    cSum = sum(mAbs; dims=1)
    rSum = sum(mAbs; dims=2)'
    cIx = sortperm(cSum, dims=2, rev=true)
    rIx = sortperm(rSum, dims=2, rev=true)
    (rIx, cIx)
end 

c = rand(5, 6)

toIxes(c)