Assume that a dataframe is given as:
df_a = DataFrame(Categ=["Alpa","Beta","Gamma","Omega","Alpa","Beta","Gamma","Omega"],
T1 =rand(8),T2=rand(8),T3 =rand(8),T4 =rand(8),T5 =rand(8),T6 =rand(8),T7 =rand(8),T8 =rand(8),T9 =rand(8),T10 =rand(8),T11 =rand(8),T12 =rand(8))
8×13 DataFrame. Omitted printing of 4 columns
│ Row │ Categ │ T1 │ T2 │ T3 │ T4 │ T5 │ T6 │ T7 │ T8 │
│ │ String │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────┼───────────┼──────────┼───────────┼──────────┼───────────┼─────────────┼───────────┼───────────┤
│ 1 │ Alpa │ 0.673424 │ 0.786929 │ 0.0893057 │ 0.568696 │ 0.145572 │ 0.653375 │ 0.21898 │ 0.361053 │
│ 2 │ Beta │ 0.0538812 │ 0.360114 │ 0.699175 │ 0.323722 │ 0.161717 │ 0.000336349 │ 0.542039 │ 0.848293 │
│ 3 │ Gamma │ 0.333272 │ 0.631906 │ 0.275669 │ 0.957596 │ 0.428401 │ 0.641091 │ 0.256033 │ 0.574052 │
│ 4 │ Omega │ 0.423221 │ 0.346942 │ 0.817672 │ 0.761629 │ 0.861805 │ 0.0104372 │ 0.0955495 │ 0.300919 │
│ 5 │ Alpa │ 0.360478 │ 0.123437 │ 0.0864159 │ 0.486619 │ 0.240639 │ 0.791122 │ 0.157081 │ 0.0159331 │
│ 6 │ Beta │ 0.853104 │ 0.913458 │ 0.135472 │ 0.387773 │ 0.190978 │ 0.542373 │ 0.0581868 │ 0.339313 │
│ 7 │ Gamma │ 0.699581 │ 0.356128 │ 0.477584 │ 0.565736 │ 0.338906 │ 0.625697 │ 0.845379 │ 0.487111 │
│ 8 │ Omega │ 0.144818 │ 0.451992 │ 0.212729 │ 0.990052 │ 0.0205443 │ 0.00236285 │ 0.341749 │ 0.120135 │
I am looking to add rows based on column Categ
. For this, I do the following:
names_original = names(df_a,Not("Categ"))
df_b = combine(groupby(df_a, "Categ"), names(df_a, names_original) .=> sum)
4×13 DataFrame. Omitted printing of 3 columns
│ Row │ Categ │ T1_sum │ T2_sum │ T3_sum │ T4_sum │ T5_sum │ T6_sum │ T7_sum │ T8_sum │ T9_sum │
│ │ String │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┤
│ 1 │ Alpa │ 1.0339 │ 0.910366 │ 0.175722 │ 1.05532 │ 0.38621 │ 1.4445 │ 0.37606 │ 0.376986 │ 0.674029 │
│ 2 │ Beta │ 0.906985 │ 1.27357 │ 0.834647 │ 0.711494 │ 0.352695 │ 0.542709 │ 0.600225 │ 1.18761 │ 1.34625 │
│ 3 │ Gamma │ 1.03285 │ 0.988034 │ 0.753253 │ 1.52333 │ 0.767307 │ 1.26679 │ 1.10141 │ 1.06116 │ 1.13612 │
│ 4 │ Omega │ 0.568039 │ 0.798934 │ 1.0304 │ 1.75168 │ 0.882349 │ 0.0128 │ 0.437299 │ 0.421054 │ 0.937023 │
Then I need to transpose this dataframe
dfTranspose = DataFrame([[names(df_b)]; collect.(eachrow(df_b))], [:column; Symbol.(axes(df_b, 1))])
Row │ column │ 1 │ 2 │ 3 │ 4 │
│ │ Any │ Any │ Any │ Any │ Any │
├─────┼─────────┼──────────┼──────────┼──────────┼──────────┤
│ 1 │ Categ │ Alpa │ Beta │ Gamma │ Omega │
│ 2 │ T1_sum │ 1.0339 │ 0.906985 │ 1.03285 │ 0.568039 │
│ 3 │ T2_sum │ 0.910366 │ 1.27357 │ 0.988034 │ 0.798934 │
│ 4 │ T3_sum │ 0.175722 │ 0.834647 │ 0.753253 │ 1.0304 │
│ 5 │ T4_sum │ 1.05532 │ 0.711494 │ 1.52333 │ 1.75168 │
│ 6 │ T5_sum │ 0.38621 │ 0.352695 │ 0.767307 │ 0.882349 │
│ 7 │ T6_sum │ 1.4445 │ 0.542709 │ 1.26679 │ 0.0128 │
│ 8 │ T7_sum │ 0.37606 │ 0.600225 │ 1.10141 │ 0.437299 │
│ 9 │ T8_sum │ 0.376986 │ 1.18761 │ 1.06116 │ 0.421054 │
│ 10 │ T9_sum │ 0.674029 │ 1.34625 │ 1.13612 │ 0.937023 │
│ 11 │ T10_sum │ 1.18364 │ 0.917098 │ 1.18644 │ 1.40851 │
│ 12 │ T11_sum │ 1.05572 │ 0.476368 │ 1.40192 │ 0.77439 │
│ 13 │ T12_sum │ 0.519706 │ 1.01044 │ 1.11613 │ 1.50852 │
Annoyingly, the first row contains Categ
which should be in headers and suffix “sum” is added to T. I am looking to:
- Produce a dataframe similar to the above but the entries in first row should be moved to column names, i.e. column names should be alpha, beta, gamma, omega.
- Add another column (called Total) to the dataframe that is calculated as:
dfTranspose.Alpha + dfTranspose.Omega - dfTranspose.Beta
Any ideas?