Combining a col from each DF group into a single DF

Hi all,

sorry if this obvious, fairly new to Julia. I think there must be a simple way to combine a single col from all DF groups into a single DF. I have this:

gd = groupby(DataFrame(dtype=repeat(["AA","BB","CC"], inner=4), val=collect(1:12)),:dtype)

and want this

      │ AA     BB     CC    
   ─┼─────────────────────
   1 │     1      5      9
   2 │     2      6     10
   3 │     3      7     11
   4 │     4      8     12

currently doing it like this, but I think it there should be a simpler way:

res = DataFrame()
for dt in keys(gd)
    res[!,dt[1]] = gd[(dtype=dt[1],)][!,:val]
end

Thanks!

This looks like ‘stacking’ or ‘pivoting’.

julia> combine(gd, :dtype, :val, eachindex => :row)
12×3 DataFrame
 Row │ dtype   val    row
     │ String  Int64  Int64
─────┼──────────────────────
   1 │ AA          1      1
   2 │ AA          2      2
   3 │ AA          3      3
   4 │ AA          4      4
   5 │ BB          5      1
   6 │ BB          6      2
   7 │ BB          7      3
   8 │ BB          8      4
   9 │ CC          9      1
  10 │ CC         10      2
  11 │ CC         11      3
  12 │ CC         12      4

julia> unstack(ans, :row, :dtype, :val)
4×4 DataFrame
 Row │ row    AA      BB      CC
     │ Int64  Int64?  Int64?  Int64?
─────┼───────────────────────────────
   1 │     1       1       5       9
   2 │     2       2       6      10
   3 │     3       3       7      11
   4 │     4       4       8      12

The way unstacking works is you specify row labels, column labels and values, unstack(df, :rows, :columns, :values), and you get a table with the values at the corresponding positions. In our case, we need to add the target rows from the position within each group (which is what eachindex => :row does), the target columns are from :dtype, and the values from :val.

Rows and columns are ordered by appearance

The column names in :target_column appear in the table in order of appearance, and similarly for :target_rows. I.e., :target_rows does not specify the row index — in fact, we can use any value for the row label:

julia> using DataFramesMeta

julia> @chain(gd,
         @combine(:val, :row = 'v' .+ eachindex(:val)),
         unstack(:row, :dtype, :val)
       )
4×4 DataFrame
 Row │ row   AA      BB      CC
     │ Char  Int64?  Int64?  Int64?
─────┼──────────────────────────────
   1 │ w          1       5       9
   2 │ x          2       6      10
   3 │ y          3       7      11
   4 │ z          4       8      12

1 Like
df=DataFrame(dtype=repeat(["AA","BB","CC"], inner=4), val=collect(1:12))
flatten(unstack(df, :dtype, :val, valuestransform=identity), Cols(:))

if the ultimate goal is to apply an aggregation function to the values of the various groups, you can avoid going through groupby and / or flatten in the following way


unstack(df, :dtype, :val, valuestransform=sum)
unstack(df, :dtype, :val, valuestransform=x->sum(x)/length(x))

valuestransform isn’t in the stable release of DataFrames yet though, right? I found it on the latest/ docs pages but not in the stable/ ones.

I have this version installed.
I’m not sure, but I think it’s as you say

Status `C:\Users\sprmn\.julia\v1.8\Project.toml`
  [a93c6f00] DataFrames v1.4.0 `https://github.com/JuliaData/DataFrames.jl.git#main`

Yes - valuestransform is not released yet, but it will be released soon.