How to `combine` row vectors

alex-s-gardner · December 18, 2024, 12:45am

Hi All,

Just wondering how I can apply a function to rows containing vectors when doing a groupby → combine:

using DataFrames
df = DataFrame(group = ["A", "A", "B", "B"], v1 = [ [1, 2], [3, 4], [5, 6], [7, 8] ], v2 = [ [1, 2], [3, 4], [5, 6], [7, 8] ])
gdf = groupby(df, :group)
DataFrames.combine(gdf, 2:3 .=> sum)

returns

4×3 DataFrame
 Row │ group   v1_sum  v2_sum 
     │ String  Int64   Int64  
─────┼────────────────────────
   1 │ A            4       4
   2 │ A            6       6
   3 │ B           12      12
   4 │ B           14      14

but I would like two rows, with column 2 and 3 containing vectors of length 2

Dan · December 18, 2024, 7:59am

The following works:

julia> DataFrames.combine(gdf, 2:3 .=> Ref∘sum)
2×3 DataFrame
 Row │ group   v1_Ref_sum  v2_Ref_sum 
     │ String  Array…      Array…     
─────┼────────────────────────────────
   1 │ A       [4, 6]      [4, 6]
   2 │ B       [12, 14]    [12, 14]

But there does seem to be some conceptual problem.

rocco_sprmnt21 · December 18, 2024, 3:14pm

If it’s easier for you to follow, the following form also produces what you ask for.

combine(g->(;v1=[sum(g.v1)], v2=[sum(g.v2)]),gdf)

PS
I, like Dan, have some difficulty imagining the logic behind the request

PPS
I don’t know if in this case some variant of the unstack function (that accepts multiple columns) can get the result in one pass.

rocco_sprmnt21 · December 18, 2024, 4:49pm

another way: split-(apply)combine-

dfspl=select(df, :group,[:v1,:v2] .=>ByRow(x->(x...,)).=>x->x.*string.([1,2]) )

cdfspl=combine(groupby(dfspl,:group), Cols(2:5).=>sum .=>identity)

jcdf=select(cdfspl,:group, [Cols(2:3),Cols(4:5)].=>ByRow((x...)->Base.vect(x)).=>x->x[1][1:end-1])

alex-s-gardner · December 18, 2024, 4:58pm

The data frame that I’m working with has columns that store time series vectors. My goal is simply to groupby and sum retaining the original dimensionality of the time series. Is that sensible?

I would have never thought of
Ref ∘ sum

rocco_sprmnt21 · December 18, 2024, 9:24pm

julia> df = DataFrame(group = ["A", "A", "B", "B"], v1 = [ [1, 2], [3, 4], [5, 6], [7, 8] ], v2 = [ [11, 21], [31, 41], [51, 61], [71, 81] ])   
4×3 DataFrame
 Row │ group   v1      v2       
     │ String  Array…  Array…
─────┼──────────────────────────
   1 │ A       [1, 2]  [11, 21]
   2 │ A       [3, 4]  [31, 41]
   3 │ B       [5, 6]  [51, 61]
   4 │ B       [7, 8]  [71, 81]

julia> sdf=stack(df,[:v1,:v2])
8×3 DataFrame
 Row │ group   variable  value    
     │ String  String    Array…
─────┼────────────────────────────
   1 │ A       v1        [1, 2]
   2 │ A       v1        [3, 4]
   3 │ B       v1        [5, 6]
   4 │ B       v1        [7, 8]
   5 │ A       v2        [11, 21]
   6 │ A       v2        [31, 41]
   7 │ B       v2        [51, 61]
   8 │ B       v2        [71, 81]

julia> unstack(sdf,:group,:variable,:value, combine=sum)
2×3 DataFrame
 Row │ group   v1        v2
     │ String  Array…?   Array…?
─────┼──────────────────────────────
   1 │ A       [4, 6]    [42, 62]
   2 │ B       [12, 14]  [122, 142]

the inverse operation

flatten(unstack(sdf,:group,:variable,:value, combine=copy),[:v1,:v2])

Topic		Replies	Views
Groupby and aggregate a dataframe with custom function that return a vector New to Julia dataframes	8	1810	October 18, 2021
Dataframes: Split combined result to different columns General Usage dataframes	3	297	December 13, 2021
Using DataFrames `combine` is there a way to programmatically pass multiple functions to apply to the same same column? Data	9	793	January 20, 2023
Combining a col from each DF group into a single DF New to Julia question , dataframes	5	278	August 25, 2022
Summing dataFrame by index over rows New to Julia	2	193	January 19, 2023

How to `combine` row vectors

Related topics