Combine rows but keep all columns

Hi all, due to the lack of copycalls kwarg in combine, I am not sure if there is a more efficient way to achieve what I want to do.

The following code groups the example data frame by ID and then sums the values in DataB column together, and should return a single row per group, but with all columns preserved. Is there a better way to do this?

using DataFrames

mydf = DataFrame(
    ID=[:A,:A,:B,:C],
    DataA=[10,10,5,7],
    DataB=[12,63,4,5]
)

vcat([
    let 
        unique(transform(df, :DataB => sum => :DataB, copycols=true))
    end
    for df in groupby(mydf, :ID)
]...)

I would do

combine(groupby(mydf, :ID), :DataA => unique => :DataA, :DataB => sum => :DataB)

Of course, this only works if DataA has the same value for the corresponding ID.

Thank you! Is there some way to concisely specify :DataA => unique => :DataA, etc, for all the other columns? In my actual problem I have around 40 columns, which would be tedious to write out and modify if they change.

combine(groupby(mydf, :ID), 1:ncol(mydf) .=> maximum, :DataB => sum)
3Γ—5 DataFrame
 Row β”‚ ID      ID_maximum  DataA_maximum  DataB_maximum  DataB_sum 
     β”‚ Symbol  Symbol      Int64          Int64          Int64
─────┼─────────────────────────────────────────────────────────────
   1 β”‚ A       A                      10             63         75
   2 β”‚ B       B                       5              4          4
   3 β”‚ C       C                       7              5          5

As for @Rudi79’s suggestion, this requires all the values of each column to be constant for each valie of ID.

combine(groupby(mydf, [:ID,:DataA]), :DataB => sum => :DataB)
combine(groupby(mydf, Not(:DataB)), :DataB => sum => :DataB)
2 Likes
combine(groupby(mydf, propertynames(mydf)[1:ncol(mydf)-1]), :DataB => sum => :DataB)
3Γ—3 DataFrame
 Row β”‚ ID      DataA  DataB 
     β”‚ Symbol  Int64  Int64
─────┼──────────────────────
   1 β”‚ A          10     75
   2 β”‚ B           5      4
   3 β”‚ C           7      5

Saves typing column names, but you need the β€œconstant variables” all together in a block of columns, and the ones you want to apply the function to in a separate block.

combine(groupby(mydf, :ID), Not([:ID,:DataB]).=>first.=>identity, :DataB => sum => :DataB)

combine(groupby(mydf, :ID), Not([:ID,:DataB]).=>Ref .=>identity, :DataB => sum => :DataB)
1 Like

Thanks. I appreciate all the solutions but this one is extremely elegant.

Another way to deal with possible non-uniqueness in groups is to error (which might be a good thing to catch such cases early). This can be done with:

combine(groupby(mydf, :ID), Not([:ID,:DataB]).=>only ∘ Set, :DataB => sum => :DataB)

(note using composition operator is not necessary, an anonymous function could do the same).

2 Likes

Cool, yes that’s a good idea. For some reason it never dawned on me to pass multiple functions to modify the columns.