Summing a DataFrame by group over rows

Hi all,

I am trying to sum a dataframe by group ‘x1’. This works as follows:

numcols = columns to be summed
summed_df = combine(DataFrames.groupby(df, :x1), numcols .=> sum .=> numcols)

However, I am trying to make 2 adjustments and just cannot figure it out:

  1. I am trying to add 1 for each sum. Thus I have tried something like this:
    summed_df = combine(DataFrames.groupby(df, :x1), numcols .=> x -> sum(x) + 1 .=> numcols, ungroup = true)

But this does not work. Why not? Shouldn't the function still work?

  2. By summing like this, the rows of the dataframe collapse to one row per group. Is there an efficient way to restore the original number of rows?

Right now, I work around this by merging the collapsed version back onto the original (a one-to-many merge on x1). I also tried adding the option ungroup = true, but it does not do anything.

Many thanks!

To keep the rows, use transform instead of combine. To avoid the error, put parentheses around your anonymous function: (x -> sum(x) + 1). Otherwise .=> numcols becomes part of the function body, because -> extends as far to the right as it can.
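
Here is a minimal sketch of both points. The data and the column names v1 and v2 are made up for illustration; only x1 and numcols come from the question.

```julia
using DataFrames

# Hypothetical example data: x1 is the grouping column, v1 and v2 are summed.
df = DataFrame(x1 = ["a", "a", "b"], v1 = [1, 2, 5], v2 = [10.0, 20.0, 50.0])
numcols = [:v1, :v2]

# Parentheses end the anonymous function before `.=> numcols`,
# so each column gets the pair `col => f => col`.
add_one_sum = x -> sum(x) + 1

# combine collapses to one row per group.
collapsed = combine(groupby(df, :x1), numcols .=> (x -> sum(x) + 1) .=> numcols)

# transform keeps one row per original row and repeats the group value.
expanded = transform(groupby(df, :x1), numcols .=> (x -> sum(x) + 1) .=> numcols)
```

Since the target names here equal the source names, transform overwrites v1 and v2 with the group sums. To keep the original values next to the sums, use different target names.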

Thanks a ton! That makes a lot of sense.