Group, Mutate, Ungroup

question
dataframes

#1

Is there an existing way to support the following workflow? I realize this example is trivial, but the same pattern could be applied to more complex workflows such as subgroup moving averages. And while there are ways to run calculations over a DataFrame that do not need the following method, sometimes it is easiest (mentally, temporally) to use this sort of brute-force approach.

Goal: Number all rows in each group from 1:Number of Rows
Operation 1: Break DataFrame into SubGroups based on Attributes
Operation 2: Add column to each subgroup which numbers it from 1:Number of Rows
Operation 3: Ungroup SubGroups back into single DataFrame

I know Operation 1 is possible in Julia today. Is there a way to perform Operations 2 and 3? I have tried to modify a SubDataFrame, and it does not seem possible. I also can’t find a way to ungroup a grouped DataFrame.


#2

Does this suit your needs?

julia> df = DataFrame(x=[1, 1, 2, 2, 2, 3])
6×1 DataFrames.DataFrame
│ Row │ x │
├─────┼───┤
│ 1   │ 1 │
│ 2   │ 1 │
│ 3   │ 2 │
│ 4   │ 2 │
│ 5   │ 2 │
│ 6   │ 3 │

julia> by(df, :x) do sdf
           DataFrame(n=1:size(sdf,1))
       end
6×2 DataFrames.DataFrame
│ Row │ x │ n │
├─────┼───┼───┤
│ 1   │ 1 │ 1 │
│ 2   │ 1 │ 2 │
│ 3   │ 2 │ 1 │
│ 4   │ 2 │ 2 │
│ 5   │ 2 │ 3 │
│ 6   │ 3 │ 1 │

The by function is just a combination of groupby (operation 1) and combine (operation 3); the do block supplies operation 2.
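
For readers on a newer DataFrames.jl: as far as I know, by was deprecated and then removed in the 1.0 release, so the same idea has to be spelled out with groupby and combine. The following is a minimal sketch under that assumption, reusing the df from above. The transform variant is an alternative that keeps every original column and the original row order; the anonymous function and the names gdf, numbered and numbered2 are only illustrative.

```julia
using DataFrames

df = DataFrame(x = [1, 1, 2, 2, 2, 3])

# Operation 1: split the DataFrame into groups by :x
gdf = groupby(df, :x)

# Operations 2 and 3: build a numbering column per group, then
# let combine stack the per-group results back into one DataFrame.
# The grouping column :x is kept in the result by default.
numbered = combine(gdf) do sdf
    DataFrame(n = 1:nrow(sdf))
end

# Alternative: transform adds the new column next to the existing ones
# and returns the rows in their original order.
numbered2 = transform(gdf, :x => (v -> 1:length(v)) => :n)
```

Both results have the columns x and n, with n restarting at 1 inside each group.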


#3

Sweet! I didn’t realize by…do could be used like that. Because all of the examples computed summary statistics, I never thought to try it for non-summary methods.
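
As an illustration of such a non-summary use, here is a rough sketch of the subgroup moving average mentioned in the first post. It assumes DataFrames.jl 1.0 or later (groupby plus transform rather than by). The helper trailing_mean, the column names g, y and y_ma, and the window of 2 are all made up for the example, not part of any library.

```julia
using DataFrames

# Hypothetical helper: mean of the last `w` values up to each position.
# At the start of a group the window is shorter than `w`.
function trailing_mean(v, w)
    out = Vector{Float64}(undef, length(v))
    for i in eachindex(v)
        lo = max(1, i - w + 1)          # first index of the window
        out[i] = sum(v[lo:i]) / (i - lo + 1)
    end
    return out
end

df = DataFrame(g = [1, 1, 1, 2, 2], y = [1.0, 2.0, 3.0, 10.0, 20.0])

# The function is applied to :y within each group separately, so the
# window never reaches across a group boundary.
result = transform(groupby(df, :g), :y => (v -> trailing_mean(v, 2)) => :y_ma)
```

Because the function returns one value per row of the group, the result has the same number of rows as df, with y_ma added as a new column.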