Group, Mutate, Ungroup

Is there an established way to support the following workflow? I realize this example is trivial, but the same pattern could be applied to more complex workflows such as subgroup moving averages. And while there are ways to run such calculations over a DataFrame without the method below, sometimes this sort of brute-force approach is the easiest, both mentally and in terms of time.

Goal: Number all rows in each group from 1:Number of Rows
Operation 1: Break DataFrame into SubGroups based on Attributes
Operation 2: Add column to each subgroup which numbers it from 1:Number of Rows
Operation 3: Ungroup SubGroups back into single DataFrame

I know operation 1 is possible in Julia today. Is there a way to perform operations 2 and 3? I have tried to modify a SubDataFrame, and it does not seem possible. I also can't find a way to ungroup a grouped DataFrame.

Does this suit your needs?

julia> using DataFrames

julia> df = DataFrame(x=[1, 1, 2, 2, 2, 3])
6×1 DataFrames.DataFrame
│ Row │ x │
├─────┼───┤
│ 1   │ 1 │
│ 2   │ 1 │
│ 3   │ 2 │
│ 4   │ 2 │
│ 5   │ 2 │
│ 6   │ 3 │

julia> by(df, :x) do sdf
           DataFrame(n=1:size(sdf,1))
       end
6×2 DataFrames.DataFrame
│ Row │ x │ n │
├─────┼───┼───┤
│ 1   │ 1 │ 1 │
│ 2   │ 1 │ 2 │
│ 3   │ 2 │ 1 │
│ 4   │ 2 │ 2 │
│ 5   │ 2 │ 3 │
│ 6   │ 3 │ 1 │

The by function is just a combination of groupby (operation 1) and combine (operation 3); the do block you pass in is applied to each subgroup and plays the role of operation 2.
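To make that decomposition concrete, here is a rough sketch of the same numbering written as separate groupby and combine calls. It assumes a recent DataFrames.jl (1.x), where, as far as I know, by has been deprecated in favor of calling these two functions directly. The variable names gdf, numbered, and kept are only for illustration.

```julia
using DataFrames

df = DataFrame(x = [1, 1, 2, 2, 2, 3])

# Operation 1: split the DataFrame into subgroups by the values of :x
gdf = groupby(df, :x)

# Operations 2 and 3: build a numbering column for each subgroup,
# then stack the per-group results back into a single DataFrame
numbered = combine(gdf) do sdf
    DataFrame(n = 1:nrow(sdf))
end

# Variant that keeps every original column and only appends :n
kept = transform(gdf, :x => (v -> 1:length(v)) => :n)
```

The combine form returns only the grouping column plus whatever the do block produces, while transform keeps all existing columns in their original row order, which is usually what you want for a mutate-style step.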

Sweet! I didn’t realize by…do could be used like that. I think because all of the examples computed summary statistics, I never thought to try it for non-summary operations.
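For the subgroup moving average mentioned in the original question, the same non-summary pattern might look roughly like the sketch below. It assumes DataFrames.jl 1.x; the column names g, v, and v_ma and the helper trailing_mean are made up for illustration and are not part of any library.

```julia
using DataFrames

df = DataFrame(g = ["a", "a", "a", "b", "b"],
               v = [1.0, 2.0, 3.0, 10.0, 20.0])

# Hypothetical helper: mean of the current value and up to w - 1 values
# before it; the window is shorter at the start of each group
function trailing_mean(v, w)
    out = Vector{Float64}(undef, length(v))
    for i in 1:length(v)
        lo = max(1, i - w + 1)
        out[i] = sum(v[lo:i]) / (i - lo + 1)
    end
    return out
end

# Apply the helper within each group and append the result as :v_ma
result = transform(groupby(df, :g), :v => (v -> trailing_mean(v, 2)) => :v_ma)
```

Because the function returns one value per row of the subgroup, the result lines up with the original rows, so no separate ungroup step is needed.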
