I’m pretty new to Julia and am struggling with a data wrangling problem. Say I’ve got the following dataframe:
using DataFrames
df = DataFrame(ID = [1,1,1,2,2,2,3,3], OP = [:X,:X,:Y,:X,:Y,:Y,:X,:Y], COUNT = [5, 10, 2, 7, 2, 0, 1, 2])
What I want is, for each ID, the total count for OP X and for OP Y. What I can do is use combine to aggregate the data:
gdf = groupby(df, [:ID, :OP])
result_df = combine(gdf, :COUNT => sum)
This creates a dataframe in which each ID gets two rows: row 1 contains the count for X and row 2 the count for Y. Is there an easy way to get the counts for X and Y in two different columns? Instead of the result_df created in the last code chunk, I’d like to get the following dataframe:
optimal_DF = DataFrame(ID = [1,2,3], X_COUNT = [15,7,1], Y_COUNT = [2,2,2])
If there is an easy and idiomatic way to do this, I’d like to hear about it.
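For reference, one possible approach is to keep the grouped sum and then pivot the result from long to wide with unstack from DataFrames.jl. This is only a sketch, assuming a DataFrames.jl 1.x release. The names long_df and wide_df are made up for illustration, and COUNT_sum is the column name that combine generates automatically for :COUNT => sum.

```julia
using DataFrames

df = DataFrame(ID = [1,1,1,2,2,2,3,3],
               OP = [:X,:X,:Y,:X,:Y,:Y,:X,:Y],
               COUNT = [5, 10, 2, 7, 2, 0, 1, 2])

# Sum COUNT within each (ID, OP) pair; the new column is auto-named COUNT_sum.
long_df = combine(groupby(df, [:ID, :OP]), :COUNT => sum)

# Pivot from long to wide: one row per ID, one column per distinct OP value.
wide_df = unstack(long_df, :ID, :OP, :COUNT_sum)

# Rename the pivoted columns to match the desired layout.
rename!(wide_df, :X => :X_COUNT, :Y => :Y_COUNT)
```

The columns produced by unstack allow missing values, because an ID with no rows for a given OP would get missing in that cell. In this data every ID has both X and Y, so no missing values appear.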