Hello everyone,
I’m pretty new to Julia and struggle with a data wrangling problem. Say, I’ve got the following dataframe:
using DataFrames
df = DataFrame(ID = [1,1,1,2,2,2,3,3],
OP = [:X,:X,:Y,:X,:Y,:Y,:X,:Y],
COUNT = [5, 10, 2, 7, 2, 0, 1, 2])
What I want to get is for each ID the count according to OP X or Y. What I can do is to use groupby
and combine
to aggregate the data:
gdf = groupby(df, [:ID, :OP])
result_df = combine(gdf, :COUNT => sum)
This creates a dataframe, in which for each ID I get two rows - row 1 contains the count for X and row 2 the count for Y. Is there an easy way to get the counts for X and Y in two different columns? Instead of result_df
created in the last code chunk I’d like to get the following dataframe.
optimal_DF = DataFrame(ID = [1,2,3],
X_COUNT = [15,7,1],
Y_COUNT = [2,2,2])
If there is an easy and idiomatic way to do this, I’d like to hear about it.
Regards,
Thomas