Reduce DataFrame by unique value

Hi all,

I have a large DataFrame with a column of values and a column of unique ids. I would like to create a new DataFrame with the unique values in one column and a vector of the corresponding ids in the other. Here is an example:

Original DataFrame

3×2 DataFrame
 Row │ id     value 
     │ Int64  Int64 
─────┼──────────────
   1 │     2      3
   2 │     4      3
   3 │     5      1

New DataFrame

 Row │ id      value 
     │ Array…  Int64 
─────┼───────────────
   1 │ [2, 4]      3
   2 │ [5]         1

One approach might be to group the DataFrame by value and extract the ids for each group using combine. However, I am not quite sure how to do that. Any guidance would be appreciated.

MWE

using DataFrames

df = DataFrame(id = [2,4,5], value = [3,3,1])

groups = groupby(df, :value)

# how do I extract the ids?

df_new = combine(groups, )

julia> using DataFrames

julia> df = DataFrame(id = [2, 4, 5], value = [3, 3, 1])
3×2 DataFrame
 Row │ id     value
     │ Int64  Int64
─────┼──────────────
   1 │     2      3
   2 │     4      3
   3 │     5      1

julia> combine(groupby(df, :value), :id => Ref)
2×2 DataFrame
 Row │ value  id_Ref
     │ Int64  SubArray…
─────┼──────────────────
   1 │     1  [5]
   2 │     3  [2, 4]
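
Note that the result column above is auto-named id_Ref, its cells are SubArray views into df, and value 1 comes before value 3. A minimal sketch of one way to get closer to the layout in the question, assuming you want the column named id, plain Vector cells, and groups in order of first appearance (the names groups and df_new are taken from the MWE; the use of copy and sort = false is my assumption about what is wanted):

```julia
using DataFrames

df = DataFrame(id = [2, 4, 5], value = [3, 3, 1])

# sort = false creates the groups in their order of first appearance in df
# (value 3 first, then value 1), as in the desired output.
groups = groupby(df, :value; sort = false)

# copy turns each group's view into a plain Vector; Ref wraps it so that
# combine stores the whole vector in one cell; `=> :id` names the new column.
df_new = combine(groups, :id => Ref ∘ copy => :id)

# Reorder the columns so that id comes first.
select!(df_new, :id, :value)
```

Whether the copy is worth it depends on the use case: the views avoid allocating, while the copies no longer depend on df.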

Thank you! This helps me out so much. I doubt I would have arrived at such a simple solution on my own.


This is perhaps more intuitive:

combine(groupby(df, :value), :id => x -> [x])

Or:
combine(groupby(df, :value), :id => fill)
(which reads rather nicely, at least to me, since we are asking to fill the cell with the values)
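
As far as I understand it, Ref, x -> [x], and fill all do the same job here: they wrap the group's vector so that combine stores it in a single cell instead of spreading it over several rows. A small sketch comparing them with the unwrapped case (the variable names and the explicit `=> :id` target are only for illustration):

```julia
using DataFrames

df = DataFrame(id = [2, 4, 5], value = [3, 3, 1])
gdf = groupby(df, :value)

# Without a wrapper, the returned vector is expanded to one row per element,
# so this gives back three rows.
expanded = combine(gdf, :id => identity => :id)

# Each wrapper below yields one row per group, with the ids in a single cell.
with_ref  = combine(gdf, :id => Ref => :id)          # Ref container
with_vec  = combine(gdf, :id => (x -> [x]) => :id)   # one-element vector
with_fill = combine(gdf, :id => fill => :id)         # 0-dimensional array
```

The parentheses around x -> [x] are needed once a target name follows, because otherwise `=> :id` would be parsed as part of the anonymous function's body.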


Thank you all for the alternative solutions. I think they are all good. I do agree that :id => fill is very readable.
