DataFrame groupby-aggregate strategy

@bkamins also gave a great solution on Slack, which I’ll add here so it doesn’t get swallowed by the Slack memory hole (I’m sure I’ll be looking for this again in the near future):

julia> df = DataFrame(a = [8, 2, 3, 1, 9, 3], b = [11, 12, 13, 14, 15, 16], c = ['a', 'a', 'a', 'b', 'b', 'b'])
6×3 DataFrame
│ Row │ a     │ b     │ c    │
│     │ Int64 │ Int64 │ Char │
├─────┼───────┼───────┼──────┤
│ 1   │ 8     │ 11    │ 'a'  │
│ 2   │ 2     │ 12    │ 'a'  │
│ 3   │ 3     │ 13    │ 'a'  │
│ 4   │ 1     │ 14    │ 'b'  │
│ 5   │ 9     │ 15    │ 'b'  │
│ 6   │ 3     │ 16    │ 'b'  │

julia> combine(groupby(df, :c), :a => maximum => :a, [:a, :b] => ((a, b) -> b[argmax(a)]) => :b)
2×3 DataFrame
│ Row │ c    │ a     │ b     │
│     │ Char │ Int64 │ Int64 │
├─────┼──────┼───────┼───────┤
│ 1   │ 'a'  │ 8     │ 11    │
│ 2   │ 'b'  │ 9     │ 15    │

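So by passing [:a, :b] as the source columns in the combine call, we can use a two-argument anonymous function: for each group it receives that group's :a and :b columns as its two arguments, and here it returns the value of b in the row where a is largest.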
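To make the mechanics a bit more visible, here is a minimal sketch of the same call with the anonymous function pulled out into a named helper. The name b_at_max_a is my own, not something from DataFrames.jl, and the sketch assumes a DataFrames.jl version that supports the source => function => target syntax used above.

```julia
using DataFrames

df = DataFrame(a = [8, 2, 3, 1, 9, 3],
               b = [11, 12, 13, 14, 15, 16],
               c = ['a', 'a', 'a', 'b', 'b', 'b'])

# Hypothetical helper (the name is made up for this example): it takes the
# two column vectors of one group and returns the element of b located at
# the position where a is largest.
b_at_max_a(a, b) = b[argmax(a)]

# Calling it directly on plain vectors mirrors what happens for group 'a':
# argmax([8, 2, 3]) is 1, so the first element of b is returned.
first_group = b_at_max_a([8, 2, 3], [11, 12, 13])

# Same combine call as above, with the named helper in place of the
# anonymous function. [:a, :b] means "pass both columns as two arguments".
result = combine(groupby(df, :c),
                 :a => maximum => :a,
                 [:a, :b] => b_at_max_a => :b)
```

Since the helper is an ordinary two-argument function, it can be tried out on plain vectors before it goes into combine, which I find easier to debug than an inline lambda.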
