Hi,
I’m trying to determine unique counts of values in a column per group in a DataFrame.
As an example, given the following:
using DataFrames
julia> df = DataFrame(lab = [repeat(["Lab1"], 4)...; repeat(["Lab2"], 5)...], value = ['a','a','b','a','a','b','c','c','c'])
9×2 DataFrame
│ Row │ lab │ value │
│ │ String │ Char │
├─────┼────────┼───────┤
│ 1 │ Lab1 │ 'a' │
│ 2 │ Lab1 │ 'a' │
│ 3 │ Lab1 │ 'b' │
│ 4 │ Lab1 │ 'a' │
│ 5 │ Lab2 │ 'a' │
│ 6 │ Lab2 │ 'b' │
│ 7 │ Lab2 │ 'c' │
│ 8 │ Lab2 │ 'c' │
│ 9 │ Lab2 │ 'c' │
I would like to get
5×3 DataFrame
│ Row │ lab │ value │ count │
│ │ String │ Char │ Int64 │
├─────┼────────┼───────┼───────┤
│ 1 │ Lab1 │ 'a' │ 3 │
│ 2 │ Lab1 │ 'b' │ 1 │
│ 3 │ Lab2 │ 'a' │ 1 │
│ 4 │ Lab2 │ 'b' │ 1 │
│ 5 │ Lab2 │ 'c' │ 3 │
I’ve gotten as far as
julia> combine(grouped, :value => (vals -> keys(counter(vals))) => :value, :value => (vals -> values(counter(vals))) => :count)
2×3 DataFrame
│ Row │ lab │ value │ count │
│ │ String │ Base.KeySet… │ Base.Val… │
├─────┼────────┼─────────────────┼───────────┤
│ 1 │ Lab1 │ ['a', 'b'] │ [3, 1] │
│ 2 │ Lab2 │ ['a', 'c', 'b'] │ [1, 3, 1] │
However, I would prefer
- not to call
counter
twice - actually split the counts into separate rows
Can someone help?
Thanks!
Kevin