With R's data.table, I'm used to storing the result of a group-by aggregation in every row of its group, e.g.:
> d = data.table(a=c(1,2,3,4), b=c(1,1,2,2))
> d
   a b
1: 1 1
2: 2 1
3: 3 2
4: 4 2
> d[,s:=sum(a), b]
> d
   a b s
1: 1 1 3
2: 2 1 3
3: 3 2 7
4: 4 2 7
The last command groups by column b, sums the values of a within each group, and writes that sum into every row of the group as a new column s.
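To spell out the semantics I'm after independently of either library, here is a minimal sketch in plain Julia. The vectors a and b mirror the columns above; the names group_sums and s are purely illustrative, and this is not meant to describe how data.table works internally.

```julia
# Plain-Julia sketch of the desired operation; no DataFrames.jl required.
a = [1, 2, 3, 4]
b = [1, 1, 2, 2]

# First pass: accumulate the sum of `a` for each distinct value of `b`.
group_sums = Dict{Int,Int}()   # illustrative name: group key => running sum
for (ai, bi) in zip(a, b)
    group_sums[bi] = get(group_sums, bi, 0) + ai
end

# Second pass: look up each row's group total, so every row carries
# the sum of its own group (same length and order as the input rows).
s = [group_sums[bi] for bi in b]
```

The point is that the result has one value per original row, in the original row order, rather than one value per group.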
With DataFrames.jl, I have so far always done this:
julia> d = DataFrame(a=[1,2,3,4], b=[1,1,2,2])
4×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 1     │
│ 2   │ 2     │ 1     │
│ 3   │ 3     │ 2     │
│ 4   │ 4     │ 2     │
julia> join(by(d, :b, g -> sum(g[:, :a])), d, on=:b)
4×3 DataFrame
│ Row │ b     │ x1    │ a     │
│     │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┤
│ 1   │ 1     │ 3     │ 1     │
│ 2   │ 1     │ 3     │ 2     │
│ 3   │ 2     │ 7     │ 3     │
│ 4   │ 2     │ 7     │ 4     │
This seems a bit complicated to write, hard to read, and probably inefficient because of the unnecessary join.
Is there a better way?