I love Query.jl’s standalone syntax, but I’m having trouble finding something comparable to dplyr’s do() syntax. Much of what I need to do is compute a summary statistic for each group (mean, median, or similar), divide every value in the group by that statistic, and return the original data frame with the normalized-by-group values added as a new column.
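The operation described above is a split-apply-combine whose result is broadcast back onto the original rows. As a point of reference, here is a minimal sketch of the same computation in Base Julia plus the Statistics stdlib, independent of Query.jl and DataFrames. The helper name normalize_by_group and its stat keyword are hypothetical, chosen only for illustration.

```julia
using Statistics: mean, median

# Hypothetical helper: divide each value by a per-group statistic.
# `xs` and `groups` are parallel vectors; the result keeps the original row order.
function normalize_by_group(xs::AbstractVector{<:Real}, groups::AbstractVector; stat = mean)
    # Pass 1: collect the values that belong to each group key.
    members = Dict{eltype(groups), Vector{eltype(xs)}}()
    for (x, g) in zip(xs, groups)
        push!(get!(members, g, eltype(xs)[]), x)
    end
    # Pass 2: compute one summary statistic per group (mean by default).
    centers = Dict(g => stat(vs) for (g, vs) in members)
    # Pass 3: divide every original value by its own group's statistic.
    return [x / centers[g] for (x, g) in zip(xs, groups)]
end

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = repeat([:a, :b], inner = 4)
normed = normalize_by_group(a, b)                     # normalize by group mean
normed_med = normalize_by_group(a, b; stat = median)  # or by group median
```

Passing the statistic as a function keeps the "average/median/whatever" choice in one place; the two-pass structure (summarize, then divide) is the same thing the join in the transcript below achieves.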
I do something like this currently:
julia> using Query, DataFrames, StatsBase
julia> ex = DataFrame(:a=>[1,2,3,4,5,6,7,8], :b=>repeat([:a, :b], inner=(4)))
8×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ Symbol │
├─────┼───────┼────────┤
│ 1 │ 1 │ a │
│ 2 │ 2 │ a │
│ 3 │ 3 │ a │
│ 4 │ 4 │ a │
│ 5 │ 5 │ b │
│ 6 │ 6 │ b │
│ 7 │ 7 │ b │
│ 8 │ 8 │ b │
julia> result = ex |>
@groupby(_.b) |>
@map({group=key(_), avg=mean(_.a)}) |>
DataFrame
2×2 DataFrame
│ Row │ group │ avg │
│ │ Symbol │ Float64 │
├─────┼────────┼─────────┤
│ 1 │ a │ 2.5 │
│ 2 │ b │ 6.5 │
julia> out = innerjoin(ex, result, on=[:b=>:group]);
julia> out.normed = out.a ./ out.avg;
julia> out
8×4 DataFrame
│ Row │ a │ b │ avg │ normed │
│ │ Int64 │ Symbol │ Float64 │ Float64 │
├─────┼───────┼────────┼─────────┼──────────┤
│ 1 │ 1 │ a │ 2.5 │ 0.4 │
│ 2 │ 2 │ a │ 2.5 │ 0.8 │
│ 3 │ 3 │ a │ 2.5 │ 1.2 │
│ 4 │ 4 │ a │ 2.5 │ 1.6 │
│ 5 │ 5 │ b │ 6.5 │ 0.769231 │
│ 6 │ 6 │ b │ 6.5 │ 0.923077 │
│ 7 │ 7 │ b │ 6.5 │ 1.07692 │
│ 8 │ 8 │ b │ 6.5 │ 1.23077 │
Is there a cleaner way of doing this using Query.jl’s standalone operators?
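One untested possibility, sketched here rather than offered as a confirmed answer: keep each group together with its mean, then flatten back to one row per original row with @mapmany. This assumes, as in the Query.jl documentation’s @mapmany example, that in the result selector `_` refers to the outer (group-level) element and `__` to each element of the selected collection. The field name `rows` is a hypothetical choice.

```julia
using Query, DataFrames, Statistics

ex = DataFrame(a = [1, 2, 3, 4, 5, 6, 7, 8], b = repeat([:a, :b], inner = 4))

out = ex |>
    # One group per distinct value of column b.
    @groupby(_.b) |>
    # Keep the group's rows and compute its mean once per group.
    @map({rows = _, avg = mean(_.a)}) |>
    # Flatten back out: `_` is the group-level tuple, `__` is each row in `_.rows`.
    @mapmany(_.rows, {a = __.a, b = __.b, avg = _.avg, normed = __.a / _.avg}) |>
    DataFrame
```

Computing the mean in the @map step, rather than inside the @mapmany result selector, avoids recomputing it for every row of the group. The join from the original workflow disappears because each row travels alongside its group’s statistic.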