In the dplyr version, all of your income and profit calculations sit in a single mutate call applied to a grouped data frame. In your DataFramesMeta.jl version, they are separate commands.
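This matters because @transform(grouped_df, ...) returns a non-grouped data frame by default, similar to passing .groups = "drop" to summarise in dplyr. Every later command in the chain then operates on the whole data frame instead of group by group.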
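A minimal sketch of what goes wrong when the grouping is dropped between steps. The data frame mirrors the small example used in this answer; the variable name result is only for illustration.

```julia
using DataFramesMeta  # re-exports DataFrames and @chain

df = DataFrame(order = [1, 1, 1, 2, 2, 2], income = [10, 20, 30, 50, 60, 70])

result = @chain df begin
    groupby(:order)
    # Computed per group, but the returned object is a plain DataFrame
    @transform(:total_income = sum(:income))
    # No grouping is left here, so sum(:income) runs over all six rows
    @transform(:income_frac = :income ./ sum(:income))
end

result isa DataFrame   # the grouping was dropped after the first @transform
result.income_frac     # each income divided by the overall sum, not the group sum
```

The first new column is still correct per group, which makes the mistake in the second one easy to miss.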
To keep the grouping, pass the keyword argument ungroup = false to @transform:
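julia> using DataFramesMeta  # re-exports DataFrames and @chain
julia> df = DataFrame(order = [1, 1, 1, 2, 2, 2], income = [10, 20, 30, 50, 60, 70]);
julia> @chain df begin
           groupby(:order)
           # ungroup = false keeps the result grouped for the next step
           @transform(:total_income = sum(:income); ungroup = false)
           @transform(:income_frac = :income ./ sum(:income))
       end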
6×4 DataFrame
 Row │ order  income  total_income  income_frac
     │ Int64  Int64   Int64         Float64
─────┼──────────────────────────────────────────
   1 │     1      10            60     0.166667
   2 │     1      20            60     0.333333
   3 │     1      30            60     0.5
   4 │     2      50           180     0.277778
   5 │     2      60           180     0.333333
   6 │     2      70           180     0.388889
As I’m sure you noticed when writing the code, DataFramesMeta.jl doesn’t let you use a column you just created within the same @transform call. As an alternative, you can use the @astable macro-flag, which lets you create many new columns in the same scope.
However, there is a downside to this. In the following mutate call from dplyr,
mutate(total_income = sum(wholesale_income),
       profit = total - total_income,
       profit_count = sum(profit > 0))
the expression total_income = sum(wholesale_income) produces a scalar that is “spread” across all rows of the data frame.
In DataFramesMeta.jl, this kind of “spreading” isn’t allowed when multiple columns are returned from an @astable block. (Really, this limitation is in DataFrames.jl.) So inside an @astable block you can’t return a mix of scalars and vectors. You would have to do something like this:
@transform df @astable begin
    # s is a local variable, not a column
    s = sum(:income)
    # expand the scalar by hand into a full-length vector
    :total_income = fill(s, length(:income))
    :income_frac = :income ./ :total_income
end
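For the per-group case from the question, the same block can be applied to a grouped data frame. This is a sketch using the small example data from this answer; gdf and result are illustrative names.

```julia
using DataFramesMeta  # re-exports DataFrames

df = DataFrame(order = [1, 1, 1, 2, 2, 2], income = [10, 20, 30, 50, 60, 70])
gdf = groupby(df, :order)

result = @transform gdf @astable begin
    # Within each group, :income is that group's vector
    s = sum(:income)
    # Expand the scalar by hand to the group's length
    :total_income = fill(s, length(:income))
    :income_frac = :income ./ :total_income
end
```

Both new columns come out of one call, at the cost of the manual fill.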
So your best bet is to use the ungroup = false keyword argument. Maybe there is something DataFramesMeta.jl can do to make things simpler.
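As a rough sketch, the dplyr mutate call quoted above could be translated with ungroup = false like this. The column names come from that snippet, but the data values are invented for illustration and your real data frame will differ.

```julia
using DataFramesMeta  # re-exports DataFrames and @chain

# Hypothetical data: only the column names are taken from the dplyr snippet
df = DataFrame(order = [1, 1, 2, 2],
               wholesale_income = [10, 20, 50, 60],
               total = [40, 35, 100, 105])

result = @chain df begin
    groupby(:order)
    # Keep the grouping so the following steps stay per group
    @transform(:total_income = sum(:wholesale_income); ungroup = false)
    @transform(:profit = :total .- :total_income; ungroup = false)
    # Last step: let the result come back as an ordinary DataFrame
    @transform(:profit_count = sum(:profit .> 0))
end
```

Each @transform adds one column, so later steps can refer to the columns created by earlier ones.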