This is a really simple package (my first macro package) that does only one thing. It exports the macro `@by`, which gives you the combination of filtering and the split-apply-combine approach of R’s data.table (or something approximating that, at least), with DataFrames’ `by` doing the work behind the scenes.
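To make the three parts (filter, then split by a key, then apply and combine) concrete, here is a toy sketch in plain Julia with no DataFrames involved. It assumes a "table" is just a NamedTuple of equal-length column vectors, and `filtered_groupby` is a hypothetical helper written for illustration. It is not part of FilteredGroupbyMacro.jl and not how the package works internally.

```julia
# Toy filter -> split -> apply/combine over a NamedTuple of column vectors.
# `filtered_groupby` is a hypothetical helper, not part of FilteredGroupbyMacro.jl.
function filtered_groupby(f, tbl::NamedTuple, rowmask::AbstractVector{Bool}, key::Symbol)
    keycol = tbl[key]
    # filter + split: bucket the indices of rows that pass the mask by their key value
    groups = Dict{eltype(keycol), Vector{Int}}()
    for i in findall(rowmask)
        push!(get!(groups, keycol[i], Int[]), i)
    end
    # apply f to each group's sub-table, then combine the results into a Dict
    return Dict(k => f(map(col -> col[rows], tbl)) for (k, rows) in groups)
end

tbl = (Cut = ["Good", "Ideal", "Good", "Ideal"], Price = [3500, 2000, 4500, 5000])
res = filtered_groupby(t -> sum(t.Price), tbl, tbl.Price .> 3000, :Cut)
# res == Dict("Good" => 8000, "Ideal" => 5000)
```

The row mask plays the role of the filter slot, `:Cut` the grouping slot, and the function passed as `f` the per-group computation.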
I’ve missed having this concise syntax for a while, as it ticks the most common boxes for what I do with DataFrames and needs no redundant column or dataframe variable names, or complex signatures like `newcol = (:columnA, :columnB) => x -> x.columnA .+ x.columnB`.
The package is waiting for inclusion in the General registry right now, so until then you need to install it via:
Here are the docs: https://jkrumbiegel.github.io/FilteredGroupbyMacro.jl/dev/
But here’s a short example from the README right away:
```julia
using RDatasets
using FilteredGroupbyMacro
using StatsBase

diamonds = dataset("ggplot2", "diamonds")

# filter by Price and Carat
# then group by Cut
# finally compute new columns with keyword names
@by diamonds[(:Price .> 3000) .& (:Carat .> 0.3), :Cut,
    MeanPricePerCarat = mean(:Price) / mean(:Carat),
    MostFreqColor = sort(collect(countmap(:Color)), by = last)[end]]
```
Compare this to the default DataFrames syntax:
```julia
by(diamonds[(diamonds.Price .> 3000) .& (diamonds.Carat .> 0.3), :], :Cut,
    MeanPricePerCarat = (:Price, :Carat) => x -> mean(x.Price) / mean(x.Carat),
    MostFreqColor = :Color => x -> sort(collect(countmap(x)), by = last)[end])
```
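One way to get from the short form to the long one is to rewrite quoted column names such as `:Price` into column accesses on the table. The following is a deliberately tiny sketch of that idea, not FilteredGroupbyMacro.jl’s actual implementation: a hypothetical `@withcols` macro (with a hypothetical helper `replace_cols`) that turns every `:col` in an expression into `tbl.col`, demonstrated on a NamedTuple of vectors instead of a DataFrame.

```julia
# Hypothetical sketch: rewrite every quoted symbol `:col` in an expression
# into a property access `t.col` on a table-like object bound to `t`.
replace_cols(ex, t) = ex  # literals, plain symbols, line numbers: unchanged
replace_cols(ex::QuoteNode, t) = ex.value isa Symbol ? Expr(:., t, QuoteNode(ex.value)) : ex
function replace_cols(ex::Expr, t)
    # existing field accesses like `x.Price` keep their field name; only `x` is rewritten
    if ex.head == :. && length(ex.args) == 2 && ex.args[2] isa QuoteNode
        return Expr(:., replace_cols(ex.args[1], t), ex.args[2])
    end
    return Expr(ex.head, map(a -> replace_cols(a, t), ex.args)...)
end

macro withcols(tbl, ex)
    t = gensym(:tbl)  # fresh name so the table binding cannot clash with user variables
    return esc(quote
        let $t = $tbl
            $(replace_cols(ex, t))
        end
    end)
end

tbl = (Price = [1000, 4000, 5000], Carat = [0.2, 0.5, 1.0])
total = @withcols tbl sum(:Price)                                # 10000
nmatch = @withcols tbl sum((:Price .> 3000) .& (:Carat .> 0.3))  # 2
```

Because `:Price` inside a macro argument arrives as a `QuoteNode`, the rewrite only has to walk the expression tree and replace those nodes; everything else passes through untouched.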
You can also use the `:=` assignment syntax to join the groupby result with the filtered table:
```julia
using FilteredGroupbyMacro
using DataFrames

df = DataFrame(a = repeat(1:3, 3), b = repeat('a':'c', 3))

# the result of this will be df with a new column sum_a
# that contains the same sum_a for every row in each group based on :b
@by df[!, :b, sum_a := sum(:a)]
```
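For orientation, in later DataFrames.jl releases (1.x), where `by` was replaced by `groupby` together with `combine`/`transform`, roughly the same result as the `:=` example can be obtained in plain DataFrames with `transform` on a grouped frame, which writes each group’s scalar result back to every row of that group. This is a sketch assuming DataFrames 1.x, not a description of what FilteredGroupbyMacro.jl generates.

```julia
using DataFrames

df = DataFrame(a = repeat(1:3, 3), b = repeat('a':'c', 3))

# group by :b, compute sum(:a) per group, and broadcast it back to every row
# of that group as the new column :sum_a (the row order of df is preserved)
out = transform(groupby(df, :b), :a => sum => :sum_a)

out.sum_a  # [3, 6, 9, 3, 6, 9, 3, 6, 9]
```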