Yes, maybe, but there are several levels of complexity regarding what the function returns for each group:
- a single scalar
- several scalars (e.g. as a tuple, `NamedTuple` or a single-row `DataFrame`)
- a multiple-row `DataFrame` (possibly with a varying number of rows depending on the group)
Each of these cases can be either type-stable or not, which makes things even more tricky.
FWIW, Pandas provides three different functions:
- `aggregate` to return a single value for each column and each group (similar to our `aggregate`, which is a bit more general)
- `transform` to return a `DataFrame` with the same shape as the original for each group
- `apply` for general transformations (similar to our `by`)
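For reference, here is a small sketch of the three pandas functions on a toy table. The column names `g` and `x` and the data are made up for illustration; the calls operate on a single selected column to keep the output shapes easy to compare.

```python
import pandas as pd

# Toy data (hypothetical): two groups, "a" with two rows and "b" with one.
df = pd.DataFrame({"g": ["a", "a", "b"], "x": [1.0, 3.0, 10.0]})
grouped = df.groupby("g")["x"]

# aggregate: one value per group.
means = grouped.agg("mean")

# transform: one value per input row, so the result has the input's length.
centered = grouped.transform(lambda s: s - s.mean())

# apply: anything goes; here each group keeps a varying number of rows.
at_or_above_mean = grouped.apply(lambda s: s[s >= s.mean()])

print(means.tolist())             # one entry per group
print(centered.tolist())          # one entry per original row
print(at_or_above_mean.tolist())  # length depends on the data
```

Only in the first two cases is the output length known before the function is called.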
The advantage of `transform` over `apply` is that you know the size of the result in advance, so you can avoid allocating a temporary copy if you can predict the output type. The same goes for `aggregate`, which can be even more efficient since it can operate by columns (making inference and specialization easier). The fact that we allow `aggregate` to return either a scalar or a vector makes it harder to optimize, cf. this PR.
See "`stack(vec_of_vecs)` for `vcat(vec_of_vecs...)`" (Issue #21672, JuliaLang/julia on GitHub).
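To make the allocation argument concrete, here is a rough pure-Python sketch. It is not how pandas or DataFrames is implemented, and the helper names `group_indices`, `transform_like` and `apply_like` are invented for this example. The shape-preserving version writes into a buffer allocated once, while the general version has to collect per-group pieces and concatenate them at the end.

```python
from collections import defaultdict

def group_indices(keys):
    """Map each key to the list of row positions where it occurs."""
    groups = defaultdict(list)
    for i, k in enumerate(keys):
        groups[k].append(i)
    return groups

def transform_like(keys, values, f):
    """f returns one value per input row, so the output is allocated up front."""
    out = [None] * len(values)  # final size is known before calling f
    for idx in group_indices(keys).values():
        result = f([values[i] for i in idx])
        if len(result) != len(idx):
            raise ValueError("f must preserve the size of each group")
        for i, r in zip(idx, result):
            out[i] = r  # written in place, no temporary pieces
    return out

def apply_like(keys, values, f):
    """f may return any number of rows, so pieces are collected, then concatenated."""
    pieces = []
    for k, idx in group_indices(keys).items():
        pieces.append([(k, r) for r in f([values[i] for i in idx])])
    # The total size is only known here, hence the extra copy.
    return [row for piece in pieces for row in piece]

keys = ["a", "b", "a", "b"]
values = [1, 10, 3, 30]

demeaned = transform_like(keys, values, lambda xs: [x - sum(xs) / len(xs) for x in xs])
filtered = apply_like(keys, values, lambda xs: [x for x in xs if x > 2])
print(demeaned)
print(filtered)
```

The final concatenation in `apply_like` is the step that a fast way of stacking a vector of vectors would speed up.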