Understanding the performance issue in combine() [DataFrames.jl]

DataFrames has fast-path implementations for groupby with certain functions like sum; you can see them in the code here. These only kick in when combine() is given the function itself, not a wrapper around it.
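
To make the difference concrete, here is a minimal sketch (the toy data and column names are made up for illustration). As far as I understand it, passing `sum` directly lets DataFrames use its optimized grouped reduction, while wrapping the same call in an anonymous function sends it down the generic per-group path, because DataFrames can no longer tell that the function is `sum`:

```julia
using DataFrames

# Toy data; the column names :g and :x are illustrative only.
df = DataFrame(g = [1, 2, 1, 2, 1], x = [1, 2, 3, 4, 5])
gdf = groupby(df, :g)

# `sum` passed directly: eligible for the fast path.
fast = combine(gdf, :x => sum => :x_sum)

# Same computation hidden behind an anonymous function:
# handled by the generic path, which calls the function once per group.
slow = combine(gdf, :x => (v -> sum(v)) => :x_sum)

# Both give the same result; only the execution strategy differs.
println(fast == slow)
```

On a small table like this you won't see any difference in timing; it only becomes noticeable with many groups.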

Currently I don’t think there’s a public API for opting your own functions into that fast path. If the operation can be expressed as a reduction (e.g. Base.add_sum for sum), you could replicate what DataFrames does, but that means relying on an internal API.
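
If you'd rather not touch internals, the same idea can be sketched by hand using only exported functions. This is not DataFrames' actual implementation, just an illustration of a single-pass grouped reduction; `grouped_reduce` is a hypothetical helper name, and it assumes `op(init, x)` returns the same type as `init`:

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames): reduce column `col`
# within each group in one pass over the parent table.
function grouped_reduce(op, init, gdf::GroupedDataFrame, col::Symbol)
    idx = groupindices(gdf)          # group number of each parent row, or missing
    vals = parent(gdf)[!, col]       # the ungrouped column
    acc = fill(init, length(gdf))    # one accumulator per group
    for (g, v) in zip(idx, vals)
        ismissing(g) && continue     # row does not belong to any group
        acc[g] = op(acc[g], v)
    end
    return acc
end

df = DataFrame(g = [1, 2, 1, 2, 1], x = [1, 2, 3, 4, 5])
gdf = groupby(df, :g)

# Base.add_sum is the reduction operator that `sum` uses; it is not exported.
sums = grouped_reduce(Base.add_sum, 0, gdf, :x)

# Should agree with the built-in grouped sum.
println(sums == combine(gdf, :x => sum => :x_sum).x_sum)
```

The point is that the loop walks the data once and updates a per-group accumulator, instead of materializing each group and calling a function on it.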
