I have the following performance issue with a user-defined function in combine(). Assume
df = DataFrame(g = rand(1:100,1000), x = rand(1000))
and
temp_fun(x) = sum(x)
The performance of combine() is very different in the following two scenarios (the user-defined function is about 2x slower and makes about 5x as many allocations), and I don't understand why or how to avoid it:
using BenchmarkTools
@btime combine(groupby(df, :g), :x=>sum);
29.429 μs (188 allocations: 41.00 KiB)
@btime combine(groupby(df, :g), :x=>temp_fun);
64.421 μs (1020 allocations: 78.31 KiB)
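In case it helps to reproduce this, here is a self-contained sketch of the comparison above. It assumes DataFrames and BenchmarkTools are installed. The `$df` interpolation and the names `gdf`, `res_builtin` and `res_custom` are my additions, not part of the timings reported above; the interpolation is only there so that access to the global `df` is not part of what is measured, and the check confirms that both calls compute the same per-group sums.

```julia
using DataFrames, BenchmarkTools

df = DataFrame(g = rand(1:100, 1000), x = rand(1000))
temp_fun(x) = sum(x)   # thin wrapper around sum, as in the question

# Group once so both calls see the same groups in the same order
gdf = groupby(df, :g)

res_builtin = combine(gdf, :x => sum)        # output column is named x_sum
res_custom  = combine(gdf, :x => temp_fun)   # output column is named x_temp_fun

# Sanity check: same groups, same sums (up to floating-point rounding)
@assert res_builtin.g == res_custom.g
@assert res_builtin.x_sum ≈ res_custom.x_temp_fun

# Same benchmarks as above, with df interpolated into the expression
@btime combine(groupby($df, :g), :x => sum);
@btime combine(groupby($df, :g), :x => temp_fun);
```

The absolute timings will vary by machine and package version, so I would only compare the two lines relative to each other.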
Any suggestions? Thanks.