Calling transform! on a GroupedDataFrame appears to cause a large number of allocations and a large slowdown compared to the non-grouped counterpart.
    using DataFrames
    using StatsBase: sample
    using BenchmarkTools

    df1 = DataFrame(rand(1_000_000, 100), :auto)
    df1[:, :grp] .= sample(Int.(1:100), 1_000_000)
    dfg1 = groupby(df1, ["grp"])

    function test1(df)
        transform!(df, "x9" => ((x) -> x .^ 2) => "x9B")
    end
For a regular DataFrame:

    julia> @btime test1($df1);
      1.469 ms (556 allocations: 7.66 MiB)

For the GroupedDataFrame:

    julia> @btime test1($dfg1);
      388.991 ms (9003310 allocations: 311.65 MiB)
As can be seen, performance on the GroupedDataFrame is quite bad (roughly 265x slower, with about 40x the allocated memory), although I would have expected only a relatively modest overhead from operating on the 100 groups. Did I use transform! incorrectly, or is there a real performance issue?
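To make the "modest overhead" expectation concrete, one option is to compare against a hand-written loop over the groups. The following is only a rough sketch under my own assumptions: `manual_grouped_square` is a hypothetical helper, not part of DataFrames.jl, and the data is a smaller stand-in for `df1` so that it runs quickly.

```julia
using DataFrames

# Smaller stand-in for the data above (sizes are an assumption, chosen to run fast).
df = DataFrame(rand(10_000, 10), :auto)
df.grp = rand(1:100, 10_000)
gdf = groupby(df, "grp")

# Hypothetical helper: square one column group by group,
# writing the results into a single preallocated vector.
function manual_grouped_square(gdf::GroupedDataFrame, col::AbstractString)
    src = parent(gdf)[!, col]          # the column in the parent DataFrame, no copy
    out = similar(src)
    for sdf in gdf                     # each group is a SubDataFrame
        rows = parentindices(sdf)[1]   # row numbers of this group in the parent
        out[rows] .= view(src, rows) .^ 2
    end
    return out
end

manual = manual_grouped_square(gdf, "x9")

# Same operation through the grouped transform (non-mutating variant).
res = transform(gdf, "x9" => (x -> x .^ 2) => "x9B")
```

Timing `manual_grouped_square($dfg1, "x9")` with `@btime` on the original data should give a reference point for what looping over 100 groups costs on its own, to set against the grouped `transform!` timing.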
The above was run on