In DataFramesMeta, it is possible to evaluate algebraic expressions and calculations involving the column names of a DataFrame.
Is it possible (in DataFramesMeta, or in another package that can be used inside its macros) to simplify such expressions in order to improve execution time?
As an example, I have the following data:
using DataFrames, DataFramesMeta
using Random
twister = MersenneTwister(1234)
ll = Vector{DataFrame}(undef, 10)
for g in 1:10
    ll[g] = DataFrame(g = g, x = rand(twister, 10000), y = rand(twister, 10000))
end
dd = reduce(vcat, ll)
gd = @groupby(dd, :g);
The following three calls all deliver the same result, but the first is the fastest. Is there a way for Julia to simplify the other two to the first?
result1 = @time @transform(gd, :result = :y)
# 0.006957 seconds (369 allocations: 4.599 MiB)
result2 = @time @transform(gd, :result = :y + :x - :x + exp.(log.(:y)) - :y)
# 0.203798 seconds (237.28 k allocations: 29.093 MiB, 5.66% gc time, 89.07% compilation time)
result3 = @time @transform(gd, :result = :y/3 + :y/3 + :y/3 + :x/2 + :x/2 + :y - (:x + :y))
# 0.456713 seconds (280.93 k allocations: 37.507 MiB, 3.88% gc time, 90.19% compilation time)
And if it is possible, can you define simplification rules involving arbitrary functions? For instance, in the case below you could tell Julia that k * ShiftedArrays.lag(x) = ShiftedArrays.lag(k * x), so that it could simplify the second expression to the first.
using ShiftedArrays
result4 = @time @transform(gd, :result = ShiftedArrays.lag(:y))
# 0.102200 seconds (73.83 k allocations: 13.625 MiB, 94.17% compilation time)
result5 = @time @transform(gd, :result = ShiftedArrays.lag(:y)/2 + ShiftedArrays.lag(0.5 * :y))
# 0.246605 seconds (259.86 k allocations: 29.362 MiB, 96.41% compilation time)
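To make the rule I have in mind concrete, here is a minimal sketch that checks the identity numerically on a small vector, outside of any DataFrame. The names k, x, lhs, rhs and combined are only illustrative, and the comparison uses isequal so that the leading missing produced by the lag compares as equal.

```julia
using ShiftedArrays

k = 0.5                      # illustrative constant
x = [1.0, 2.0, 3.0, 4.0]     # illustrative data, stands in for a column such as :y

# Scale after lagging versus scale before lagging.
lhs = collect(k .* ShiftedArrays.lag(x))
rhs = collect(ShiftedArrays.lag(k .* x))

# isequal treats missing as equal to missing, unlike ==.
@assert isequal(lhs, rhs)

# The same idea applied to the expression used for result5:
# lag(x)/2 + lag(0.5 * x) should collapse to lag(x).
combined = collect(ShiftedArrays.lag(x) ./ 2 .+ ShiftedArrays.lag(0.5 .* x))
@assert isequal(combined, collect(ShiftedArrays.lag(x)))
```

This only confirms that the two sides agree on this input; what I am looking for is a way to state such a rule once and have it applied automatically inside @transform.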