Sorry, kind of a beginner's question again. Suppose I have a large GroupedDataFrame and I would like to add columns to each SubDataFrame in the group.
function build(GDF)
    for k = eachindex(GDF)
        GDF[k].newcol1 = function1(GDF[k].col1)
        GDF[k].newcol2 = function2(GDF[k].col2)
    end
end
Is there a better way to go about this? The performance I am seeing is quite slow, and I am not sure whether this loop is the bottleneck or whether the issue is with the functions themselves.
According to the documentation, pre-allocating the output should sometimes help, but when I tried this with a DataFrame, performance actually got worse.
df3 = DataFrame(X = [1, 2, 3, 4], Y = [0, 1, 2, 4])
julia> @benchmark df3.A = df3.X + df3.Y
BenchmarkTools.Trial: 10000 samples with 960 evaluations.
 Range (min … max):  84.158 ns …  1.026 μs  ┊ GC (min … max): 0.00% … 89.01%
 Time  (median):     89.800 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   94.999 ns ± 39.546 ns  ┊ GC (mean ± σ):  1.77% ±  4.00%
 84.2 ns  Histogram: log(frequency) by time  137 ns <
Memory estimate: 96 bytes, allocs estimate: 1.
Whereas preallocating a vector beforehand yielded:
julia> function f1(df)
           df.G = Vector{Int}(undef, 4)
           df.G = df.X + df.Y
       end
f1 (generic function with 1 method)
julia> @benchmark f1(df3)
BenchmarkTools.Trial: 10000 samples with 919 evaluations.
 Range (min … max):  109.675 ns …   2.028 μs  ┊ GC (min … max): 0.00% … 92.06%
 Time  (median):     111.353 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   124.812 ns ± 108.725 ns  ┊ GC (mean ± σ):  5.75% ±  6.19%
 110 ns  Histogram: log(frequency) by time  178 ns <
Memory estimate: 224 bytes, allocs estimate: 3.
and using transform seems to be even slower:
julia> @benchmark @transform(df3, :Q = :X .+ :Y)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  11.625 μs …  4.450 ms  ┊ GC (min … max): 0.00% … 99.20%
 Time  (median):     12.875 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   14.912 μs ± 61.162 μs  ┊ GC (mean ± σ):  5.75% ±  1.40%
 11.6 μs  Histogram: log(frequency) by time  26.1 μs <
Memory estimate: 9.20 KiB, allocs estimate: 168.
Any tips on what I'm doing wrong would be greatly appreciated, thanks!