ByRow vs broadcasting performance

Soldalma · August 20, 2022, 4:42pm

I was under the impression that ByRow would perform better than broadcasting. However, when I tried an example today, I found the opposite: broadcasting was about four times faster. This may be due to some specific characteristic of my example. In any case, what is the reason for the difference?

using DataFrames, BenchmarkTools

df1 = DataFrame(A = rand(20), B = Float64.(collect(1:20)));

# Broadcasting: the function receives whole columns
# 9.500 μs (144 allocations: 7.91 KiB)
@btime transform(df1, [:A, :B] => ((x, y) -> x .+ y) => :Sum)

# ByRow: the function receives one row's scalar values at a time
# 41.400 μs (358 allocations: 24.41 KiB)
@btime transform(df1, [:A, :B] => ByRow((x, y) -> x + y) => :Sum)
