What’s the difference, or which one should I use?
The two latter options work with transform() if I want to add this new column to the original dataframe.
I don’t know how to include the first one with a transform().
These are very small tables so the performance is affected by factors not related to summation.
In DataFrames.jl 1.3, which will be released soon (it is being held back until the release of Julia 1.7), the fastest option, especially for wide and large tables, will be transform(df, AsTable(:) => ByRow(sum)).
For the time being an easy (i.e. IMO natural for someone knowing how things in Julia Base work), and reasonably fast option is df.sum = sum(eachcol(df)).
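To make the two suggestions concrete, here is a minimal sketch on a made-up three-column data frame. The data and the column name `total` are only illustrative assumptions, not taken from the original question.

```julia
using DataFrames

# Hypothetical example data; any all-numeric data frame would do.
df = DataFrame(a = [1, 2, 3], b = [4, 5, 6], c = [7, 8, 9])

# Base approach: eachcol(df) iterates the column vectors,
# and sum adds them element by element, giving one value per row.
row_sums = sum(eachcol(df))

# transform approach: each row is passed to sum as a NamedTuple;
# the result is a new data frame with the extra column :total.
df2 = transform(df, AsTable(:) => ByRow(sum) => :total)

# Storing the Base result mutates the original data frame instead.
df.total = row_sums
```

Both approaches should give the same row-wise sums here. The practical difference is that `transform` returns a new data frame, while the assignment adds the column to `df` in place.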
Also note that .=> is in this case the same as =>; the . does not do anything in this situation.
In fact, my initial question wasn’t about speed but about whether there are other differences or disadvantages. For example, whether the returned object is more or less useful (a dataframe vs. something else) for additional operations.
The first option, not using ByRow, doesn’t produce the expected output if we have missings. I guess we will have similar problems with other functions.
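The snippet for the first option is not shown here, so the following is only a sketch of the general issue, on made-up data: a plain sum propagates missing, and one possible way around it (an assumption about what "expected output" means) is to compose sum with skipmissing.

```julia
using DataFrames

# Hypothetical data with one missing value in each column.
df = DataFrame(a = [1, missing, 3], b = [4, 5, missing])

# A plain row-wise sum returns missing for any row containing a missing.
propagated = transform(df, AsTable(:) => ByRow(sum) => :total)

# Composing with skipmissing ignores missing entries within each row.
skipped = transform(df, AsTable(:) => ByRow(sum ∘ skipmissing) => :total)
```

Note that a row consisting only of missing values would need separate handling, since summing an empty collection may throw an error.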
And another question,
How can I run more complex functions inside the ByRow()?
This fails, not because of DataFrames.jl but because of Julia Base, and in general it is incorrect, as there is no sum in your expression. You have to write:
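The original snippet is not shown here. A minimal sketch of what such a call could look like, assuming the goal was a row-wise sum of squares (the data frame and the column name `sumsq` are made up for illustration):

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(a = [1, 2], b = [3, 4])

# Each row arrives as a NamedTuple x; the inner function squares
# every value and sum adds the squares up.
result = transform(df, AsTable(:) => ByRow(x -> sum(v -> v^2, x)) => :sumsq)
```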
In general I would recommend not using functions like x -> sum(v -> v^2, x) anonymously but rather predefining them, as heavy use of anonymous functions can lead to code that is hard to read (it is like deciding whether to write one long one-line expression or to define variables that store intermediate values, even if they are discarded later).
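As a sketch of that recommendation, the row function can be given a name first and then passed to ByRow. The name `sum_of_squares`, the data, and the output column name are hypothetical choices for this example.

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(a = [1, 2], b = [3, 4])

# Named helper: sums the squares of all values in one row.
sum_of_squares(row) = sum(v -> v^2, row)

# The transform call now reads like a description of what it does.
named = transform(df, AsTable(:) => ByRow(sum_of_squares) => :sumsq)
```

A named helper can also be tested on its own, for example `sum_of_squares((a = 1, b = 3))`, before it is used inside a transformation.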