What is the most idiomatic way to sum up all rows of a DataFrame into a row of totals? Can you give an example where the DataFrame contains numeric and non-numeric columns?
The easiest way (when there are non-numeric columns) is
julia> df = DataFrame(a = [1, 2, 3], b = [5, 6, 7], c = ["x", "y", "z"]);
julia> df.rowsum = sum.(eachrow(df[:, names(df, Real)]))
3-element Vector{Int64}:
  6
  8
 10
In the mini-language there is
julia> df = DataFrame(a = [1, 2, 3], b = [5, 6, 7], c = ["x", "y", "z"]);
julia> transform(df, AsTable(names(df, Real)) => ByRow(sum) => :rowsum)
3×4 DataFrame
 Row │ a      b      c       rowsum
     │ Int64  Int64  String  Int64
─────┼───────────────────────────────
   1 │     1      5  x            6
   2 │     2      6  y            8
   3 │     3      7  z           10
But be warned, the "mini-language" version will make some NamedTuples under the hood, and if you have 1000+ columns this may impact performance.
With DataFramesMeta there is
julia> @rtransform df :rowtotal = sum(AsTable(names(df, Real)))
3×4 DataFrame
 Row │ a      b      c       rowtotal
     │ Int64  Int64  String  Int64
─────┼─────────────────────────────────
   1 │     1      5  x              6
   2 │     2      6  y              8
   3 │     3      7  z             10
I think I should provide an example of the expected result. I am looking to sum the rows into a final row of totals, like we do in a spreadsheet:
julia> df = DataFrame(a=[1,2,3], b=[1,1,1], c=["a","b","c"])
# run magic function
julia> magic(df)
| a | b |
---------
| 6 | 3 |
Oh sorry! I thought you meant the opposite
julia> combine(df, names(df, Real) .=> sum)
1×2 DataFrame
 Row │ a_sum  b_sum
     │ Int64  Int64
─────┼──────────────
   1 │     6      3
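If what you want is the spreadsheet-style layout, with the totals sitting as the last row of the original table, one possible sketch is below. It assumes you are fine with the non-numeric column holding `missing` in the totals row; the names `totals` and `with_totals` are just placeholders for this example.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [1, 1, 1], c = ["a", "b", "c"])

# Sum every numeric column; renamecols = false keeps the original
# column names (a, b) instead of a_sum, b_sum.
totals = combine(df, names(df, Real) .=> sum; renamecols = false)

# Append the totals as a final row. cols = :union keeps column `c`
# and fills it with `missing` in the totals row.
with_totals = vcat(df, totals; cols = :union)
```

Note that `c` becomes a `Union{Missing, String}` column in `with_totals`, which may or may not be what you want downstream.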
For ByRow(sum), it will not create a NamedTuple, and it should scale.
Ah correct! I forgot about that optimization.
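For very wide tables, another option for the per-row sums is to skip row-wise iteration entirely and add the numeric columns together as whole vectors. This is only a sketch on the toy data from earlier in the thread; the name `numeric` is a placeholder, and whether it is actually faster for a given table is something to benchmark rather than assume.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [5, 6, 7], c = ["x", "y", "z"])

# Select the numeric columns without copying them.
numeric = df[!, names(df, Real)]

# Add the columns together element-wise: a + b + ...
# Assumes at least two numeric columns; with exactly one, the result
# would be that column itself rather than a fresh vector.
df.rowsum = reduce(+, eachcol(numeric))
```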