Sum the columns in a dataframe

This should be a relatively simple question, but I could not find a good answer. I have a dataframe like the one below (but with many more columns), and I want to add a new column that is the sum of all the other columns (e.g. a + b + c). We can assume that all of the columns are Float64. I would like to do this over a large group of columns, so specifying each column one by one is not feasible. Many thanks!

df = DataFrame(a = repeat([1, 2, 3, 4], outer=[2]),
               b = repeat([2, 1], outer=[4]),
               c = randn(8))

Not at a computer, but I think something like sum(eachcol(df)) should work.


That seems to work:

julia> using DataFrames

julia> df = DataFrame(a = repeat([1, 2, 3, 4], outer=[2]),
                      b = repeat([2, 1], outer=[4]),
                      c = randn(8))
8×3 DataFrame
│ Row │ a     │ b     │ c         │
│     │ Int64 │ Int64 │ Float64   │
├─────┼───────┼───────┼───────────┤
│ 1   │ 1     │ 2     │ 2.33499   │
│ 2   │ 2     │ 1     │ 1.08153   │
│ 3   │ 3     │ 2     │ 1.55002   │
│ 4   │ 4     │ 1     │ -1.35953  │
│ 5   │ 1     │ 2     │ -1.87585  │
│ 6   │ 2     │ 1     │ 1.06405   │
│ 7   │ 3     │ 2     │ -0.129446 │
│ 8   │ 4     │ 1     │ 1.98992   │

julia> sum(eachcol(df))
8-element Array{Float64,1}:
 5.334989587238981
 4.081530272951787
 6.550024757198643
 3.640465366532979
 1.1241529384736229
 4.064050523357361
 4.8705544804273035
 6.989923186107735
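
Since the original question asked for the result as a new column, here is a minimal sketch of that last step. The column name total and the small fixed data are assumptions made for illustration, not part of the original example.

```julia
using DataFrames

# Small fixed data so the result is easy to check by hand.
df = DataFrame(a = [1.0, 2.0, 3.0], b = [2.0, 1.0, 2.0], c = [0.5, 1.5, 2.5])

# Add the column vectors elementwise and store the result as a new column.
# `total` is a hypothetical name chosen for this sketch.
df.total = sum(eachcol(df))
```

Note that running the assignment a second time would include total itself in the sum, so it is probably safer to compute the sum before adding the column, or to select the input columns explicitly.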

Are you coming from Stata, by chance? You can emulate rowtotal using ByRow in transform in DataFrames.
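
A minimal sketch of what that could look like, assuming a DataFrames.jl version recent enough to provide AsTable and ByRow. The column name rowtotal and the sample data are hypothetical choices for this example.

```julia
using DataFrames

df = DataFrame(a = [1.0, 2.0, 3.0], b = [2.0, 1.0, 2.0], c = [0.5, 1.5, 2.5])

# AsTable(:) passes all columns to the function as a table of named columns;
# wrapping `sum` in ByRow applies it to one row (a NamedTuple) at a time.
# `rowtotal` is a hypothetical output column name.
df2 = transform(df, AsTable(:) => ByRow(sum) => :rowtotal)
```

transform returns a new data frame and leaves df unchanged. Replacing the : inside AsTable with a column selector should restrict the sum to a subset of columns.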


I am coming from an R background…

Thank you, the eachcol solution works. I tried transform, but I probably wasn't getting the syntax right.

Doesn't this sum the rows? To get the sum of each column I tried sum(eachrow(df)), but that doesn't work for me. Why not?


sum(eachcol(df)) does indeed sum across, since it essentially does sum([df[!, c] for c in names(df)]). If you want to sum down, you should use sum.(eachcol(df)), which is essentially [sum(df[!, c]) for c in names(df)].
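
To make the difference concrete, here is a small sketch with fixed data. The variable names across and down are just labels for this example.

```julia
using DataFrames

df = DataFrame(a = [1.0, 2.0, 3.0], b = [2.0, 1.0, 2.0])

# Adds the column vectors together: one value per row.
across = sum(eachcol(df))

# Broadcasts `sum` over the columns: one value per column.
down = sum.(eachcol(df))
```

Here across has three elements (one per row) and down has two (one per column). As far as I can tell, sum(eachrow(df)) fails because it tries to add two rows with +, and that operation is not defined for data frame rows.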
