Sum the columns in a dataframe

fnbillimoria · August 17, 2020, 6:10am

This should be a relatively simple question, but I could not find a great answer. I have a dataframe like the one below (but with many more columns), and I wish to add a new column that is the sum of all the other columns (e.g. a + b + c). We can assume that all of the columns are Float64. I would like to do this over a large group of columns, so specifying each column one by one is not feasible. Many thanks!

df = DataFrame(a = repeat([1, 2, 3, 4], outer=[2]),
               b = repeat([2, 1], outer=[4]),
               c = randn(8))

nilshg · August 17, 2020, 6:54am

Not at a computer, but I think something like sum(eachcol(df)) should work.

derekmahar · August 17, 2020, 9:20am

That seems to work:

julia> using DataFrames

julia> df = DataFrame(a = repeat([1, 2, 3, 4], outer=[2]),
                      b = repeat([2, 1], outer=[4]),
                      c = randn(8))
8×3 DataFrame
│ Row │ a     │ b     │ c         │
│     │ Int64 │ Int64 │ Float64   │
├─────┼───────┼───────┼───────────┤
│ 1   │ 1     │ 2     │ 2.33499   │
│ 2   │ 2     │ 1     │ 1.08153   │
│ 3   │ 3     │ 2     │ 1.55002   │
│ 4   │ 4     │ 1     │ -1.35953  │
│ 5   │ 1     │ 2     │ -1.87585  │
│ 6   │ 2     │ 1     │ 1.06405   │
│ 7   │ 3     │ 2     │ -0.129446 │
│ 8   │ 4     │ 1     │ 1.98992   │

julia> sum(eachcol(df))
8-element Array{Float64,1}:
 5.334989587238981
 4.081530272951787
 6.550024757198643
 3.640465366532979
 1.1241529384736229
 4.064050523357361
 4.8705544804273035
 6.989923186107735

pdeffebach · August 17, 2020, 1:36pm

Are you coming from Stata, by chance? You can emulate rowtotal using ByRow in transform in DataFrames.
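For anyone unsure of the syntax, here is a minimal sketch of the ByRow approach, assuming a reasonably recent DataFrames.jl where AsTable is available. The fixed values in column c and the output column name :total are my own choices for illustration, not something from the thread.

```julia
using DataFrames

# Deterministic stand-in for the data in the question
# (c is fixed here instead of randn(8) so the result is reproducible).
df = DataFrame(a = repeat([1, 2, 3, 4], outer = 2),
               b = repeat([2, 1], outer = 4),
               c = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5])

# AsTable(:) hands each row to the function as a NamedTuple of all columns;
# ByRow(sum) adds up the values of that row, and the result is stored in :total.
df2 = transform(df, AsTable(:) => ByRow(sum) => :total)
```

transform returns a new data frame and leaves df untouched; transform! would add the column in place. To sum only some columns, the : selector can be replaced with a list of column names.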

fnbillimoria · August 17, 2020, 11:33pm

I am coming from an R background…

Thank you, the eachcol solution works. I tried transform, but I probably wasn’t getting the syntax right.

MalteMederacke · October 2, 2020, 12:14pm

Doesn’t this sum across each row? To get the sum of each column I tried sum(eachrow(df)), but that doesn’t work for me. Why not?

bert · June 24, 2021, 2:32pm

sum(eachcol(df)) does indeed sum across each row, since it essentially does sum([df[!, c] for c in names(df)]), i.e. it adds the column vectors together elementwise. If you want to sum down each column, you should use sum.(eachcol(df)), which is essentially [sum(df[!, c]) for c in names(df)].
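To make the two directions concrete, here is a small sketch on a made-up three-row data frame (the values and the column name total are only for illustration):

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [10, 20, 30], c = [0.5, 0.5, 0.5])

# Sum down: one total per column, by broadcasting sum over the column vectors.
col_totals = sum.(eachcol(df))

# Sum across: add the column vectors together elementwise, giving one total
# per row, and store the result as the new column asked for in the question.
df.total = sum(eachcol(df))
```

Here col_totals has one entry per original column (3 entries), while df.total has one entry per row. Note that once total has been added, calling sum(eachcol(df)) again would include it in the sum. As for sum(eachrow(df)), as far as I can tell it fails because + is not defined between two data frame rows, whereas the columns are ordinary vectors that can be added.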
