# Sum the columns in a dataframe

This should be a relatively simple question, but I could not find a great answer. I have a dataframe like the one below (but with many more columns), and I wish to add a new column that is the sum of all the other columns (e.g. a + b + c). We can assume that all of the columns are `Float64`. I would like to do this over a large group of columns, so specifying each column one by one is not feasible. Many thanks!

``````
df = DataFrame(a = repeat([1, 2, 3, 4], outer=[2]),
               b = repeat([2, 1], outer=[4]),
               c = randn(8))
``````

Not at a computer, but I think something like `sum(eachcol(df))` should work.

That seems to work:

``````
julia> using DataFrames

julia> df = DataFrame(a = repeat([1, 2, 3, 4], outer=[2]),
                      b = repeat([2, 1], outer=[4]),
                      c = randn(8))
8×3 DataFrame
│ Row │ a     │ b     │ c         │
│     │ Int64 │ Int64 │ Float64   │
├─────┼───────┼───────┼───────────┤
│ 1   │ 1     │ 2     │ 2.33499   │
│ 2   │ 2     │ 1     │ 1.08153   │
│ 3   │ 3     │ 2     │ 1.55002   │
│ 4   │ 4     │ 1     │ -1.35953  │
│ 5   │ 1     │ 2     │ -1.87585  │
│ 6   │ 2     │ 1     │ 1.06405   │
│ 7   │ 3     │ 2     │ -0.129446 │
│ 8   │ 4     │ 1     │ 1.98992   │

julia> sum(eachcol(df))
8-element Array{Float64,1}:
 5.334989587238981
 4.081530272951787
 6.550024757198643
 3.640465366532979
 1.1241529384736229
 4.064050523357361
 4.8705544804273035
 6.989923186107735
``````
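
Since the original question asks for a new column rather than a bare vector, here is a minimal sketch of attaching the result to the frame. The column name `total` and the fixed values in `c` (used instead of `randn` so the result is reproducible) are assumptions made for illustration.

```julia
using DataFrames

# Small deterministic stand-in for the frame above (c is fixed instead of random)
df = DataFrame(a = [1, 2, 3, 4], b = [2, 1, 2, 1], c = [0.5, 1.5, 2.5, 3.5])

# sum(eachcol(df)) adds the column vectors elementwise, giving one total per row.
# The right-hand side is evaluated before :total exists, so the new column
# is not included in its own sum.
df.total = sum(eachcol(df))
```

Running the assignment a second time would include `total` in the sum, so it is probably safer to compute it once or to select the input columns explicitly.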

Are you coming from Stata, by chance? You can emulate `rowtotal` using `ByRow` in `transform` in DataFrames.
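
A rough sketch of what that `transform` call could look like, assuming a recent DataFrames.jl release where `AsTable` and `ByRow` are available. The output name `total` and the sample values are made up for illustration.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3, 4], b = [2, 1, 2, 1], c = [0.5, 1.5, 2.5, 3.5])

# AsTable(:) passes every column to the function, one row at a time, as a NamedTuple;
# ByRow(sum) then adds up the values in that row.
# transform returns a new data frame and leaves df untouched.
df2 = transform(df, AsTable(:) => ByRow(sum) => :total)
```

To sum only a subset, the `:` selector can be swapped for a list of columns, e.g. `AsTable([:a, :b])`.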

I am coming from an R background…

Thank you, the `eachcol` solution works. I tried `transform`, but I probably wasn't getting the syntax right.

Doesn't this sum the rows? Summing the columns with `sum(eachrow(df))` doesn't work for me. Why not?

`sum(eachcol(df))` does indeed sum across the columns, giving one total per row, since it essentially does `sum([df[!, c] for c in names(df)])`. If you want to sum down each column, giving one total per column, you should use `sum.(eachcol(df))`, which is essentially `[sum(df[!, c]) for c in names(df)]`.
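
A small sketch contrasting the two calls on a made-up two-column frame (the values are chosen only so the totals are easy to check by hand). As far as I can tell, `sum(eachrow(df))` fails because `sum` would have to add two `DataFrameRow`s together, and `+` is not defined for them, whereas the columns are plain vectors that do support `+`.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [10, 20, 30])

# Adds the column vectors elementwise: one total per row
across = sum(eachcol(df))    # [11, 22, 33]

# Broadcasts sum over the columns: one total per column
down = sum.(eachcol(df))     # [6, 60]
```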
