Help summing columns

Hi everybody! I’m new to Julia and need some simple help.

I want to sum some columns of my DataFrame and store that sum in a new column, but what I’m trying is not working:

colunasSum = [:CancIncorreto,:DifColetado,:FaltaCancelado,:FaltaPersonalizado,:QuebraCaixa] 
transform(df_pronto, colunasSum .=> sum => :Valor_desconto)

ArgumentError: duplicate output column name: :Valor_desconto

Stacktrace:
  [1] select_transform!(::Base.RefValue{Any}, df::DataFrame, newdf::DataFrame, transformed_cols::Set{Symbol}, copycols::Bool, allow_resizing_newdf::Base.RefValue{Bool}, column_to_copy::BitVector)
    @ DataFrames C:\Users\Pedro Henrique\.julia\packages\DataFrames\MA4YO\src\abstractdataframe\selection.jl:821
  [2] _manipulate(df::DataFrame, normalized_cs::Vector{Any}, copycols::Bool, keeprows::Bool)
    @ DataFrames C:\Users\Pedro Henrique\.julia\packages\DataFrames\MA4YO\src\abstractdataframe\selection.jl:1621
  [3] manipulate(::DataFrame, ::Any, ::Vararg{Any}; copycols::Bool, keeprows::Bool, renamecols::Bool)
    @ DataFrames C:\Users\Pedro Henrique\.julia\packages\DataFrames\MA4YO\src\abstractdataframe\selection.jl:1541

I need to sum the values of those columns to create a new column containing their sum.

How can I accomplish that?

Thx!

Do you want to sum rows or do you want a single number?

Here are the ways to produce both:

julia> df = DataFrame(rand(5, 4), :auto)
5×4 DataFrame
 Row │ x1         x2        x3        x4
     │ Float64    Float64   Float64   Float64
─────┼─────────────────────────────────────────
   1 │ 0.428323   0.757593  0.31246   0.640604
   2 │ 0.633038   0.169417  0.528082  0.669834
   3 │ 0.0376092  0.786284  0.399867  0.720137
   4 │ 0.673165   0.66684   0.437688  0.35086
   5 │ 0.614091   0.682412  0.511448  0.495272

julia> transform(df, [:x1, :x2, :x3, :x4] => (+) => :sum_of_rows)
5×5 DataFrame
 Row │ x1         x2        x3        x4        sum_of_rows
     │ Float64    Float64   Float64   Float64   Float64
─────┼──────────────────────────────────────────────────────
   1 │ 0.428323   0.757593  0.31246   0.640604      2.13898
   2 │ 0.633038   0.169417  0.528082  0.669834      2.00037
   3 │ 0.0376092  0.786284  0.399867  0.720137      1.9439
   4 │ 0.673165   0.66684   0.437688  0.35086       2.12855
   5 │ 0.614091   0.682412  0.511448  0.495272      2.30322

julia> transform(df, [:x1, :x2, :x3, :x4] => ((v...) -> sum(+(v...))) => :total_sum)
5×5 DataFrame
 Row │ x1         x2        x3        x4        total_sum
     │ Float64    Float64   Float64   Float64   Float64
─────┼────────────────────────────────────────────────────
   1 │ 0.428323   0.757593  0.31246   0.640604     10.515
   2 │ 0.633038   0.169417  0.528082  0.669834     10.515
   3 │ 0.0376092  0.786284  0.399867  0.720137     10.515
   4 │ 0.673165   0.66684   0.437688  0.35086      10.515
   5 │ 0.614091   0.682412  0.511448  0.495272     10.515

However, since you are struggling, maybe DataFramesMeta.jl will be easier for you:

julia> using DataFramesMeta

julia> @transform(df, :sum_of_rows = :x1 + :x2 + :x3 + :x4)
5×5 DataFrame
 Row │ x1         x2        x3        x4        sum_of_rows
     │ Float64    Float64   Float64   Float64   Float64
─────┼──────────────────────────────────────────────────────
   1 │ 0.428323   0.757593  0.31246   0.640604      2.13898
   2 │ 0.633038   0.169417  0.528082  0.669834      2.00037
   3 │ 0.0376092  0.786284  0.399867  0.720137      1.9439
   4 │ 0.673165   0.66684   0.437688  0.35086       2.12855
   5 │ 0.614091   0.682412  0.511448  0.495272      2.30322

julia> @transform(df, :total_sum = sum(:x1 + :x2 + :x3 + :x4))
5×5 DataFrame
 Row │ x1         x2        x3        x4        total_sum
     │ Float64    Float64   Float64   Float64   Float64
─────┼────────────────────────────────────────────────────
   1 │ 0.428323   0.757593  0.31246   0.640604     10.515
   2 │ 0.633038   0.169417  0.528082  0.669834     10.515
   3 │ 0.0376092  0.786284  0.399867  0.720137     10.515
   4 │ 0.673165   0.66684   0.437688  0.35086      10.515
   5 │ 0.614091   0.682412  0.511448  0.495272     10.515

It seems to me that your problem is that you are broadcasting the first => (i.e., you put a dot before it), so each column gets its own sum => :Valor_desconto pair and the output name is duplicated. Try @bkamins’s answers, taking care not to broadcast there.
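
To see why the dotted form fails, it can help to look at what it produces before transform ever sees it. This is a minimal sketch in plain Julia (no DataFrames needed); the column names :a and :b and the output name :out are placeholders, not the original columns:

```julia
cols = [:a, :b]

# `=>` is right-associative, so this is cols .=> (sum => :out).
# The dot broadcasts over `cols`, giving one Pair per column,
# and every one of them carries the same output name :out.
broadcasted = cols .=> sum => :out

# Without the dot there is a single Pair whose first element
# is the whole vector of column names.
single = cols => sum => :out

println(broadcasted)
println(single)
```

The broadcasted version asks for two output columns with the same name, which matches the "duplicate output column name" error.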

Nice!

Thx for the help and the DataFramesMeta.jl tip.

I have just one more question: what are the “v” and the “...” that you used here:

transform(df, [:x1, :x2, :x3, :x4] => ((v...) -> sum(+(v...))) => :total_sum)

For now, thx @Henrique_Becker and @bkamins

Copied from an old answer of mine:

I think the manual sections you want are:

  1. https://docs.julialang.org/en/v1/manual/faq/#The-two-uses-of-the-…-operator:-slurping-and-splatting – FAQ section that confirms many beginners find ... confusing.
  2. Keyword arguments – The manual section on keyword arguments.
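
In short, and as a small sketch in plain Julia (the function name total is made up for illustration): in an argument list, v... collects all positional arguments into one tuple v (slurping), while in a call such as +(v...) it spreads that tuple back out into separate arguments (splatting).

```julia
# Slurping: `v` becomes a tuple holding every argument passed in.
# Splatting: `+(v...)` is the same as +(v[1], v[2], ..., v[end]).
total(v...) = sum(+(v...))

a, b, c = [1, 2], [3, 4], [5, 6]

elementwise = +(a, b, c)       # [9, 12]
grand_total = total(a, b, c)   # 21
```

In the transform call, each selected column arrives as one of those positional arguments, so v is a tuple of column vectors.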

I tried without the broadcasting and the error changed to:

colunasSum = [:CancIncorreto,:DifColetado,:FaltaCancelado,:FaltaPersonalizado,:QuebraCaixa] 
transform(df_pronto, colunasSum => sum => :Valor_desconto2)

MethodError: no method matching sum(::Vector{Float64}, ::Vector{Float64}, ::Vector{Float64}, ::Vector{Float64}, ::Vector{Float64})
Closest candidates are:
  sum(::Any, ::AbstractArray; dims, kw...) at C:\Users\Pedro Henrique\AppData\Local\Programs\Julia-1.7.2\share\julia\base\reducedim.jl:890
  sum(::Any, ::Any; kw...) at C:\Users\Pedro Henrique\AppData\Local\Programs\Julia-1.7.2\share\julia\base\reduce.jl:503
  sum(::AbstractArray; dims, kw...) at C:\Users\Pedro Henrique\AppData\Local\Programs\Julia-1.7.2\share\julia\base\reducedim.jl:889

I’m still learning where and how to use broadcasting.

Yes, the problem here is that you do not want sum, you want +.

What you are doing is equivalent to:

julia> sum([1, 2], [3, 4], [5, 6])
ERROR: MethodError: no method matching sum(::Array{Int64,1}, ::Array{Int64,1}, ::Array{Int64,1})
Closest candidates are:
  sum(::Any, ::AbstractArray; dims) at reducedim.jl:723
  sum(::Any, ::Any) at reduce.jl:494
  sum(::AbstractArray; dims) at reducedim.jl:722
  ...
Stacktrace:
 [1] top-level scope at REPL[1]:1

But what you want is:

julia> +([1, 2], [3, 4], [5, 6])
2-element Array{Int64,1}:
  9
 12

Wow! Nice!

Now I understand the difference!

Thx for that explanation and the manuals, I will read them.

If you want, you can use sum, but then you need to pass it a single argument rather than multiple arguments, just as @Henrique_Becker explained. The code would be:

transform(df, AsTable([:x1, :x2, :x3, :x4]) => sum => :sum_of_rows)

and

transform(df, AsTable([:x1, :x2, :x3, :x4]) => sum∘sum => :total_sum)

respectively.
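
As far as I understand it, AsTable hands the function the selected columns as a single NamedTuple of vectors, which is why sum accepts it. Here is a plain-Julia sketch of that idea, with a hand-built NamedTuple nt standing in for what transform would pass (an assumption for illustration; it ignores any fast paths DataFrames may use internally):

```julia
# Hypothetical stand-in for the NamedTuple of columns.
nt = (x1 = [1, 2], x2 = [3, 4], x3 = [5, 6])

# One argument: `sum` iterates over the vectors and adds them elementwise.
row_sums = sum(nt)

# Composing a second `sum` reduces that vector to a single number.
total_sum = (sum ∘ sum)(nt)
```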

But initially I did not want to introduce another level of complexity to our discussion.

And now I see that I could have avoided splatting and slurping earlier:

transform(df, [:x1, :x2, :x3, :x4] => sum∘(+) => :total_sum)
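
The composed function sum∘(+) first calls + with all the columns as separate arguments and then passes the resulting vector to sum. A quick plain-Julia check with made-up vectors shows it matches the earlier anonymous function:

```julia
a, b, c = [1, 2], [3, 4], [5, 6]

# (sum ∘ (+))(args...) is sum(+(args...))
composed = sum ∘ (+)

via_composition = composed(a, b, c)
via_slurping    = ((v...) -> sum(+(v...)))(a, b, c)
```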

I am not a DataFrames user, but reading this I was wondering about allocations. Does this allocate a lot, or does the transform function do some clever tricks?

This allocates just as much as sum(x1 + x2 + x3 + x4) would.
This cannot be further optimized because + is not the same as sum (which uses add_sum).
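
To make the allocation point concrete, here is a rough plain-Julia sketch with made-up vectors. The first form builds a temporary vector of row sums before reducing it; the second reduces each column separately and never materializes that vector. This only illustrates the trade-off and is not something transform does for you. With floating-point data the two results can differ in the last digits, because the additions happen in a different order.

```julia
x1, x2, x3, x4 = rand(1000), rand(1000), rand(1000), rand(1000)

# Builds a temporary 1000-element vector of row sums, then sums it.
with_temp = sum(x1 + x2 + x3 + x4)

# Sums each column on its own and adds the four scalars;
# no intermediate vector of row sums is created.
without_temp = sum(sum, (x1, x2, x3, x4))
```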

However some transformations are optimized, especially for very wide tables. You can find their list here.
