Hi everybody! I’m new to Julia and need some simple help.
I want to sum some columns of my DataFrame and put that sum in a new column, but what I’m trying is not working:
colunasSum = [:CancIncorreto,:DifColetado,:FaltaCancelado,:FaltaPersonalizado,:QuebraCaixa]
transform(df_pronto, colunasSum .=> sum => :Valor_desconto)
ArgumentError: duplicate output column name: :Valor_desconto
Stacktrace:
[1] select_transform!(::Base.RefValue{Any}, df::DataFrame, newdf::DataFrame, transformed_cols::Set{Symbol}, copycols::Bool, allow_resizing_newdf::Base.RefValue{Bool}, column_to_copy::BitVector)
@ DataFrames C:\Users\Pedro Henrique\.julia\packages\DataFrames\MA4YO\src\abstractdataframe\selection.jl:821
[2] _manipulate(df::DataFrame, normalized_cs::Vector{Any}, copycols::Bool, keeprows::Bool)
@ DataFrames C:\Users\Pedro Henrique\.julia\packages\DataFrames\MA4YO\src\abstractdataframe\selection.jl:1621
[3] manipulate(::DataFrame, ::Any, ::Vararg{Any}; copycols::Bool, keeprows::Bool, renamecols::Bool)
@ DataFrames C:\Users\Pedro Henrique\.julia\packages\DataFrames\MA4YO\src\abstractdataframe\selection.jl:1541
I need to sum the values of those columns to create a new column containing their sum.
How can I accomplish that?
Thanks!
Do you want a sum for each row, or a single number?
Here are the ways to produce both:
julia> df = DataFrame(rand(5, 4), :auto)
5×4 DataFrame
Row │ x1 x2 x3 x4
│ Float64 Float64 Float64 Float64
─────┼─────────────────────────────────────────
1 │ 0.428323 0.757593 0.31246 0.640604
2 │ 0.633038 0.169417 0.528082 0.669834
3 │ 0.0376092 0.786284 0.399867 0.720137
4 │ 0.673165 0.66684 0.437688 0.35086
5 │ 0.614091 0.682412 0.511448 0.495272
julia> transform(df, [:x1, :x2, :x3, :x4] => (+) => :sum_of_rows)
5×5 DataFrame
Row │ x1 x2 x3 x4 sum_of_rows
│ Float64 Float64 Float64 Float64 Float64
─────┼──────────────────────────────────────────────────────
1 │ 0.428323 0.757593 0.31246 0.640604 2.13898
2 │ 0.633038 0.169417 0.528082 0.669834 2.00037
3 │ 0.0376092 0.786284 0.399867 0.720137 1.9439
4 │ 0.673165 0.66684 0.437688 0.35086 2.12855
5 │ 0.614091 0.682412 0.511448 0.495272 2.30322
julia> transform(df, [:x1, :x2, :x3, :x4] => ((v...) -> sum(+(v...))) => :total_sum)
5×5 DataFrame
Row │ x1 x2 x3 x4 total_sum
│ Float64 Float64 Float64 Float64 Float64
─────┼────────────────────────────────────────────────────
1 │ 0.428323 0.757593 0.31246 0.640604 10.515
2 │ 0.633038 0.169417 0.528082 0.669834 10.515
3 │ 0.0376092 0.786284 0.399867 0.720137 10.515
4 │ 0.673165 0.66684 0.437688 0.35086 10.515
5 │ 0.614091 0.682412 0.511448 0.495272 10.515
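Applied to the column names from the question, the row-wise form might look like the sketch below. It assumes DataFrames.jl is installed; the contents of df_pronto are made up for illustration, and only the column names come from the original post.

```julia
using DataFrames

# Hypothetical stand-in for df_pronto: only the column names come from the question.
df_pronto = DataFrame(CancIncorreto = [1.0, 2.0], DifColetado = [0.5, 0.0],
                      FaltaCancelado = [0.0, 1.0], FaltaPersonalizado = [2.0, 0.0],
                      QuebraCaixa = [0.5, 3.0])

colunasSum = [:CancIncorreto, :DifColetado, :FaltaCancelado, :FaltaPersonalizado, :QuebraCaixa]

# No dot before the first =>: a single transformation whose function receives
# all five columns as separate positional arguments; + adds them element-wise.
result = transform(df_pronto, colunasSum => (+) => :Valor_desconto)
# result.Valor_desconto == [4.0, 6.0]
```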
However, since you are struggling, DataFramesMeta.jl may be easier for you:
julia> using DataFramesMeta
julia> @transform(df, :sum_of_rows = :x1 + :x2 + :x3 + :x4)
5×5 DataFrame
Row │ x1 x2 x3 x4 sum_of_rows
│ Float64 Float64 Float64 Float64 Float64
─────┼──────────────────────────────────────────────────────
1 │ 0.428323 0.757593 0.31246 0.640604 2.13898
2 │ 0.633038 0.169417 0.528082 0.669834 2.00037
3 │ 0.0376092 0.786284 0.399867 0.720137 1.9439
4 │ 0.673165 0.66684 0.437688 0.35086 2.12855
5 │ 0.614091 0.682412 0.511448 0.495272 2.30322
julia> @transform(df, :total_sum = sum(:x1 + :x2 + :x3 + :x4))
5×5 DataFrame
Row │ x1 x2 x3 x4 total_sum
│ Float64 Float64 Float64 Float64 Float64
─────┼────────────────────────────────────────────────────
1 │ 0.428323 0.757593 0.31246 0.640604 10.515
2 │ 0.633038 0.169417 0.528082 0.669834 10.515
3 │ 0.0376092 0.786284 0.399867 0.720137 10.515
4 │ 0.673165 0.66684 0.437688 0.35086 10.515
5 │ 0.614091 0.682412 0.511448 0.495272 10.515
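It seems to me that your problem is that you are broadcasting the first => (i.e., you put a dot before it). With the dot, colunasSum .=> sum => :Valor_desconto creates one separate transformation per column, and all of them try to write to :Valor_desconto, which is why you get the "duplicate output column name" error. Try @bkamins’s answers, taking care not to add the dot again.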
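The effect of the dot can be seen without any DataFrame at all. In this plain-Julia sketch (the names cols, broadcasted and single are illustrative only), broadcasting => over a vector of column names treats the Pair on the right as a scalar, so each column gets its own copy of sum => :total:

```julia
cols = [:a, :b, :c]

# With the dot, => is broadcast over cols; the Pair `sum => :total` is reused
# for every element, giving one transformation per column:
broadcasted = cols .=> sum => :total
# [:a => (sum => :total), :b => (sum => :total), :c => (sum => :total)]
# Three transformations all targeting :total, hence the duplicate-name error.

# Without the dot there is a single Pair: all columns go to one function call.
single = cols => (+) => :total
```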
Nice!
Thanks for the help and for the DataFramesMeta.jl tip.
I have just one more question: what are the “v” and the “...” that you used here?
transform(df, [:x1, :x2, :x3, :x4] => ((v...) -> sum(+(v...))) => :total_sum)
Thanks for now, @Henrique_Becker and @bkamins!
Copied from an old answer of mine:
I think the manual sections you want are:
https://docs.julialang.org/en/v1/manual/faq/#The-two-uses-of-the-…-operator:-slurping-and-splatting – FAQ section that confirms many beginners find the ... operator confusing.
Keyword arguments – The manual section on keyword arguments.
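A small plain-Julia sketch of the two uses of ... (collect_args is a hypothetical helper name, used only for illustration). In a function definition, v... slurps all positional arguments into a tuple v; in a call, t... splats a collection back out into separate arguments:

```julia
# Slurping: v... collects every positional argument into the tuple v.
collect_args(v...) = v

args = collect_args([1, 2], [3, 4], [5, 6])   # ([1, 2], [3, 4], [5, 6])

# Splatting: args... passes each tuple element as its own argument,
# so this is the same as +([1, 2], [3, 4], [5, 6]).
rowwise = +(args...)                          # [9, 12]

# The anonymous function from the thread does both: it slurps the columns
# into v, splats them into +, then sums the resulting vector.
total = ((v...) -> sum(+(v...)))([1, 2], [3, 4], [5, 6])   # 21
```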
I tried without the broadcasting and the error changed to:
colunasSum = [:CancIncorreto,:DifColetado,:FaltaCancelado,:FaltaPersonalizado,:QuebraCaixa]
transform(df_pronto, colunasSum => sum => :Valor_desconto2)
MethodError: no method matching sum(::Vector{Float64}, ::Vector{Float64}, ::Vector{Float64}, ::Vector{Float64}, ::Vector{Float64})
Closest candidates are:
sum(::Any, ::AbstractArray; dims, kw...) at C:\Users\Pedro Henrique\AppData\Local\Programs\Julia-1.7.2\share\julia\base\reducedim.jl:890
sum(::Any, ::Any; kw...) at C:\Users\Pedro Henrique\AppData\Local\Programs\Julia-1.7.2\share\julia\base\reduce.jl:503
sum(::AbstractArray; dims, kw...) at C:\Users\Pedro Henrique\AppData\Local\Programs\Julia-1.7.2\share\julia\base\reducedim.jl:889
I’m still learning where and how to use broadcasting.
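Yes, the problem here is that you do not want sum, you want +.
When you pass a vector of column names, transform calls the function with each column as a separate positional argument, so what you are doing is equivalent to: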
julia> sum([1, 2], [3, 4], [5, 6])
ERROR: MethodError: no method matching sum(::Array{Int64,1}, ::Array{Int64,1}, ::Array{Int64,1})
Closest candidates are:
sum(::Any, ::AbstractArray; dims) at reducedim.jl:723
sum(::Any, ::Any) at reduce.jl:494
sum(::AbstractArray; dims) at reducedim.jl:722
...
Stacktrace:
[1] top-level scope at REPL[1]:1
But what you want is:
julia> +([1, 2], [3, 4], [5, 6])
2-element Array{Int64,1}:
9
12
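To make the contrast concrete: + accepts any number of arguments and adds vectors element-wise, while sum expects a single collection. This plain-Julia sketch shows that wrapping the vectors in one container lets sum produce the same element-wise result:

```julia
a, b, c = [1, 2], [3, 4], [5, 6]

# + takes the vectors as separate arguments and adds them element-wise.
plus_result = +(a, b, c)       # [9, 12]

# sum takes ONE collection; here its elements are the vectors themselves,
# which it reduces with +, giving the same element-wise result.
sum_result = sum([a, b, c])    # [9, 12]
```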
Wow! Nice!
Now I understand the difference!
Thanks for the explanation and the manual links; I will read them.
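If you want, you can use sum, but then you need to pass a single argument to sum rather than multiple arguments, just as @Henrique_Becker explained. Wrapping the columns in AsTable does that: the function receives one NamedTuple holding all the selected columns. So the code for the two cases would be: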
transform(df, AsTable([:x1, :x2, :x3, :x4]) => sum => :sum_of_rows)
and
transform(df, AsTable([:x1, :x2, :x3, :x4]) => sum∘sum => :total_sum)
respectively.
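Why AsTable makes sum work can be sketched in plain Julia. The NamedTuple below is a hand-built stand-in for what AsTable([:x1, :x2, :x3]) passes to the function (one NamedTuple whose values are the column vectors); no DataFrame is involved:

```julia
# Hand-built stand-in for the argument AsTable passes: one NamedTuple of columns.
cols = (x1 = [1.0, 2.0], x2 = [10.0, 20.0], x3 = [100.0, 200.0])

# sum iterates over the NamedTuple's values and adds the vectors: row sums.
row_sums = sum(cols)          # [111.0, 222.0]

# sum ∘ sum first builds the row sums, then adds them into one number.
total = (sum ∘ sum)(cols)     # 333.0
```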
But initially I did not want to introduce another level of complexity to our discussion.
And now I see that I could have avoided splatting and slurping earlier:
transform(df, [:x1, :x2, :x3, :x4] => sum∘(+) => :total_sum)
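The composition works because a composed function forwards all of its arguments to the inner function, so (sum ∘ (+))(x1, x2, x3, x4) is sum(+(x1, x2, x3, x4)). A plain-Julia check with small made-up vectors:

```julia
x1, x2, x3, x4 = [1, 2], [3, 4], [5, 6], [7, 8]

# (sum ∘ (+))(args...) == sum(+(args...)): no explicit v -> ... lambda needed.
total = (sum ∘ (+))(x1, x2, x3, x4)   # +(...) == [16, 20], then sum == 36
```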
lrnv
March 14, 2022, 6:58pm
bkamins:
And now I see that I could avoid splatting and slurping earlier :
transform(df, [:x1, :x2, :x3, :x4] => sum∘(+) => :total_sum)
I am not a DataFrames user, but reading your answer I was wondering about allocations. Does this allocate a lot, or does the transform function do some clever tricks?
This allocates just as much as sum(x1 + x2 + x3 + x4) would.
This cannot be further optimized because + is not the same as sum (which uses add_sum).
However, some transformations are optimized, especially for very wide tables. You can find their list here.
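The allocation in sum(x1 + x2 + x3 + x4) comes from materializing the element-wise sum as a temporary vector before sum reduces it. A plain-Julia sketch to observe this (the helpers sum_via_plus, sum_fused and bytes_allocated are hypothetical, and the index-by-index version is a hand-written alternative outside transform, not something DataFrames does for you):

```julia
# Hypothetical helpers, for illustration only.
# Builds the element-wise sum as a new vector, then reduces it.
sum_via_plus(a, b, c, d) = sum(a + b + c + d)

# Adds the four values per index while reducing, so no temporary vector.
sum_fused(a, b, c, d) = sum(i -> a[i] + b[i] + c[i] + d[i], eachindex(a, b, c, d))

# Measure inside a function so global-scope overhead is not counted.
bytes_allocated(f, a, b, c, d) = @allocated f(a, b, c, d)

x1, x2, x3, x4 = rand(1_000), rand(1_000), rand(1_000), rand(1_000)

# Call once so compilation is not included in the measurement.
sum_via_plus(x1, x2, x3, x4); sum_fused(x1, x2, x3, x4)

println("via +:  ", bytes_allocated(sum_via_plus, x1, x2, x3, x4), " bytes")
println("fused:  ", bytes_allocated(sum_fused, x1, x2, x3, x4), " bytes")
```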