I apologize for a very low-level question, but I’ve just started learning the language and am getting stumped. I am trying to take a data frame and create a new column that’s the average of two other columns. I understand that this is what the transform function is supposed to do, but I’m having a hard time understanding the syntax. I entered the following:
using DataFrames
df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])
df = transform(df, :Average => ((df.A + df.B) / 2) => :Average)
println(df)
What I expected was for there to be a new column “Average” that has the average of columns A and B. But I got the error message: ArgumentError: invalid index: :Average => ([7.5, 17.5, 27.5, 37.5, 47.5] => :Average) of type Pair{Symbol, Pair{Vector{Float64}, Symbol}}. I’m not able to make anything of that message.
Thanks for your patience and help!
Why not just
df.Average = (df.A + df.B) / 2
?
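For comparison, here is a minimal sketch of how direct assignment differs from transform: assignment adds the column to df itself, while transform returns a new data frame and leaves the original untouched (transform! is the in-place variant). The name df2 is only for illustration.

```julia
using DataFrames

df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])

# transform returns a new data frame; df itself is unchanged at this point
df2 = transform(df, [:A, :B] => ((a, b) -> (a .+ b) ./ 2) => :Average)
had_average_before = "Average" in names(df)

# direct assignment adds the column to df in place
df.Average = (df.A + df.B) / 2
has_average_after = "Average" in names(df)
```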
A way to use transform is the following:
transform(df, [:A, :B] => ((a, b) -> (a .+ b) ./ 2) => :Average)
That is, you need to provide original_columns => some_function => new_column. Note that the function takes the whole column vectors as arguments and usually requires explicit broadcasting.
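To make the "whole column vectors" point concrete, here is a small sketch that uses a named function instead of an anonymous one. The helper name column_average is hypothetical, chosen only for this example; the type annotations just document that each argument arrives as an entire column.

```julia
using DataFrames

df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])

# hypothetical helper: called once, with each column passed as a whole vector
function column_average(a::AbstractVector, b::AbstractVector)
    return (a .+ b) ./ 2   # dots make the arithmetic element-wise, one value per row
end

out = transform(df, [:A, :B] => column_average => :Average)
```

Because the function is called once with the full columns, anything that should happen per element needs the dotted (broadcast) operators.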
DataFramesMeta provides a somewhat nicer syntax, allowing you to access old and new columns by name directly:
@transform df :Avg = (:A .+ :B) ./ 2
Note that this expands to exactly the above transform call, as you may check via macroexpand.
In the source => fun => dest syntax you have
- The source: This is a column or a collection of columns. You got this wrong, since you want the input to be two columns, :A and :B.
- The fun: This is a function. Frequently, people put anonymous functions here, as in (a, b) -> a + b.
- The dest: This is the name of the new column being created. You got this right with :Average.
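The three parts can also be written out one at a time, as in the sketch below. The variable names source, fun, dest, and spec are only illustrative. It also hints at why the original attempt failed: there, (df.A + df.B) / 2 was evaluated to a vector before transform ever saw it, so the middle slot held a Vector{Float64} rather than a function, which matches the type shown in the error message.

```julia
using DataFrames

df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])

source = [:A, :B]                   # columns handed to the function
fun    = (a, b) -> (a .+ b) ./ 2    # called with the whole columns
dest   = :Average                   # name of the column to create

# => is right-associative, so this is source => (fun => dest)
spec = source => fun => dest

result = transform(df, spec)
```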
Solution 1:
# ByRow because this is row-wise
df = transform(df, [:A, :B] => ByRow((A, B) -> (A + B) / 2) => :Average)
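As a quick check, the sketch below puts the row-wise ByRow form next to the whole-column form with explicit broadcasting. The names by_column, by_row, and same_result are only for illustration.

```julia
using DataFrames

df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])

# whole-column form: a and b are vectors, so the arithmetic is broadcast with dots
by_column = transform(df, [:A, :B] => ((a, b) -> (a .+ b) ./ 2) => :Average)

# row-wise form: ByRow calls the function once per row with scalar arguments
by_row = transform(df, [:A, :B] => ByRow((a, b) -> (a + b) / 2) => :Average)

same_result = by_column == by_row
```

With ByRow the function body can be written as ordinary scalar arithmetic, which is often easier to read when the operation is naturally per row.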
Solution 2: Use the package DataFramesMeta.jl, which wraps transform with a nicer syntax:
df = @transform df :Average = (:A + :B) / 2
Thanks for the first three comments here; all of these seem to work fine. I wasn’t aware of the first approach (df.Average = (df.A + df.B)/2). I’ll look into the DataFramesMeta package. These are all solutions, but I’d like to keep the thread rolling so I can see different ways of doing this. Thanks.
I’ve written a guide on how to use transform (and similar functions), oriented toward Julia beginners, which should be included in the next release of the DataFrames.jl documentation. You can preview it here. Bogumil also has a tutorial written in notebooks here that you may find useful.
(There is currently a Documenter.jl error in my pull request that is preventing the merge, if anyone wants to help fix it.)
Thanks! I’ll check these out.