How to use the transform function on data frames

I apologize for a very low-level question, but I’ve just started learning the language and am getting stumped. I am trying to take a dataframe and create a new column that’s the average of two other columns. I understand that this is what the transform function is supposed to do, but I’m having a hard time understanding the syntax. I entered the following:

using DataFrames
df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])
df = transform(df, :Average => ((df.A + df.B) / 2) => :Average)
println(df)

What I expected was for there to be a new column “Average” that holds the average of columns A and B. But I got the error message:

ArgumentError: invalid index: :Average => ([7.5, 17.5, 27.5, 37.5, 47.5] => :Average) of type Pair{Symbol, Pair{Vector{Float64}, Symbol}}

I’m not able to make anything of that message.

Thanks for your patience and help!

Why not just
df.Average = (df.A + df.B) / 2
?
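A minimal sketch of this approach with the data from the question. Assigning to `df.Average` adds the column to `df` in place. Plain `+` and `/` already work element-wise here (vector + vector, vector / scalar), and the dotted form is the general broadcasting equivalent. The second column name `Average2` is made up only so the two results can be compared.

```julia
using DataFrames

df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])

# Adds the column to df in place: vector + vector is element-wise,
# and dividing a vector by a scalar divides each entry.
df.Average = (df.A + df.B) / 2

# Same computation with explicit broadcasting (dotted operators);
# stored under a separate, illustrative name for comparison.
df.Average2 = (df.A .+ df.B) ./ 2

println(df)
```

Note that the result is a `Float64` column even though A and B hold integers, because `/` always returns floating-point values.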


A way to use transform is the following:

transform(df, [:A, :B] => ((a, b) -> (a .+ b) ./ 2) => :Average)

i.e., you need to provide original_columns => some_function => new_column. Note that the function takes the whole column vectors as arguments and usually requires explicit broadcasting.
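The same `original_columns => some_function => new_column` pattern also accepts a named function, which can read more easily than an inline lambda. In the sketch below, `colmean` is a hypothetical helper, not part of DataFrames.jl. It also contrasts `transform`, which returns a new data frame, with `transform!`, which adds the column to the existing one.

```julia
using DataFrames

df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])

# Hypothetical helper: it receives whole columns (vectors),
# so it uses dotted operators to work element by element.
colmean(a, b) = (a .+ b) ./ 2

# transform returns a new data frame; df itself is left unchanged
out = transform(df, [:A, :B] => colmean => :Average)
@show ncol(df)   # still 2 columns: A and B

# transform! adds the new column to df itself
transform!(df, [:A, :B] => colmean => :Average)
```

This is why the question's code reassigns `df = transform(df, ...)`: without the reassignment (or `transform!`), `df` would not gain the new column.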

DataFramesMeta provides a somewhat nicer syntax, allowing you to access old and new columns by name directly:

@transform df :Avg = (:A .+ :B) ./ 2

Note that this expands to (essentially) the transform call above, as you can check via @macroexpand.
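A small sketch for checking this yourself, assuming both DataFrames.jl and DataFramesMeta.jl are installed. It builds the result both ways and prints the code the macro generates. The exact generated code may contain some DataFramesMeta internals and can differ between package versions, so no output is shown here.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])

# Macro form from DataFramesMeta
res_macro = @transform df :Avg = (:A .+ :B) ./ 2

# Plain DataFrames call it corresponds to
res_plain = transform(df, [:A, :B] => ((a, b) -> (a .+ b) ./ 2) => :Avg)

# Inspect the expression the macro expands to
ex = @macroexpand @transform df :Avg = (:A .+ :B) ./ 2
println(ex)
```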


In the source => fun => dest syntax you have

  1. The source: This is a column or a collection of columns. You got this wrong, since you want the input to be two columns, :A and :B.
  2. The fun: This is a function. Frequently, people put anonymous functions here, as in (a, b) -> a + b.
  3. The dest: This is the name of the new column being created. You got this right with :Average.
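To make the "source" part concrete, here is a sketch with one single-column and one two-column source in the same `transform` call. The function receives one argument per source column, in the order listed. The column name `A_tenth` is purely illustrative.

```julia
using DataFrames

df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])

out = transform(df,
    # one source column: the function receives a single vector
    :A => (a -> a ./ 10) => :A_tenth,
    # two source columns: the function receives two vectors, in the listed order
    [:A, :B] => ((a, b) -> (a .+ b) ./ 2) => :Average)

println(out)
```

The new columns are appended in the order the pairs are given.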

Solution 1:

# ByRow because the function is applied row by row, to scalar values
df = transform(df, [:A, :B] => ByRow((A, B) -> (A + B) / 2) => :Average)
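A quick sketch of what `ByRow` changes, using the question's data. With `ByRow`, the function is called once per row with plain numbers, so ordinary `+` and `/` are enough. Without it, the function is called once with whole columns and needs broadcasting. Both give the same result here.

```julia
using DataFrames

df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])

# ByRow: called once per row with scalar values
rowwise = transform(df, [:A, :B] => ByRow((a, b) -> (a + b) / 2) => :Average)

# No ByRow: called once with the whole columns, so it broadcasts
colwise = transform(df, [:A, :B] => ((a, b) -> (a .+ b) ./ 2) => :Average)

println(rowwise)
```

Row-wise functions are often easier to write for beginners, while whole-column functions are needed when the computation depends on the entire column (for example, subtracting the column mean).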

Solution 2: Use the package DataFramesMeta.jl, which wraps transform with a nicer syntax

df = @transform df :Average = (:A + :B) / 2
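For comparison, a sketch of the row-wise counterpart, assuming a recent DataFramesMeta.jl version that provides `@rtransform`. In `@transform`, `:A` and `:B` refer to whole columns. In `@rtransform`, they refer to the values in a single row, much like `ByRow` in Solution 1.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])

# @transform: :A and :B stand for whole columns (vectors)
colwise = @transform df :Average = (:A + :B) / 2

# @rtransform: :A and :B stand for the values in one row
rowwise = @rtransform df :Average = (:A + :B) / 2

println(rowwise)
```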

Thanks to the first three replies; all of these seem to work fine. I wasn’t aware of the first approach (df.Average = (df.A + df.B)/2). I’ll look into the DataFramesMeta package. These are all solutions, but I’d like to keep the thread going so I can see different ways of doing this. Thanks!

I’ve written a guide for Julia beginners on how to use transform (and similar functions), which should be included in the next release of the DataFrames.jl documentation. You can preview it here. Bogumil also has a tutorial written as notebooks here that you may find useful.

(There is currently a Documenter.jl error in my pull request preventing merge if anyone wants to help fix.)


Thanks! I’ll check these out.