How to use the transform function on data frames

I apologize for a very low-level question but I’ve just started learning the language and getting stumped. I am trying to take a dataframe and create a new column that’s the average of two other columns. I understand that this is what the `transform` function is supposed to do, but I’m having a hard time understanding the syntax. I entered the following:

``````using DataFrames
df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])
df = transform(df, :Average => ((df.A + df.B) / 2) => :Average)
println(df)
``````

What I expected was for there to be a new column “Average” holding the average of columns A and B. Instead I got this error message: `ArgumentError: invalid index: :Average => ([7.5, 17.5, 27.5, 37.5, 47.5] => :Average) of type Pair{Symbol, Pair{Vector{Float64}, Symbol}}`. I’m not able to make anything of that message.

Thanks for your patience and help!

Why not just
`df.Average = (df.A + df.B) / 2`
?

3 Likes

A way to use `transform` is the following:

``````transform(df, [:A, :B] => ((a, b) -> (a .+ b) ./ 2) => :Average)
``````

i.e., you need to provide `original_columns => some_function => new_column`. Note that the function takes the whole column vectors as arguments and usually requires explicit broadcasting.
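To make the "whole column vectors" point concrete, here is a minimal sketch using the data from the question. The helper name `col_average` is made up for this illustration and is not part of DataFrames.jl.

```julia
using DataFrames

df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])

# The function receives the full columns, so `a` and `b` are vectors here
# and the arithmetic is broadcast with dots.
# `col_average` is a hypothetical helper name used only for this sketch.
col_average(a, b) = (a .+ b) ./ 2

result = transform(df, [:A, :B] => col_average => :Average)

# `transform` returns a new data frame; `df` itself is left unchanged.
@assert result.Average == [7.5, 17.5, 27.5, 37.5, 47.5]
@assert !("Average" in names(df))
```

Passing a named function instead of an anonymous one is only a readability choice; either form fills the middle slot of the pair.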

`DataFramesMeta` provides a somewhat nicer syntax, allowing you to refer to old and new columns directly by name:

``````@transform df :Avg = (:A .+ :B) ./ 2
``````

Note that this expands to essentially the `transform` call above, as you can check via `macroexpand`.

2 Likes

In the `source => fun => dest` syntax you have

1. The `source`: This is a column or a collection of columns. You got this wrong: you passed `:Average`, which does not exist yet, whereas the input should be the two columns `:A` and `:B`.
2. The `fun`: This is a function. Frequently, people put anonymous functions here, as in `(a, b) -> a + b`.
3. The `dest`: This is the name of the new column being created. You got this right with `:Average`.
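The error message in the question makes more sense once you look at what the original expression builds. The sketch below uses plain Julia with no packages; the variable names `bad` and `good` are only for illustration.

```julia
# Plain Julia, no packages needed: `=>` just builds a `Pair`.
A = [10, 20, 30, 40, 50]
B = [5, 15, 25, 35, 45]

# The middle expression is evaluated immediately, so the "function" slot
# ends up holding a vector of numbers rather than something callable.
bad = :Average => ((A + B) / 2) => :Average

@assert bad.first == :Average
@assert bad.second.first == [7.5, 17.5, 27.5, 37.5, 47.5]
@assert bad isa Pair{Symbol, Pair{Vector{Float64}, Symbol}}

# A well-formed specification has columns, then a function, then a name.
good = [:A, :B] => ((a, b) -> (a .+ b) ./ 2) => :Average
@assert good.second.first isa Function
```

The type of `bad` is exactly the `Pair{Symbol, Pair{Vector{Float64}, Symbol}}` quoted in the error: `transform` was handed a vector where it expected a function.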

Solution 1:

``````# ByRow because this is row-wise
df = transform(df, [:A, :B] => ByRow((A, B) -> (A + B) / 2) => :Average)
``````
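As a quick check, `ByRow` and explicit broadcasting should give the same column. This is a small sketch on the data from the question; the names `by_row` and `by_col` are only for illustration.

```julia
using DataFrames

df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])

# ByRow: the function sees one scalar per column for each row.
by_row = transform(df, [:A, :B] => ByRow((a, b) -> (a + b) / 2) => :Average)

# Whole-column: the function sees vectors and broadcasts explicitly.
by_col = transform(df, [:A, :B] => ((a, b) -> (a .+ b) ./ 2) => :Average)

@assert by_row.Average == by_col.Average
@assert by_row.Average == [7.5, 17.5, 27.5, 37.5, 47.5]
```

Which form to use is mostly a matter of taste here; `ByRow` lets the function be written for scalars, with no dots needed.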

Solution 2: Use the package DataFramesMeta.jl, which wraps `transform` with a nicer syntax

``````df = @transform df :Average = (:A + :B) / 2
``````
1 Like

Thanks to the first three comments here, all of these seem to work fine. I wasn’t aware of the first approach (`df.Average = (df.A + df.B)/2`). I’ll look into the DataFramesMeta package. These are all solutions but I’d like to keep the thread rolling so I can see different ways of doing this, thanks.

I’ve written a guide on how to use `transform` (and similar) oriented toward Julia beginners which should be included in the next release of the DataFrames.jl documentation. You can preview it here. Bogumil also has a tutorial written in notebooks here that you may find useful.

(There is currently a Documenter.jl error in my pull request preventing merge if anyone wants to help fix.)

2 Likes

Thanks! I’ll check these out.