I apologize for a very low-level question, but I’ve just started learning the language and am getting stumped. I am trying to take a dataframe and create a new column that’s the average of two other columns. I understand that this is what the `transform` function is supposed to do, but I’m having a hard time understanding the syntax. I entered the following:

```
using DataFrames
df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])
df = transform(df, :Average => ((df.A + df.B) / 2) => :Average)
println(df)
```

What I expected was for there to be a new column “Average” that has the average of columns A and B. But I got the error message: **ArgumentError: invalid index: :Average => ([7.5, 17.5, 27.5, 37.5, 47.5] => :Average) of type Pair{Symbol, Pair{Vector{Float64}, Symbol}}** I’m not able to make anything of that message.

Thanks for your patience and help!

Why not just `df.Average = (df.A + df.B) / 2`?

A way to use `transform` is the following:

```
transform(df, [:A, :B] => ((a, b) -> (a .+ b) ./ 2) => :Average)
```

i.e., you need to provide `original_columns => some_function => new_column`. Note that the function takes the whole column vectors as arguments and usually requires explicit broadcasting.
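For a complete version you can paste into the REPL, here is a minimal sketch using the data from the original post. The helper name `col_mean` is made up for this example; any function taking the two column vectors would do.

```julia
using DataFrames

df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])

# `col_mean` is a hypothetical helper name. It receives the whole columns
# as vectors, so the arithmetic is broadcast with dots.
col_mean(a, b) = (a .+ b) ./ 2

# source columns => function => destination column
df2 = transform(df, [:A, :B] => col_mean => :Average)

# transform returns a new data frame and leaves `df` untouched.
println(df2)
```

If you want the column added to `df` itself, `transform!` takes the same arguments and modifies the data frame in place.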

`DataFramesMeta` provides a somewhat nicer syntax, letting you access old and new columns by name directly:

```
@transform df :Avg = (:A .+ :B) ./ 2
```

Note that this expands to essentially the above `transform` call, as you may check via `macroexpand`.

In the `source => fun => dest` syntax you have

- The `source`: This is a column or a collection of columns. You got this wrong, since you want the input to be two columns, `:A` and `:B`.
- The `fun`: This is a *function*. Frequently, people put anonymous functions here, as in `(a, b) -> a + b`.
- The `dest`: This is the name of the new column being created. You got this right with `:Average`.
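To make the error message from the original post less mysterious, here is a small sketch of what `transform` actually received. The expression `(df.A + df.B) / 2` is evaluated before `transform` is called, so the middle slot holds a vector of numbers rather than a function, and `=>` groups to the right:

```julia
using DataFrames

df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])

# The argument from the original post, built on its own.
# `arg` is just a name for this illustration.
arg = :Average => ((df.A + df.B) / 2) => :Average

println(typeof(arg))       # the Pair type named in the error message
println(arg.first)         # the "source" slot: :Average, a column that does not exist yet
println(arg.second.first)  # the "fun" slot: an already computed vector, not a function
```

So the `source` was a column that is not in `df`, and the `fun` position held data instead of something callable.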

Solution 1:

```
# ByRow because this is row-wise
df = transform(df, [:A, :B] => ByRow((A, B) -> (A + B) / 2) => :Average)
```
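As a quick check, `ByRow` with scalar arithmetic and a whole-column function with broadcasting should give the same column. A minimal sketch, with `by_row` and `by_col` as names chosen just for this comparison:

```julia
using DataFrames

df = DataFrame(A = [10, 20, 30, 40, 50], B = [5, 15, 25, 35, 45])

# ByRow: the function sees one pair of scalars per row, so no dots are needed.
by_row = transform(df, [:A, :B] => ByRow((a, b) -> (a + b) / 2) => :Average)

# Without ByRow: the function sees the whole column vectors and broadcasts.
by_col = transform(df, [:A, :B] => ((a, b) -> (a .+ b) ./ 2) => :Average)

println(by_row.Average == by_col.Average)
```

Which one reads better is mostly a matter of taste; `ByRow` tends to be clearer when the per-row logic is not a simple arithmetic expression.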

Solution 2: Use the package DataFramesMeta.jl, which wraps `transform` with a nicer syntax:

```
df = @transform df :Average = (:A + :B) / 2
```

Thanks to the first three comments here, all of these seem to work fine. I wasn’t aware of the first approach (`df.Average = (df.A + df.B)/2`). I’ll look into the DataFramesMeta package. These are all solutions, but I’d like to keep the thread rolling so I can see different ways of doing this, thanks.

I’ve written a guide on how to use `transform` (and similar), oriented toward Julia beginners, which should be included in the next release of the DataFrames.jl documentation. You can preview it here. Bogumil also has a tutorial written in notebooks here that you may find useful.

(There is currently a Documenter.jl error in my pull request that is preventing the merge, if anyone wants to help fix it.)

Thanks! I’ll check these out.