I’m a bit rusty on data table programming, and I am getting stuck using the DataFrames.jl package when trying to apply a complex function to every row of a DataFrame. For example, I would like to take the entirety of a `row` and use a bunch of different columns (imagine, all of them) in a complex calculation to calculate a new column.

I have tried:

`transform!(groupby(data, :uniqueID), complex_calc)`

and I think it does what I want, adding a new column called `x1`,

but if I try to pass a name using:

`transform!(groupby(data, :uniqueID), complex_calc => "newname")`

it gives the error:

```
ArgumentError: invalid index: var"#complex_calc#254"{String}("wrapped_function_arg") => "newname" of type Pair{var"#complex_calc#254"{String}, String}
```

I have also tried:

`transform(data, :, :, complex_calc)`

which gives incorrect results.

Probably you want:

```
transform(groupby(data, :uniqueID), AsTable(All()) => complex_calc => "newname")
```

if you want to work with all the columns.

However, note that in this case you can also just do:

```
[complex_calc(x) for x in groupby(data, :uniqueID)]
```

(in which case the result will be a vector, not a data frame, but sometimes you might prefer that).
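
As a rough sketch of the `AsTable` form above: the toy table and the body of `complex_calc` below are made up for illustration and are not from the original post. With a grouped data frame, the function is called once per group and receives a `NamedTuple` whose fields are that group's column vectors.

```julia
using DataFrames

# Hypothetical toy data; each uniqueID identifies exactly one row.
data = DataFrame(uniqueID = 1:3, a = [1.0, 2.0, 3.0], b = [10.0, 20.0, 30.0])

# Hypothetical stand-in for complex_calc. `nt` is a NamedTuple whose fields
# are the column vectors of the current group (each of length 1 here).
complex_calc(nt) = nt.a .+ nt.b

# One call per group; the returned vector must match the group's row count.
result = transform(groupby(data, :uniqueID),
                   AsTable(All()) => complex_calc => "newname")
```

Because the function sees vectors, the sketch uses broadcasting (`.+`) so the result has one element per row of the group.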

There are a number of ways to do this, depending on how you have constructed `complex_calc`.

Some comments:

- I think the grouping isn’t doing anything for you if each group is a single row.
- The first element in the Pair should be the columns to pass to `complex_calc`. Then the second element is `complex_calc`, and the third is the new column name.
- Remember that columns are passed as vectors to `complex_calc`. If you instead want the elements of the row passed as scalar arguments, then you need to wrap `complex_calc` in `ByRow` inside the transform.
- `AsTable` can be used to pass an entire row as one argument, but the columns of the table are still vectors unless you use `ByRow` too.

```
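using DataFrames
complex_calc1(x, y, z) = x * y + z          # takes scalars as positional arguments
complex_calc2(row) = row.x * row.y + row.z  # takes one row-like object
df = DataFrame(id = "Row " .* string.(1:3), x = 1:3, y = 4:6, z = 7:9)
# Method 1: pass x, y, z row by row as positional arguments
transform!(df, Not(:id) => ByRow(complex_calc1) => "Positional Argument Method")
# Method 2: pass each row as a NamedTuple
transform!(df, AsTable(:) => ByRow(complex_calc2) => "Row Method")
# Method 3: broadcast over the column vectors directly
df."Direct Argument Method" = complex_calc1.(df.x, df.y, df.z)
# Method 4: broadcast over DataFrameRow objects
df."Direct Row Method" = complex_calc2.(eachrow(df))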
```
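
To make the vector-versus-scalar distinction concrete, here is a minimal sketch on a separate toy table. The column names and anonymous functions are illustrative only and are not part of the example above.

```julia
using DataFrames

df = DataFrame(x = 1:3, y = 4:6)

# Without ByRow, the function is called once and receives whole columns (vectors).
whole = combine(df, [:x, :y] => ((x, y) -> sum(x) + sum(y)) => :total)

# With ByRow, the function is called once per row and receives scalars.
per_row = transform(df, [:x, :y] => ByRow((x, y) -> x + y) => :total)
```

Here `whole` has a single row, because the function reduced the columns to one number, while `per_row` keeps one value per original row.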

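A side note: the first three are going to be very fast. The last one will be slow, because a `DataFrameRow` does not carry its column types in its own type, so the compiler cannot specialize the row-wise call.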

I got “unknown algorithm” errors when trying to use the 2nd and 4th methods. Any idea why?

Did you get that error running my Minimum Working Example (MWE) above?

I’m not familiar with that error. I imagine it is something inside your `complex_calc` function which is not defined to take a `NamedTuple` (Method 2) or `DataFrameRow` (Method 4). Note that I defined two different `complex_calc` functions and used a different transformation syntax depending on how I defined `complex_calc`. If your existing `complex_calc` works with Methods 1 and 3, then just use those. If you are set on using Method 2 or 4, you could add a new method `complex_calc(row) = complex_calc(row.x, row.y, row.z)` to teach `complex_calc` how to handle a single row argument.
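
A minimal sketch of that forwarding method, assuming your function takes three positional arguments and the columns are named `x`, `y`, and `z` as in my example (adjust the names to your own table):

```julia
using DataFrames

# Positional definition, standing in for an existing complex_calc.
complex_calc(x, y, z) = x * y + z

# Extra one-argument method: unpack the fields and forward to the positional one.
# It works for anything exposing x, y, z properties (NamedTuple or DataFrameRow).
complex_calc(row) = complex_calc(row.x, row.y, row.z)

df = DataFrame(x = 1:3, y = 4:6, z = 7:9)

# Method 2 style: each row arrives as a NamedTuple.
via_table = transform(df, AsTable(:) => ByRow(complex_calc) => :out)

# Method 4 style: each row arrives as a DataFrameRow.
via_rows = complex_calc.(eachrow(df))
```

Both calls dispatch to the one-argument method, which then reuses the positional logic, so the calculation itself is written only once.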