I can apply a function to each column by broadcasting it, but I find the syntax of

```
@chain df begin
transform([:a, :b, :c] .=> (x->fn.(x)) .=> [:a, :b, :c])
end
```

clunky. I don't like `(x->fn.(x))` in particular, as I feel it's somewhat inelegant. Just looking to see if there are better options.

Full MWE:

```
using Chain, DataFrames
df = DataFrame(a = 1:3, b=1:3, c=1:3, d = ["a", "b", "c"])
fn(x) = 2x
@chain df begin
transform([:a, :b, :c] .=> (x->fn.(x)) .=> [:a, :b, :c])
end
```

This is the point of introducing `ByRow`:

```
julia> transform(df, [:a, :b, :c] .=> ByRow(fn) .=> [:a, :b, :c])
3×4 DataFrame
 Row │ a      b      c      d
     │ Int64  Int64  Int64  String
─────┼─────────────────────────────
   1 │     2      2      2  a
   2 │     4      4      4  b
   3 │     6      6      6  c

julia> transform(df, [:a, :b, :c] .=> ByRow(fn), renamecols=false)
3×4 DataFrame
 Row │ a      b      c      d
     │ Int64  Int64  Int64  String
─────┼─────────────────────────────
   1 │     2      2      2  a
   2 │     4      4      4  b
   3 │     6      6      6  c
```

(As a side benefit, using `ByRow` reduces compilation latency.)
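For anyone who wants to keep the `@chain` form from the original question, here is a minimal sketch of the same `ByRow` call inside a chain. It reuses the `df` and `fn` from the MWE and assumes Chain.jl inserts `df` as the first argument of `transform`, which is its documented default behaviour; the variable name `result` is only for illustration.

```julia
using Chain, DataFrames

df = DataFrame(a = 1:3, b = 1:3, c = 1:3, d = ["a", "b", "c"])
fn(x) = 2x

# ByRow(fn) replaces the anonymous function x -> fn.(x);
# renamecols=false keeps the original column names instead of a_fn, b_fn, c_fn
result = @chain df begin
    transform([:a, :b, :c] .=> ByRow(fn), renamecols=false)
end
```

The `d` column is left untouched because it is not among the selected columns.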


I was a little confused by `ByRow`. Does it have inefficiencies, or should I use it? For a while I was confused about what “ByRow” means.

sudete
May 22, 2021, 12:12pm
#4
To do it for all columns you can simply use `mapcols`.

```
julia> df = DataFrame(a = 1:3, b=1:3, c=1:3);

julia> fn(x) = 2x;

julia> mapcols(fn, df)
3×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     2      2      2
   2 │     4      4      4
   3 │     6      6      6
```

(Here I had to remove the `d` column since `fn` cannot be applied to it. I guess you’re aware of `mapcols` and added the `d` column on purpose, but I thought it’d be nice to mention `mapcols` for others reading this thread.)

sudete:

for all columns

I just wanted to do it to some cols.

This is not the same as what @xiaodai wanted, as in this case `fn` is not broadcasted.


Simple answer: `ByRow(f)(x...)` is the same as `f.(x...)`.

Complex answer:

- it is the same in most cases, provided you pass vectors as arguments;
- however, internally we do not use broadcasting, because broadcasting is expensive to compile;
- additionally, if `x` is a `NamedTuple` that is a Tables.jl table, we use a slightly different rule (preserving column names for use inside the function).

The exact rules are super simple:

```
(f::ByRow)(cols::AbstractVector...) = map(f.fun, cols...)
(f::ByRow)(table::NamedTuple) = [f.fun(nt) for nt in Tables.namedtupleiterator(table)]
```
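To make the two rules concrete, here is a small sketch that calls a `ByRow` object directly, outside of `transform`. The vectors `a` and `b`, the `table` NamedTuple, and the result variable names are made up for illustration; the calls themselves rely only on the two methods quoted above.

```julia
using DataFrames

fn(x) = 2x
a = [1, 2, 3]
b = [10, 20, 30]

# One column: ByRow(fn) applied to a vector matches broadcasting fn over it
byrow_single = ByRow(fn)(a)
bcast_single = fn.(a)

# Several columns: the wrapped function receives one element from each vector
byrow_multi = ByRow(+)(a, b)
bcast_multi = a .+ b

# NamedTuple of columns: the function receives each row as a NamedTuple,
# so the column names are available inside it
table = (a = a, b = b)
row_sums = ByRow(nt -> nt.a + nt.b)(table)
```

In the vector cases the `ByRow` result and the broadcast result should be equal, which is the "simple answer" above.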


sudete
May 22, 2021, 2:52pm
#8
Ah, good point: it only works in this example because `fn(x) = 2x` and scalar multiplication on columns is equivalent to multiplying row by row.
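A quick sketch of a case where the difference shows up, using `sum` as a stand-in for a function that behaves differently on a whole column than on a single element. The data frame and the names `colwise` and `rowwise` are hypothetical, chosen only for this illustration.

```julia
using DataFrames

df = DataFrame(a = 1:3, b = 4:6)

# mapcols passes each whole column to the function, so `sum` collapses a column
# to a single value and the result has one row
colwise = mapcols(sum, df)

# ByRow(sum) calls `sum` on each element; the sum of a single number is itself,
# so the values are unchanged
rowwise = mapcols(ByRow(sum), df)
```

With `fn(x) = 2x` both forms happen to agree, which is why the earlier example gave the expected result.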


Yes; however, I still think `ByRow` is a little easier to read in:

```
julia> using DataFrames

julia> df = DataFrame(rand(3,4), :auto)
3×4 DataFrame
 Row │ x1        x2        x3        x4
     │ Float64   Float64   Float64   Float64
─────┼──────────────────────────────────────────
   1 │ 0.798251  0.788097  0.12371   0.447479
   2 │ 0.320884  0.561217  0.736315  0.113512
   3 │ 0.691074  0.807812  0.31742   0.00395885

julia> mapcols(ByRow(sin), df)
3×4 DataFrame
 Row │ x1        x2        x3        x4
     │ Float64   Float64   Float64   Float64
─────┼──────────────────────────────────────────
   1 │ 0.716137  0.709013  0.123395  0.432694
   2 │ 0.315405  0.532217  0.671562  0.113268
   3 │ 0.637365  0.722777  0.312116  0.00395884

julia> mapcols(x -> sin.(x), df)
3×4 DataFrame
 Row │ x1        x2        x3        x4
     │ Float64   Float64   Float64   Float64
─────┼──────────────────────────────────────────
   1 │ 0.716137  0.709013  0.123395  0.432694
   2 │ 0.315405  0.532217  0.671562  0.113268
   3 │ 0.637365  0.722777  0.312116  0.00395884
```

per your proposal


It will be fast, as fast as `map`. The only exception is with `AsTable`, in which case, because it acts on a `NamedTupleIterator`, there might be some overhead.
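For reference, a sketch of the two call styles being compared here, using the `df` from the MWE. The output column name `:total` and the variables `out_table` and `out_positional` are made up for illustration, and no timings are claimed; which form is faster in practice would need to be measured on your data.

```julia
using DataFrames

df = DataFrame(a = 1:3, b = 1:3, c = 1:3, d = ["a", "b", "c"])

# AsTable passes the selected columns as a NamedTuple of vectors; ByRow then
# calls the function once per row with a NamedTuple such as (a = 1, b = 1, c = 1)
out_table = transform(df, AsTable([:a, :b, :c]) => ByRow(nt -> nt.a + nt.b + nt.c) => :total)

# Passing the columns positionally hands ByRow three vectors instead,
# which is the case described above as being as fast as `map`
out_positional = transform(df, [:a, :b, :c] => ByRow(+) => :total)
```

Both calls should produce the same `total` column; the `AsTable` form is mainly useful when the function needs the column names.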
