# Apply a column of anonymous functions for each column in a column subset

Hi,

Suppose we have a DataFrame like this:

```julia
df = DataFrame(name='a':'c', x1=1:3, x2=[[1,2],[3,4],[5,6]], xfun=[x->x.-1, x->x.^2, x->x.^3])
```

I want to transform columns `x1` and `x2` in `df` by applying, in each row, the function stored in `xfun` to the values of `x1` and `x2`. I could use `[:x1, :xfun] => ByRow((x, f) -> f(x)) => :x1`, but what if there are 20 such columns? Is there a more elegant way to achieve this?

The other way I can think of is to convert the columns to a Matrix and broadcast the vector of anonymous functions along the first dimension of the Matrix, but I don't know whether there is a generic `apply` function to broadcast.

Thanks,
Alex
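On the Matrix idea from the question: no special `apply` function seems necessary, because Julia can broadcast an anonymous function `(f, x) -> f(x)` over the function vector and the data at the same time. Below is a minimal sketch (not from the thread), assuming the example `df` from the question; `M`, `out` and `res` are illustrative names. Because the matrix mixes integers and vectors, its element type is `Any`, so the rebuilt columns are untyped as well.

```julia
using DataFrames

df = DataFrame(name='a':'c', x1=1:3, x2=[[1, 2], [3, 4], [5, 6]],
               xfun=[x -> x .- 1, x -> x .^ 2, x -> x .^ 3])

cols = ["x1", "x2"]
M = Matrix(df[!, cols])   # 3×2 Matrix{Any}: Int entries in column 1, Vector{Int} in column 2

# The length-3 vector df.xfun is broadcast along the second dimension of M,
# so every entry in row i is transformed by df.xfun[i].
out = ((f, x) -> f(x)).(df.xfun, M)

# Rebuild a data frame from the result matrix (columns get element type Any).
res = DataFrame(out, cols)
```

The DataFrames transformation approach avoids this loss of element types, since each column is processed separately.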

```julia
julia> combine(df, vcat.(["x1", "x2"], "xfun") .=> ByRow((x,f) -> f(x)) => first)
3×2 DataFrame
 Row │ x1     x2
     │ Int64  Array…
─────┼───────────────────
   1 │     0  [0, 1]
   2 │     4  [9, 16]
   3 │    27  [125, 216]
```

Instead of `["x1", "x2"]`, you can provide any expression that generates the names of the columns you want to include.
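For example, a regular expression passed to `names` can generate the column list. This is a sketch assuming the naming pattern of the example `df` (value columns called `x` followed by digits); the regex itself is an illustrative choice, not part of the original answer.

```julia
using DataFrames

df = DataFrame(name='a':'c', x1=1:3, x2=[[1, 2], [3, 4], [5, 6]],
               xfun=[x -> x .- 1, x -> x .^ 2, x -> x .^ 3])

# Select every column named "x" followed only by digits; "xfun" does not match.
cols = names(df, r"^x\d+$")

# Same transformation as above, with the column list computed instead of typed.
res = combine(df, vcat.(cols, "xfun") .=> ByRow((x, f) -> f(x)) => first)
```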

Thanks, it's exactly what I want. The `transform` version works too, like this:

```julia
transform(df, vcat.(["x1", "x2"], "xfun") .=> ByRow((x,f) -> f(x)) => first)
```

Is there any other difference between these two?

The differences are:

• `transform` always keeps all source columns; `combine` only keeps columns specified in transformations;
• `transform` requires output to have as many rows as input; `combine` allows any number of rows in output.

Other than that, these functions interpret transformation specifications in the same way (i.e. the same engine processes both requests, but different additional constraints are applied).
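A small sketch of both differences on a toy data frame (the data and the column names `a2` and `big` are made up for illustration):

```julia
using DataFrames

df = DataFrame(a=1:3, b=4:6)

# transform keeps all source columns and appends the new one.
t = transform(df, :a => ByRow(x -> 2x) => :a2)     # columns: a, b, a2

# combine keeps only the columns produced by the transformations.
c = combine(df, :a => ByRow(x -> 2x) => :a2)       # columns: a2

# combine accepts a result with a different number of rows...
short = combine(df, :a => (x -> filter(>(1), x)) => :big)   # 2 rows

# ...while transform rejects it, because the result must have nrow(df) rows.
rejected = try
    transform(df, :a => (x -> filter(>(1), x)) => :big)
    false
catch
    true
end
```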


Just a slightly different way of combining things:

```julia
cols = ["x1", "x2"]
combine(df, ["xfun"; cols] => ByRow((f, x...) -> f.(x)) => cols)
```

but above all, I would like to ask about the use of the `first` function instead of a list of column names/symbols in the output.

PS

I wonder if and when it will also be possible to write something like this:

```julia
combine(df, [cols; "xfun"] => ByRow((x..., f) -> f.(x)) => cols)

# so for the given df it would be possible to save some typing :-)

combine(df, 2:4 => ByRow((x..., f) -> f.(x)) => 2:3)
```
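Regarding the `first` question: as far as I understand the DataFrames.jl documentation, a function in the target position of `source => fun => target` is called with the source column names (a vector of strings when several columns are selected) and its return value is used as the output name. With `["x1", "xfun"]` as source, `first` therefore yields `"x1"`. A sketch with a hypothetical naming function `suffixed` that appends a suffix instead:

```julia
using DataFrames

df = DataFrame(name='a':'c', x1=1:3, x2=[[1, 2], [3, 4], [5, 6]],
               xfun=[x -> x .- 1, x -> x .^ 2, x -> x .^ 3])

# Hypothetical naming function: receives e.g. ["x1", "xfun"] and returns "x1_new".
suffixed(srcnames) = first(srcnames) * "_new"

res = combine(df, vcat.(["x1", "x2"], "xfun") .=> ByRow((x, f) -> f(x)) => suffixed)
```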

Thanks for the clarification!

Splitting into a vector of names is also quite concise.

Base Julia does not allow this (a vararg argument such as `x...` must be the last argument of a function), and I do not think it will be allowed.
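A possible workaround (a sketch, not proposed in the thread) is to slurp all arguments and split off the last one manually. It relies on `Base.front`, an unexported Base function that returns a tuple without its last element; `apply_last` is a hypothetical helper name.

```julia
using DataFrames

df = DataFrame(name='a':'c', x1=1:3, x2=[[1, 2], [3, 4], [5, 6]],
               xfun=[x -> x .- 1, x -> x .^ 2, x -> x .^ 3])

cols = ["x1", "x2"]

# Hypothetical helper: treat the last argument as the function and apply it
# to every other argument, returning a tuple (one element per data column).
apply_last(args...) = last(args).(Base.front(args))

# The function column can now come last, as in the wished-for syntax.
res = combine(df, [cols; "xfun"] => ByRow(apply_last) => cols)
```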

I'll take this opportunity to ask a further, more specific question about the transformation mini-language.
If I understand correctly, some input forms, such as column ranges, are not allowed as output.
For example, `2:3 => fun => 2:3` doesn't work.
If so, what is the reason for this restriction?

This could work and would mean the following:

pass the contents of columns 2 and 3 as positional arguments to `fun`, and expand the result it returns into two columns named after columns 2 and 3 of the source.

The first question is whether this is what you would expect. If it is, then at least for me this is a very specific case that is rarely needed, and currently it can be expressed as `2:3 => fun => names(df, 2:3)`, which is only slightly more verbose.
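A minimal sketch of that workaround; the data frame and the function `swap` (which exchanges the values of its two inputs row by row) are made up for illustration:

```julia
using DataFrames

df = DataFrame(id=1:3, a=1:3, b=4:6)

# Hypothetical two-argument function returning one tuple per row.
swap(x, y) = map((p, q) -> (q, p), x, y)

# Equivalent of the proposed `2:3 => swap => 2:3`: the output names are
# taken from the names of source columns 2 and 3.
res = combine(df, 2:3 => swap => names(df, 2:3))
```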

For single-column transformations like `2 => fun => 2` in your proposed notation, which are more common, either pass `renamecols=false` as a keyword argument and write just `2 => fun`, or write `2 => fun => identity` to retain the source column name. This does not cover a case like `2 => fun => 3`, but again I think that it is quite rare.
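Both options side by side, as a sketch on a toy data frame (the multiplication by 10 is just a placeholder transformation). Since `transform` writes the result under the existing name `b`, the original column is replaced:

```julia
using DataFrames

df = DataFrame(a=1:3, b=4:6)

# Option 1: disable automatic renaming, so the output keeps the source name "b".
t1 = transform(df, 2 => (x -> x .* 10); renamecols=false)

# Option 2: `identity` as the target returns the source column name unchanged.
t2 = transform(df, 2 => (x -> x .* 10) => identity)
```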

What is your use case where you require this kind of transformations?

The simple one: the first.
Evidently I mixed something up when I ran my test.
Thanks

For reference, here is an example where your original syntax could be useful:

```julia
julia> using DataFrames

julia> fun(x, y) = map((a, b) -> (a+b, a-b), x, y)
fun (generic function with 1 method)

julia> df = DataFrame(a=1:3, b=4:6)
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6

julia> combine(df, [:a, :b] => fun => [:a, :b])
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     5     -3
   2 │     7     -3
   3 │     9     -3
```