Apply a column of anonymous functions to each column in a column subset

Hi,

Suppose we have a DataFrame like this:

df = DataFrame(name='a':'c',x1=1:3,x2=[[1,2],[3,4],[5,6]],xfun=[x->x.-1,x->x.^2,x->x.^3])

I want to transform columns x1 and x2 in df by applying each function in xfun to the corresponding row of x1 and x2. I could use [:x1,:xfun]=>ByRow((x,f)->f(x))=>:x1, but what if there are 20 of these columns? Is there a more elegant way to achieve this?

The other way I can think of is to convert the columns to a Matrix and broadcast a vector of anonymous functions along the first dimension of the Matrix, but I don’t know if there is a generic apply function that can be broadcast.

Thanks,
Alex
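
On the Matrix idea from the question, here is a minimal sketch, assuming purely numeric data (the matrix M and the functions fs below are made up for illustration). Broadcasting a length-3 vector against a 3×2 matrix pairs fs[i] with every element of row i, so no dedicated apply function should be needed. The broadcast pipe M .|> fs is an alternative spelling of the same thing.

```julia
# Hypothetical data: rows play the role of DataFrame rows,
# columns play the role of the value columns (x1, x2, ...).
M = [1 10; 2 20; 3 30]

# One function per row (illustrative only).
fs = [x -> x - 1, x -> x^2, x -> x^3]

# The vector fs is broadcast along the first dimension of M,
# so fs[i] is applied to each element of row i.
A = ((f, x) -> f(x)).(fs, M)

# Same computation written with the broadcast pipe operator.
B = M .|> fs
```

This only covers the case where every cell can live in one Matrix. With mixed column types the DataFrames transformations are probably the more convenient route.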

julia> combine(df, vcat.(["x1", "x2"], "xfun") .=> ByRow((x,f) -> f(x)) => first)
3×2 DataFrame
 Row │ x1     x2
     │ Int64  Array…
─────┼───────────────────
   1 │     0  [0, 1]
   2 │     4  [9, 16]
   3 │    27  [125, 216]

and instead of ["x1", "x2"] provide an expression that generates the column names you want to include.
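
To illustrate the pieces of that call, here is a sketch using the df from the question. The regex passed to names is only an assumed way of generating the column list, so adapt it to your own naming scheme. A function given as the destination (here first) receives the source column names, so it picks "x1" out of ["x1", "xfun"] as the output name.

```julia
using DataFrames

df = DataFrame(name='a':'c', x1=1:3, x2=[[1, 2], [3, 4], [5, 6]],
               xfun=[x -> x .- 1, x -> x .^ 2, x -> x .^ 3])

# Assumed naming scheme: value columns are "x" followed by digits.
cols = names(df, r"^x\d+$")

# Pair every value column with the function column:
# [["x1", "xfun"], ["x2", "xfun"]]
srcs = vcat.(cols, "xfun")

# One transformation per source pair; `first` maps the source names
# ["x1", "xfun"] to the output name "x1" (and likewise for "x2").
specs = srcs .=> ByRow((x, f) -> f(x)) => first

res = combine(df, specs)
```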

Thanks, it’s exactly what I want. The transform version works too, like this:

transform(df, vcat.(["x1", "x2"], "xfun") .=> ByRow((x,f) -> f(x)) => first)

Is there any other difference between these two?

The differences are:

  • transform keeps all source columns always; combine only keeps columns specified in transformations;
  • transform requires output to have as many rows as input; combine allows any number of rows in output.

Other than that, these functions interpret transformation specifications in the same way (i.e. the same engine processes both requests, but different additional constraints are applied).
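
The two differences can be seen in a small sketch (the data frame and column names here are made up for illustration):

```julia
using DataFrames

df = DataFrame(id=1:3, x=[10, 20, 30])

# transform keeps the source columns and appends the new one.
t = transform(df, :x => ByRow(v -> v + 1) => :y)

# combine keeps only the columns produced by the transformations.
c = combine(df, :x => ByRow(v -> v + 1) => :y)

# combine may change the number of rows: a reduction gives one row.
s = combine(df, :x => sum => :total)

# transform must keep the number of rows, so the scalar result
# is repeated for every row of the source.
b = transform(df, :x => sum => :total)
```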

just a slightly different way of combining :wink: things

cols=["x1", "x2"]
combine(df, ["xfun";cols]=>ByRow((f,x...)->f.(x))=>cols)

But above all, I wanted to ask about the use of the function first in place of a list of column names/symbols in the output.

PS

I wonder if and when it will also be possible to write something like this

combine(df, [cols;"xfun"]=>ByRow((x...,f)->f.(x))=>cols)

# so for the given df is possible to save some typing :-)

combine(df, 2:4=>ByRow((x...,f)->f.(x))=>2:3)

Thanks for the clarification!

splitting to a vector of names is also quite concise.

Base Julia does not allow this (a slurping argument such as x... must be the last one in the argument list), and I do not think it will be allowed.

I take this opportunity to ask you a further question, a more specific one relating to the mini language.
If I understand correctly, some input forms, such as column ranges, are not allowed in the output.
For example, 2:3 => fun => 2:3 doesn’t work.
If so, what is the reason for these restrictions?

This could work and would mean the following:

pass contents of columns 2 and 3 as positional arguments to function fun and expand the result returned by it into two columns whose names are taken as names of columns 2 and 3 from the source

The first question is if this is what you would expect. If this is what you would expect, at least for me this is a very specific case that is needed quite rarely and currently it can be expressed as 2:3 => fun => names(df, 2:3) which is only a bit more verbose.

For single column transformations like 2 => fun => 2 in your proposed notation, which are more common, either pass renamecols=false as kwarg and write just 2 => fun or write 2 => fun => identity to retain source column name. This does not cover the case like 2 => fun => 3, but again I think that it is quite rare.

What is your use case where you require this kind of transformation?
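
For concreteness, the alternatives mentioned above can be sketched as follows (double and fun are hypothetical helper functions, and the data frame is made up for illustration):

```julia
using DataFrames

df = DataFrame(a=1:3, b=4:6)

# Hypothetical single-column transformation.
double(v) = 2 .* v

# Default behaviour: the function name is appended to the source name.
r1 = combine(df, 2 => double)

# Keep the source column name via the renamecols keyword argument.
r2 = combine(df, 2 => double; renamecols=false)

# Keep the source column name by passing identity as the destination.
r3 = combine(df, 2 => double => identity)

# Hypothetical two-column transformation returning a vector of tuples.
fun(x, y) = map((p, q) -> (p + q, p - q), x, y)

# Reuse the source names for the outputs by spelling them out.
r4 = combine(df, 1:2 => fun => names(df, 1:2))
```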

The simple one: the first.
Obviously, when I did the test I mixed up something else.
Thanks

For a reference here is an example where your original syntax could be useful:

julia> using DataFrames

julia> fun(x, y) = map((a, b) -> (a+b, a-b), x, y)
fun (generic function with 1 method)

julia> df = DataFrame(a=1:3, b=4:6)
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6

julia> combine(df, [:a, :b] => fun => [:a, :b])
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     5     -3
   2 │     7     -3
   3 │     9     -3