DataFrames.jl - Vectorized row-wise function application

dataframes

#1

What are efficient ways to apply a function f row-wise over a DataFrame? Looking through the DataFrames.jl documentation, I have found plenty of examples of column-wise aggregation. However, say that I want to take one or more columns, apply a function to each row, and get back an array-like structure containing the result for each row, similar to Julia’s native vectorized dot notation for arrays:

f.(df.A) # something like this

Thus far, I have been doing as follows:

f.(vec(convert(Array, df.A)))

Is this a reasonable way to go about it?


#2

I think what you are doing is right. Alternatively, check out this functionality in DataFramesMeta.jl:

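using DataFrames
df = DataFrame(a=1:3, b=4:6)
f(a, b) = a + b
f.(df.a, df.b)  # broadcast f over both columns, one call per row

# or, with column names written as symbols inside @with
using DataFramesMeta
@with(df, f.(:a, :b))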
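
For comparison, here is a minimal sketch of the same row-wise application using only DataFrames.jl. It assumes a recent release (1.x) where `eachrow`, `ByRow` and `transform` are available; the output column name `:c` and the variables `r1`, `r2`, `r3` are made up for illustration.

```julia
using DataFrames

df = DataFrame(a = 1:3, b = 4:6)
f(a, b) = a + b

# 1. Broadcast directly over the column vectors.
r1 = f.(df.a, df.b)

# 2. Iterate over the rows explicitly; each `row` is a DataFrameRow.
r2 = [f(row.a, row.b) for row in eachrow(df)]

# 3. Wrap f in ByRow and store the result in a new column
#    (the name :c is an assumption made for this example).
df2 = transform(df, [:a, :b] => ByRow(f) => :c)
r3 = df2.c
```

All three should give the same vector. Broadcasting over columns is usually the simplest choice when only a plain vector is needed, while `ByRow` is convenient when the result should end up as a column of the data frame.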

#3

Thanks xiaodai. I was making a mistake with the dot notation which you helped me identify.

DataFramesMeta looks quite useful!


#4

Query.jl also lets you do this. For example, to run a function f over the columns a and b:

df |> @map(f(_.a, _.b)) |> collect

This returns a vector of the results.
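
A self-contained version of the snippet above, as a sketch: it assumes Query.jl is installed and reuses the example data frame and function from the earlier reply (the variable name `result` is made up here).

```julia
using DataFrames, Query

df = DataFrame(a = 1:3, b = 4:6)
f(a, b) = a + b

# Inside @map, `_` stands for the current row, so _.a and _.b
# are the values of columns a and b in that row.
result = df |> @map(f(_.a, _.b)) |> collect
```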