Passing DataFrameRow to transform in split-apply-combine


I want to apply a function (my_func) to the DataFrameRow object so that I can access every column by name from within the function, independently of the columns declared in the transform call.
I can’t find a way to access the row object from within the transform function.

An example will clarify:

using DataFrames, UnPack  # @unpack comes from UnPack.jl (Parameters.jl also exports it)

df = DataFrame(A=String[], B=String[])
push!(df, ["red", "apple"])
push!(df, ["foo", "bar"])

function unify_names(x::DataFrameRow)
    @unpack A, B = x
    return joinpath(A, B)
end


## I would like something like this...
transform(df, :All=> unify_names => :C)

## But I have to write this
function unify_names(A::String, B::String)
    return joinpath(A, B)
end
transform(df, [:A, :B] => ByRow(unify_names) => :C)

With the second syntax, I have to change the function signature every time I add or remove a column.
Imagine I want to read the columns conditionally, as in:

function unify_names(x::DataFrameRow)
    @unpack A = x
    if A == "red"
        return joinpath(A, "banana")
    else
        # B is only read when it is actually needed
        @unpack B = x
        return joinpath(A, B)
    end
end


The function defined on DataFrameRow is so much more flexible! Is it possible to use it? How?


You want AsTable:

transform(df, AsTable([:A, :B]) => ByRow(unify_names) => :C)

With DataFramesMeta.jl it’s

@rtransform df :C = unify_names(AsTable([:A, :B]))

Also, do you really want joinpath? Why not just do string(a, b)?
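
One caveat, shown as a minimal sketch: with AsTable and ByRow the function receives each row as a NamedTuple, not a DataFrameRow, so a method annotated with x::DataFrameRow will not match. Leaving the argument untyped works. The name unify_names_nt is made up for this example, and plain field access is used instead of @unpack to keep it dependency-free:

```julia
using DataFrames

df = DataFrame(A=["red", "foo"], B=["apple", "bar"])

# Untyped argument: ByRow hands over each row as a NamedTuple,
# e.g. (A = "red", B = "apple"), so fields are read by name.
unify_names_nt(row) = string(row.A, row.B)

out = transform(df, AsTable([:A, :B]) => ByRow(unify_names_nt) => :C)
```

The same function also accepts a DataFrameRow (for example from eachrow(df)), since both support row.A style access.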


Yes, AsTable(propertynames(df)) was exactly what I was looking for!
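
For anyone landing here later, a small sketch of how that can look with the conditional case from the question. The function name pick_name is hypothetical, and string is used instead of joinpath so the result does not depend on the OS path separator:

```julia
using DataFrames

df = DataFrame(A=["red", "foo"], B=["apple", "bar"])

# Hypothetical row function: column B is only read when A is not "red".
function pick_name(row)
    row.A == "red" && return string(row.A, "banana")
    return string(row.A, row.B)
end

# propertynames(df) returns the column names as Symbols, so the whole row
# is passed and the function decides which columns it needs.
out = transform(df, AsTable(propertynames(df)) => ByRow(pick_name) => :C)
```

AsTable(:) selects all columns as well, so adding or removing a column of df does not require touching the transform call.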

Alternative ways:

transform(df, [:A, :B] => ByRow((x, y) -> join((x, y))))

transform(df, names(df)=>(x...)->string.(x...))

transform(df, Cols(:)=>(x...)->string.(x...))

transform(df, Cols(:)=>(x...)->.*(x...))

df.C = (*).(eachcol(df)...)

Query.jl way looks like this:

new_df = df |> @mutate(C=joinpath(_.A, _.B)) |> DataFrame