Hello,
I want to apply a function (my_func) to a whole DataFrameRow object, so that I can access any column by name from within the function, independently of the column names declared in the transform call.
I can't find a way to access the row object from within the transform function.
An example will clarify:
using DataFrames
using UnPack  # provides @unpack (Parameters.jl also exports it)
df = DataFrame(A=String[], B=String[])
push!(df, ["red", "apple"])
push!(df, ["foo", "bar"])
function unify_names(x::DataFrameRow)
    @unpack A, B = x
    return joinpath(A, B)
end
unify_names(df[1, :])
unify_names(df[2, :])
## I would like something like this...
transform(df, :All => unify_names => :C)
##============================
## But I have to write this
function unify_names(A::String, B::String)
    return joinpath(A, B)
end
transform(df, [:A, :B] => ByRow((x, y) -> unify_names(x, y)) => :C)
With the second syntax, I have to change the function signature and the column list every time I add or remove a column.
Imagine I want to read the columns conditionally, as in:
function unify_names(x::DataFrameRow)
    @unpack A = x
    if A == "red"
        return joinpath(A, "banana")
    else
        # B is only read when it is actually needed
        @unpack B = x
        return joinpath(A, B)
    end
end
The function defined on DataFrameRow is so much more flexible! Is it possible to use it with transform? How?
Thanks!
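For reference, here is a minimal sketch of two ways this kind of whole-row call might be written in DataFrames.jl. It rests on a few assumptions not stated above: the argument type is left unannotated (with `AsTable(:)` and `ByRow`, the function receives each row as a NamedTuple rather than a DataFrameRow), columns are read with plain property access instead of `@unpack` so that only DataFrames is needed, and the names `df_c` and `:D` are made up for illustration.

```julia
using DataFrames

df = DataFrame(A=String[], B=String[])
push!(df, ["red", "apple"])
push!(df, ["foo", "bar"])

# No type annotation on `row`, so the same method accepts
# a NamedTuple (from AsTable) or a DataFrameRow (from eachrow).
function unify_names(row)
    if row.A == "red"
        return joinpath(row.A, "banana")
    else
        # B is only read in this branch
        return joinpath(row.A, row.B)
    end
end

# Option 1: AsTable(:) selects all columns; ByRow then calls the
# function once per row with a NamedTuple of that row's values.
df_c = transform(df, AsTable(:) => ByRow(unify_names) => :C)

# Option 2: iterate over the rows directly; eachrow yields DataFrameRow objects.
df.D = map(unify_names, eachrow(df))
```

With either option the function body decides which columns to read, so the call site does not need to change when columns are added or removed. If the `x::DataFrameRow` annotation has to stay, only the `eachrow` variant would apply as written.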