Transforming dataframe columns selected via regex while keeping the other columns

SailChickenBright · May 24, 2021, 3:45am

Let’s say I have df = DataFrame(ax = 1:3, bx = 4:6, cy = 7:9, dy = 10:12) and want to double the values of each column that has an “x” in its header, while leaving the columns that don’t have an “x” in their header unchanged. When I try to do that with

transform(df, r"x" => ByRow(x -> 2x); renamecols = false)

I get MethodError: no method matching (::Main.workspace4.var"#1#2")(::Int64, ::Int64). It looks like it’s trying to use them each as inputs to a single bivariate function, because the following code runs

transform(df, r"x" => ByRow(+))

I know I could select columns with regex and then apply the transformation to all of them, but that would leave the other columns out, which I don’t want.
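The behaviour described above can be reproduced in a short sketch, assuming DataFrames.jl is installed. The output column name `:x_sum` and the variables `summed` and `failed` are made up for illustration. It suggests that a regex selector on the left of `=>` hands every matching column to the function as a separate positional argument, which is why a two-argument function works and a one-argument function does not.

```julia
using DataFrames

df = DataFrame(ax = 1:3, bx = 4:6, cy = 7:9, dy = 10:12)

# r"x" matches both ax and bx, so the function is called with two
# arguments per row. :x_sum is a hypothetical output column name.
summed = transform(df, r"x" => ByRow((a, b) -> a + b) => :x_sum)

# A one-argument function fails for the same reason: it is still
# called with two arguments (one per matched column).
failed = try
    transform(df, r"x" => ByRow(x -> 2x); renamecols = false)
    false
catch err
    true
end
```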

rocco_sprmnt21 · May 24, 2021, 7:37am

You would need to know (I don’t) what the expression r"x" produces in this context (a tuple, perhaps?) in order to handle it properly with the function x -> 2x.
While waiting for an explanation of what is going on when a regex is used to select columns, you can try a workaround such as:

transform(df, [x for x in names(df) if contains(x,"x")].=> ByRow(x -> 2x);renamecols=false)

Pay attention to the “.” in “.=>”.

Another way, using the named tuple’s properties:

transform(df, AsTable(r"x")=> (nt->(;zip(keys(nt),2 .* values(nt))...))=>AsTable)

or using splatting operator

transform(df, r"x".=> ByRow((x...) -> 2 .*x)=>filter(x->contains(x,"x"),names(df)))

or this way

transform(df, r"x".=> ByRow((x...) -> 2 .*x)=>names(select(df,r"x")))

but unfortunately, the following doesn’t work

transform(df, r"x".=> ByRow((x...) -> 2 .*x)=>r"x")

MethodError: no method matching getindex(::DataFrames.Index, ::Pair{Regex, Pair{ByRow{var"#181#182"}, Regex}})

at least not in this naive form. Perhaps using the regex capabilities appropriately, the result can be achieved.

qsong · May 24, 2021, 8:32am

Or

transform(df, names(df, r"x") .=> (x -> 2x), renamecols=false)

SailChickenBright · May 24, 2021, 9:52am

Thanks for the solution. Is there a way to do it using ByRow, in case the applied function operates on each element of the column instead of on the entire column?

qsong · May 24, 2021, 10:03am

Yes, ByRow is the other way to “broadcast” a function over the elements of a column, like you said. Take addition, for example. Either of the following gives the same result:

transform(df, names(df, r"x") .=> (x -> 100 .+ x), renamecols=false)
transform(df, names(df, r"x") .=> ByRow(x -> 100 + x), renamecols=false)
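For completeness, here is a self-contained sketch of both forms applied to the doubling from the original question, assuming DataFrames.jl is installed. The variable names `xcols`, `whole` and `byrow` are only for illustration.

```julia
using DataFrames

df = DataFrame(ax = 1:3, bx = 4:6, cy = 7:9, dy = 10:12)

# names(df, r"x") returns the matching column names as strings;
# .=> then builds one `column => function` pair per name.
xcols = names(df, r"x")

# Whole-column form: the function receives each column as a vector.
whole = transform(df, xcols .=> (x -> 2 .* x); renamecols = false)

# Element-wise form: ByRow applies the function to each element.
byrow = transform(df, xcols .=> ByRow(x -> 2x); renamecols = false)
```

In both results the columns without an “x” in their name are kept unchanged, and `renamecols = false` keeps the original names for the transformed ones.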
