Transforming dataframe columns selected via regex while keeping the other columns

Let’s say I have df = DataFrame(ax = 1:3, bx = 4:6, cy = 7:9, dy = 10:12) and want to double the values of each column that has an “x” in its header, while leaving the columns that don’t have an “x” in their header unchanged. When I try to do that with

transform(df, r"x" => ByRow(x -> 2x); renamecols = false)

I get MethodError: no method matching (::Main.workspace4.var"#1#2")(::Int64, ::Int64). It looks like it’s trying to use them each as inputs to a single bivariate function, because the following code runs

transform(df, r"x" => ByRow(+))

I know I could select columns with regex and then apply the transformation to all of them, but that would leave the other columns out, which I don’t want.
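A quick check seems to confirm that reading: the regex selector hands all matched columns to the function together, one positional argument per column. This is only a sketch, assuming a recent DataFrames.jl; the name of the generated column is looked up rather than assumed.

```julia
using DataFrames

df = DataFrame(ax = 1:3, bx = 4:6, cy = 7:9, dy = 10:12)

# r"x" matches ax and bx, so ByRow(+) is called as +(ax[i], bx[i]) for each row
out = transform(df, r"x" => ByRow(+))

# find the one column that transform added, without assuming its generated name
newcol = only(setdiff(names(out), names(df)))
out[!, newcol] == df.ax .+ df.bx   # true
```

A one-argument function such as x -> 2x has no method for two arguments, which would explain the MethodError above.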

You would need to know (and I don't) what the expression r"x" produces in this context (a tuple, perhaps?) in order to handle it properly with the function x -> 2x.
While waiting for an explanation of what happens when a regex is used to select columns, you can try a workaround like this:

transform(df, [x for x in names(df) if contains(x,"x")].=> ByRow(x -> 2x);renamecols=false)

Pay attention to the "." in ".=>".
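To see why the dot matters, here is a minimal sketch of what the broadcasted pair expands to. The variable names double, xcols and ops are just for illustration.

```julia
using DataFrames

df = DataFrame(ax = 1:3, bx = 4:6, cy = 7:9, dy = 10:12)
double = ByRow(x -> 2x)

xcols = names(df, r"x")      # ["ax", "bx"]
ops = xcols .=> double       # one Pair per column: "ax" => double, "bx" => double

# each pair is applied to its own column, so the function sees one argument
out = transform(df, ops; renamecols = false)
```

Without the dot, the whole vector of names would form a single pair, and the function would again receive every matched column at once.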

Another way, using the named tuple's properties:

transform(df, AsTable(r"x")=> (nt->(;zip(keys(nt),2 .* values(nt))...))=>AsTable)
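A row-wise variant of the same idea might look like the sketch below. It assumes that, with an AsTable source, ByRow passes each row as a NamedTuple, so map can double every field while keeping the field names.

```julia
using DataFrames

df = DataFrame(ax = 1:3, bx = 4:6, cy = 7:9, dy = 10:12)

# each row of the matched columns arrives as a NamedTuple like (ax = 1, bx = 4);
# map over a NamedTuple returns a NamedTuple with the same keys
out = transform(df, AsTable(r"x") => ByRow(nt -> map(x -> 2x, nt)) => AsTable)
```

Because the returned named tuples reuse the original keys, the ax and bx columns are overwritten in place and cy and dy are left alone.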

Or using the splatting operator:

transform(df, r"x".=> ByRow((x...) -> 2 .*x)=>filter(x->contains(x,"x"),names(df)))

or this way

transform(df, r"x".=> ByRow((x...) -> 2 .*x)=>names(select(df,r"x")))

but unfortunately, the following doesn’t work

transform(df, r"x".=> ByRow((x...) -> 2 .*x)=>r"x")

MethodError: no method matching getindex(::DataFrames.Index, ::Pair{Regex, Pair{ByRow{var"#181#182"}, Regex}})

at least not in this naive form. Perhaps using the regex capabilities appropriately, the result can be achieved.
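One reading of that error is that the destination side wants concrete column names rather than a column selector. Assuming that is right, a sketch that resolves the regex to names once and uses them on both sides:

```julia
using DataFrames

df = DataFrame(ax = 1:3, bx = 4:6, cy = 7:9, dy = 10:12)

xcols = names(df, r"x")   # resolve the regex to concrete names once

# source .=> function .=> destination, broadcast so each column maps onto itself
out = transform(df, xcols .=> ByRow(x -> 2x) .=> xcols)
```

Here every pair has the form "ax" => (ByRow(...) => "ax"), so the function takes a single argument and no splatting is needed.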

Or

transform(df, names(df, r"x") .=> (x -> 2x), renamecols=false)

Thanks for the solution. Is there a way to do it using ByRow , in case the applied function operates on each element of the column instead of the entire column?

ByRow is the other way to do the "broadcasting" for data frames, as you said. Take addition, for example: either of the following gives the same result.

transform(df, names(df, r"x") .=> (x -> 100 .+ x), renamecols=false)
transform(df, names(df, r"x") .=> ByRow(x -> 100 + x), renamecols=false)
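For anyone who wants to confirm this, a small sketch that runs both forms and compares the results (by_col and by_row are illustrative names):

```julia
using DataFrames

df = DataFrame(ax = 1:3, bx = 4:6, cy = 7:9, dy = 10:12)
xcols = names(df, r"x")

# whole-column function: x is the entire column vector, so broadcast inside it
by_col = transform(df, xcols .=> (x -> 100 .+ x), renamecols = false)
# ByRow: x is a single element, so a plain scalar function is enough
by_row = transform(df, xcols .=> ByRow(x -> 100 + x), renamecols = false)

by_col == by_row                           # true
by_row.cy == df.cy && by_row.dy == df.dy   # true, the other columns are untouched
```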