Transform multiple columns of a DataFrame using the same function

I’d like to transform several columns “in-place” using the same function. For example, if I have

using DataFrames
df = DataFrame(a = 1:3, b = 4:6, c = 7:9)

and want to double columns a and c and replace those columns with the doubled ones, I’d like to be able to do something like

transform!(df, ["a", "c"] => ByRow(x -> 2x) => ["a", "c"])

so that I end up with

DataFrame(a = [2, 4, 6], b = 4:6, c = [14, 16, 18])

Just broadcast the pair operator:

julia> transform!(df, [:a, :c] .=> (x -> 2x) .=> [:a, :c])
3×3 DataFrame
 Row │ a      b      c     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     2      4     14
   2 │     4      5     16
   3 │     6      6     18

(No need for ByRow, as you can double the whole vector at once.)
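
To see what the broadcast pair operator builds, you can evaluate the column specification on its own. It expands into one source => function => destination triple per column, and transform! then applies each triple independently. A small sketch, where the helper name `double` is hypothetical and only used for illustration:

```julia
using DataFrames

double(x) = 2x   # hypothetical helper; applied to a whole column vector at once

# `.=>` is right-associative: `double .=> [:a, :c]` is evaluated first.
# The function is treated as a scalar in broadcasting, so it is reused for
# every column, giving one triple per column.
specs = [:a, :c] .=> double .=> [:a, :c]

df = DataFrame(a = 1:3, b = 4:6, c = 7:9)
transform!(df, specs)   # overwrites :a and :c with their doubled versions
```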

Also, if the list of columns is long, you can skip specifying the output names and pass the renamecols=false keyword argument instead.
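
A minimal sketch of that suggestion, assuming a DataFrames.jl release recent enough to support the renamecols keyword (an older install further down this thread did not):

```julia
using DataFrames

df = DataFrame(a = 1:3, b = 4:6, c = 7:9)

# Without a destination, each transformation would get a derived name
# (e.g. :a_function for an anonymous function). renamecols = false keeps the
# source names instead, so :a and :c are overwritten in place.
transform!(df, [:a, :c] .=> (x -> 2x); renamecols = false)
```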

If you really want to use ByRow, try one of these (I got there by trial and error; I obviously haven’t read all the documentation on the transform function yet):

transform(df, AsTable([:a, :c]) => ByRow(x ->(2x.a,2x.c)) =>[:a,:c])

transform(df, AsTable([:a, :c]) => ByRow(x ->2 .* values(x)) =>[:a,:c])

Another way:

transform!(df, [:a, :c] => ByRow((x...) -> 2 .* x) => [:a, :c])

These are not the best solutions: they pass all the columns to a single function that returns multiple columns. That is more complex than it needs to be and triggers more compilation than necessary. The best way is to use .=>, which also works fine in combination with ByRow.

Thank you for the advice, but I wasn’t able to get renamecols to work. Maybe my syntax is wrong:

df = DataFrame(a = 1:3, b = 4:6, c = 7:9)
transform!(df, [:a, :c] .=> (x -> 2x); renamecols=false)

MethodError: no method matching transform!(::DataFrames.DataFrame, ::Vector{Pair{String, var"#25#26"{typeof(*)}}}; renamecols=false)

Closest candidates are:

transform!(::DataFrames.DataFrame, ::Any...) at /Users/x/.julia/packages/DataFrames/GtZ1l/src/abstractdataframe/selection.jl:371 got unsupported keyword argument "renamecols"

transform!(!Matched::DataFrames.GroupedDataFrame{DataFrames.DataFrame}, ::Any...; ungroup) at /Users/x/.julia/packages/DataFrames/GtZ1l/src/groupeddataframe/splitapplycombine.jl:1719 got unsupported keyword argument "renamecols"

I think ByRow can still work with @nilshg’s reply. This code runs:

df = DataFrame(a = 1:3, b = 4:6, c = 7:9)
transform!(df, [:a, :c] .=> ByRow(x -> 2x) .=> [:a, :c])

Yes, I saw it. Thank you.
I am trying to understand the various ways of acting on the elements of a data frame.
For example, I saw here that it is possible to use a vector of triples cols => function => target_cols:

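colsname = names(df)        # ["a", "b", "c"]
f = [x -> 2x, x -> 4x]      # one function per target column
oddcol = colsname[1:2:3]    # ["a", "c"]
# build one cols => ByRow(function) => target_cols triple per selected column
transform(df, [oddcol[i] => ByRow(f[i]) => oddcol[i] for i in eachindex(oddcol)])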
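
The comprehension can also be replaced by broadcasting. A sketch, assuming the same a, b, c data frame; `ByRow.(f)` wraps each function of the vector in its own ByRow, and the two `.=>` then pair the i-th column with the i-th function and the i-th destination:

```julia
using DataFrames

df = DataFrame(a = 1:3, b = 4:6, c = 7:9)

f = [x -> 2x, x -> 4x]       # one function per target column
oddcol = names(df)[1:2:3]    # ["a", "c"]

# Broadcasting builds the same vector of triples as the comprehension.
specs = oddcol .=> ByRow.(f) .=> oddcol

result = transform(df, specs)   # transform (no !) returns a new data frame
```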

This works for me on Julia 1.6 with DataFrames 1.1.1.

Thank you, I hadn’t updated DataFrames.jl. It works for me now.

I made several tests to achieve the required transformation, including the three reported here.
Given a sort of symmetry between input and output, I would have expected the third to give the same result, but it doesn’t work.

transform(df, r"x"=> ((x,y)->(;ax=2x,bx=2y))=>AsTable )
transform(df, r"x".=> ByRow((x...) -> 2 .*x)=>[:ax,:bx])
transform(df, r"x".=> ByRow((x...) -> 2 .*x)=>r"x")

while an expression like this works:

transform(df, r"x"=> ByRow((x...) -> 2 .*x)=>names(df,r"x"))

I mean, src => fun => r"x" doesn’t have an intuitive meaning. r"x" is a selector that only makes sense in the context of a given data frame, which is why we need names(df, r"x").
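
To make this concrete, here is a sketch with a hypothetical data frame whose columns ax and bx match r"x". `names(df, r"x")` resolves the regex against that particular data frame and returns plain column names, which can then be used as the destination:

```julia
using DataFrames

# Hypothetical data frame: two column names contain "x", one does not.
df = DataFrame(ax = 1:3, bx = 4:6, c = 7:9)

cols = names(df, r"x")   # regex resolved against df's column names

# The source may stay a regex; the destination must be concrete names.
result = transform(df, r"x" => ByRow((x...) -> 2 .* x) => cols)
```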

I think the asymmetry is logical, as dest is only about the output of fun rather than the data frame as a whole. Think of the first example, with AsTable: AsTable knows nothing about the input data frame, only about the output of fun.
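
A small sketch of that point, with hypothetical column names: when the destination is AsTable, the new column names come only from the keys of the NamedTuple the function returns, not from the source selection:

```julia
using DataFrames

df = DataFrame(ax = 1:3, bx = 4:6)

# Each row yields a NamedTuple; AsTable turns its keys (:ax2, :bx2) into
# column names. Nothing about the source columns is used for naming.
result = transform(df, [:ax, :bx] => ByRow((a, b) -> (ax2 = 2a, bx2 = 2b)) => AsTable)
```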

You can also add the keyword argument renamecols = false to get the same names as the input.

For use with a regex, the following form is probably the most readable:

df = DataFrame(aa = 1:3, b = 4:6, ac = 7:9)
transform(df, names(df, r"a") .=> x -> 2 .* x ; renamecols=false)