It would be nice if the DataFrames pairs syntax could handle a renaming function in the third position, so that the names of the new columns can be determined programmatically. It's currently possible to do this if you calculate the new names beforehand, like this:
cols = names(df, contains("temp"))
new_cols = cols .* "_celsius"
transform(df, cols .=> ByRow(t -> (t - 32) * 5/9) .=> new_cols)
However, if you have to calculate the source names and the destination names beforehand, it kind of defeats the purpose of the concise pairs syntax. Adding a renaming function is something that an Across type could easily handle. For example:
julia> df = DataFrame(temp1 = 70:71, temp2 = 80:81)
2×2 DataFrame
 Row │ temp1  temp2
     │ Int64  Int64
─────┼──────────────
   1 │    70     80
   2 │    71     81
julia> transform(df,
           Across(contains("temp");
               apply = t -> (t - 32) * 5/9,
               renamer = col -> col * "_celsius"
           )
       )
2×4 DataFrame
 Row │ temp1  temp2  temp1_celsius  temp2_celsius
     │ Int64  Int64  Float64        Float64
─────┼────────────────────────────────────────────
   1 │    70     80        21.1111        26.6667
   2 │    71     81        21.6667        27.2222
See the implementation details below. Note that I've made the applied function act by row (it is broadcast over each selected column), just to make things easier on myself.
An additional benefit of Across here is that it can be saved as a reusable object, acr = Across(...), because it makes no reference to the specific column names in df. Note that the pairs object built from cols and new_cols in the first example is not reusable because it refers to specific columns in df.
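To make the reuse point concrete, here is a small sketch that applies one Across object to two different data frames. It repeats a trimmed copy of the definitions from the implementation section so that it runs on its own. The data frames df_a and df_b and their column names are made up for the illustration.

```julia
using DataFrames

# Trimmed copy of the Across sketch from this post, repeated so the block is self-contained.
struct Across{S, F, R}
    selector::S
    f::F
    renamer::R
end
Across(selector; apply, renamer = col -> col * "_" * string(apply)) =
    Across(selector, apply, renamer)

function DataFrames.transform(df::AbstractDataFrame, across::Across)
    newdf = copy(df)
    for col in names(newdf, across.selector)
        newdf[:, across.renamer(col)] = across.f.(newdf[:, col])
    end
    newdf
end

# One Across object, built without knowing any concrete column names.
acr = Across(contains("temp");
    apply = t -> (t - 32) * 5/9,
    renamer = col -> col * "_celsius"
)

# Hypothetical data frames with different "temp" columns.
df_a = DataFrame(temp1 = [32, 212])
df_b = DataFrame(id = [1, 2], outside_temp = [50, 68], inside_temp = [77, 86])

res_a = transform(df_a, acr)  # adds temp1_celsius
res_b = transform(df_b, acr)  # adds outside_temp_celsius and inside_temp_celsius
```

The selector is only resolved against column names inside transform, which is why the same acr works for both data frames.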
For fun, I've also implemented a preview function that previews what Across will do:
julia> preview(df,
           Across(contains("temp");
               apply = t -> (t - 32) * 5/9,
               renamer = col -> col * "_celsius"
           )
       )
2×3 DataFrame
 Row │ source  transformation  destination
     │ String  var"#14#16"     String
─────┼──────────────────────────────────────
   1 │ temp1   #14             temp1_celsius
   2 │ temp2   #14             temp2_celsius
Unfortunately, anonymous functions don't print very nicely, which is why we have #14 in the transformation column. If you use a named function, it prints more nicely:
julia> plus_one(x) = x + 1
plus_one (generic function with 1 method)
julia> preview(df, Across(contains("temp"); apply=plus_one))
2×3 DataFrame
 Row │ source  transformation  destination
     │ String  #plus_one…      String
─────┼───────────────────────────────────────
   1 │ temp1   plus_one        temp1_plus_one
   2 │ temp2   plus_one        temp2_plus_one
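The same printing behaviour affects the default renamer, since it builds the destination name from string(apply). The sketch below isolates that rule in plain Julia with no DataFrames needed. The helper default_renamer and the function to_celsius are names I made up for the illustration, and the exact tag Julia generates for an anonymous function varies between sessions and versions.

```julia
# Same naming rule as the default renamer in this post, pulled out into a
# hypothetical helper so it can be tried on its own.
default_renamer(f) = col -> col * "_" * string(f)

plus_one(x) = x + 1
anon = x -> x + 1

# A named function stringifies to its name.
named_dest = default_renamer(plus_one)("temp1")

# An anonymous function stringifies to a compiler-generated tag beginning
# with '#', so the default destination name is not very readable.
anon_dest = default_renamer(anon)("temp1")

# Giving the transformation a name restores a readable default.
to_celsius(t) = (t - 32) * 5/9
celsius_dest = default_renamer(to_celsius)("temp2")
```

So when apply is an anonymous function, it is probably worth passing an explicit renamer, as in the Celsius example.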
Minimal Implementation
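using DataFrames

# Bundles a column selector, a function to apply, and a function that names the outputs.
struct Across{S, F, R}
    selector::S
    f::F
    renamer::R
end

# Keyword constructor; by default the new name is "<column>_<function name>".
function Across(selector; apply, renamer = col -> col * "_" * string(apply))
    Across(selector, apply, renamer)
end

function DataFrames.transform(df::AbstractDataFrame, across::Across)
    selector, f, renamer = across.selector, across.f, across.renamer
    newdf = copy(df)
    cols = names(newdf, selector)  # resolve the selector against this data frame
    for col in cols
        # f is broadcast, so it sees one value at a time rather than the whole column
        newdf[:, renamer(col)] = f.(newdf[:, col])
    end
    newdf
end

# Shows which source columns map to which destination columns, without transforming anything.
function preview(df::AbstractDataFrame, a::Across)
    cols = names(df, a.selector)
    DataFrame(
        source = cols,
        transformation = a.f,  # a single value, repeated for every row
        destination = a.renamer.(cols)
    )
end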
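As an alternative to copying the data frame and looping, the same idea could probably be expressed by lowering the selector, function, and renamer into ordinary pairs and handing those to the existing transform. This is only a sketch: across_pairs is a hypothetical helper name, and it assumes (like the implementation above) that the function should act by row.

```julia
using DataFrames

# Hypothetical helper: expand a selector, a row-wise function, and a renamer
# into the standard `source => function => destination` pairs.
function across_pairs(df::AbstractDataFrame, selector, f, renamer)
    cols = names(df, selector)            # resolve the selector to concrete names
    cols .=> ByRow(f) .=> renamer.(cols)  # one pair per selected column
end

df = DataFrame(temp1 = 70:71, temp2 = 80:81)

ops = across_pairs(df, contains("temp"),
    t -> (t - 32) * 5/9,
    col -> col * "_celsius"
)

# A vector of pairs is accepted by the regular transform method.
result = transform(df, ops)
```

Going through the pairs syntax would mean the operation can be mixed with other pairs in a single transform call, and the same pairs should also work with select and combine.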