It would be nice if the DataFrames pairs syntax accepted a renaming function in the third position, so that the names of the new columns could be determined programmatically. It’s currently possible to do this if you calculate the new names beforehand, like this:
cols = names(df, contains("temp"))
new_cols = cols .* "_celsius"
transform(df, cols .=> ByRow(t -> (t - 32) * 5/9) .=> new_cols)
However, if you have to calculate the source names and the destination names beforehand, it kind of defeats the purpose of the concise pairs syntax. Adding a renaming function is something that an Across type could easily handle. For example:
julia> df = DataFrame(temp1 = 70:71, temp2 = 80:81)
2×2 DataFrame
 Row │ temp1  temp2
     │ Int64  Int64
─────┼──────────────
   1 │    70     80
   2 │    71     81
julia> transform(df,
           Across(contains("temp");
               apply = t -> (t - 32) * 5/9,
               renamer = col -> col * "_celsius"
           )
       )
2×4 DataFrame
 Row │ temp1  temp2  temp1_celsius  temp2_celsius
     │ Int64  Int64  Float64        Float64
─────┼────────────────────────────────────────────
   1 │    70     80        21.1111        26.6667
   2 │    71     81        21.6667        27.2222
See implementation details below. Note that I’ve made the applied function act by row, just to make things easier on myself.
An additional benefit of Across here is that it can be saved as a reusable object, acr = Across(...), because it makes no reference to the specific column names in df. By contrast, the pairs expression from the first example is not reusable, because it is built from column names that were computed from df.
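To make the reuse point concrete, here is a minimal sketch that applies one Across object to two differently shaped data frames. It repeats the minimal definition from the end of this post so the snippet runs on its own; the `indoor` and `outdoor` data frames and the `to_celsius` name are made up for illustration.

```julia
using DataFrames

# Minimal Across definition, repeated from the end of this post.
struct Across{S, F, R}
    selector::S
    f::F
    renamer::R
end

Across(selector; apply, renamer = col -> col * "_" * string(apply)) =
    Across(selector, apply, renamer)

function DataFrames.transform(df::AbstractDataFrame, across::Across)
    newdf = copy(df)
    for col in names(newdf, across.selector)
        # `f` is applied row by row, as in the original implementation.
        newdf[:, across.renamer(col)] = across.f.(newdf[:, col])
    end
    newdf
end

# One reusable object; it never mentions a concrete column name.
to_celsius = Across(contains("temp");
    apply = t -> (t - 32) * 5/9,
    renamer = col -> col * "_celsius"
)

# Hypothetical data frames with different "temp" columns.
indoor  = DataFrame(temp1 = 70:71, temp2 = 80:81)
outdoor = DataFrame(site = ["a", "b"], temp_min = [32, 50], temp_max = [212, 68])

indoor_c  = transform(indoor, to_celsius)
outdoor_c = transform(outdoor, to_celsius)
```

The selector is only resolved against column names inside `transform`, so the same `to_celsius` object picks up `temp1`/`temp2` in one case and `temp_min`/`temp_max` in the other.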
For fun, I’ve also implemented a preview function that previews what Across will do:
julia> preview(df,
           Across(contains("temp");
               apply = t -> (t - 32) * 5/9,
               renamer = col -> col * "_celsius"
           )
       )
2×3 DataFrame
 Row │ source  transformation  destination
     │ String  var"#14#16"     String
─────┼───────────────────────────────────────
   1 │ temp1   #14             temp1_celsius
   2 │ temp2   #14             temp2_celsius
Unfortunately, anonymous functions don’t print very nicely, which is why we have #14 in the transformation column. A named function prints better, and the default renamer then appends the function’s name to each source column name:
julia> plus_one(x) = x + 1
plus_one (generic function with 1 method)
julia> preview(df, Across(contains("temp"); apply=plus_one))
2×3 DataFrame
 Row │ source  transformation  destination
     │ String  #plus_one…      String
─────┼────────────────────────────────────────
   1 │ temp1   plus_one        temp1_plus_one
   2 │ temp2   plus_one        temp2_plus_one
Minimal Implementation
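using DataFrames

# Holds a column selector, the function to apply, and a renaming function.
struct Across{S, F, R}
    selector::S
    f::F
    renamer::R
end

# Keyword constructor; by default the function's name is appended to the column name.
function Across(selector; apply, renamer = col -> col * "_" * string(apply))
    Across(selector, apply, renamer)
end

function DataFrames.transform(df::AbstractDataFrame, across::Across)
    selector, f, renamer = across.selector, across.f, across.renamer
    newdf = copy(df)
    cols = names(newdf, selector)
    for col in cols
        # Apply f to each row of the source column and store the result under the new name.
        newdf[:, renamer(col)] = f.(newdf[:, col])
    end
    newdf
end

# Show which source columns would be transformed, by what, and into which names.
function preview(df::AbstractDataFrame, a::Across)
    cols = names(df, a.selector)
    DataFrame(
        source = cols,
        transformation = a.f,
        destination = a.renamer.(cols)
    )
end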
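As an alternative to overloading `transform`, an Across object could probably be lowered to ordinary pairs once the data frame is known. The sketch below assumes the same struct as above (repeated so it runs on its own); `to_pairs` is a hypothetical helper name, not part of DataFrames.

```julia
using DataFrames

# Same struct as above, repeated so this snippet is self-contained.
struct Across{S, F, R}
    selector::S
    f::F
    renamer::R
end

Across(selector; apply, renamer = col -> col * "_" * string(apply)) =
    Across(selector, apply, renamer)

# Hypothetical helper: expand an Across into one `source => ByRow(f) => destination`
# pair per selected column. ByRow keeps the row-wise behaviour of the version above.
to_pairs(df::AbstractDataFrame, a::Across) =
    [col => ByRow(a.f) => a.renamer(col) for col in names(df, a.selector)]

df = DataFrame(temp1 = 70:71, temp2 = 80:81)
acr = Across(contains("temp");
    apply = t -> (t - 32) * 5/9,
    renamer = col -> col * "_celsius"
)

# The expanded pairs go through the regular transform machinery.
result = transform(df, to_pairs(df, acr)...)
```

Because the result of `to_pairs` is just a vector of standard pairs, the same expansion should also be usable with `select` and `combine`, without defining extra methods for each of them.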