I’m a bit surprised by this, seeing the examples in this thread. Here’s a summary:
Example 1:
# R
clean_data <- function(df, var1, var2) {
  df %>% mutate_at(vars(var1), function(x) x*100) %>%
    mutate_at(vars(var2), function(x) x*200)
}
# Julia
function clean_data!(df, var1, var2)
    @chain df begin
        @rtransform! $var1 = $var1 * 100
        @rtransform! $var2 = $var2 * 200
    end
end
Example 2:
# R
clean_data <- function(df, var1, var2) {
  df %>% mutate("{{var1}}_cleaned" := {{var1}} * 100,
                "{{var2}}_cleaned" := {{var2}} * 200)
}
# Julia
function clean_data!(df, var1, var2)
    @chain df begin
        @rtransform! $(var1 * "_cleaned") = $var1 * 100
        @rtransform! $(var2 * "_cleaned") = $var2 * 200
    end
end
Example 3:
# R
clean_data <- function(df, var1, var2) {
  df %>% mutate(across(all_of(var1), function(x) x*100, .names = "{.col}_cleaned")) %>%
    mutate(across({{var2}}, function(x) x*200, .names = "{.col}_cleaned"))
}
clean_data(df, "x1", x2)
# Julia (edited, see comments below)
function clean_data!(df, var1, var2)
    @chain df begin
        @rtransform! $("$(var1)_cleaned") = $var1 * 100
        @rtransform! $("$(var2)_cleaned") = $var2 * 200
    end
end
clean_data!(df, "x1", :x2)
Example 4:
# R
df[[paste0(var1, "_cleaned")]] <- df[[var1]] * 100
# Julia
df[:, var1*"_cleaned"] = df[:, var1] * 100
Conclusion:
One thing that looks worse in Julia is the need for @chain df begin.
Otherwise, what strikes me is the number of specialized functions and the amount of special syntax needed in R. And every time the problem is a bit different, the code gets replaced with a very different solution. Look at the new things we need to learn as we move from one example to another (I don’t count standard language syntax like the R function(x)... lambda in the first example and Julia indexing in the last example).
Example 1:
R: mutate_at and vars.
Julia: @rtransform!, = and escaping with $.
Example 2:
R: "{...}", {...} and :=.
Julia: -
Example 3:
R: across, all_of, .names and {.col}.
Julia: :col can be used instead of "col".
Example 4:
R: [[...]]
Julia: -
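To make the Example 3 point concrete, here is a minimal sketch in plain Julia (no packages) of why the interpolated name accepts both "x1" and :x2 while the * concatenation from Example 2 only accepts strings. The helper name cleaned_name is made up for illustration and is not part of any package.

```julia
# Hypothetical helper (illustration only): builds the new column name.
# String interpolation calls string() on its argument, so Symbols work too.
cleaned_name(var) = "$(var)_cleaned"

name_from_string = cleaned_name("x1")   # "x1_cleaned"
name_from_symbol = cleaned_name(:x2)    # "x2_cleaned"

# Concatenation with * is only defined for strings, so a Symbol fails here.
concat_error = try
    :x2 * "_cleaned"
catch err
    err
end
println(typeof(concat_error))   # MethodError
```

So the interpolated form is the one to use if callers may pass either a string or a Symbol.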
Of course there is a big thing to learn with DataFrames.jl which is not shown here: the => minilanguage. There’s a real learning curve but it often gives my favorite solution:
transform(df, var1 => (x->100x),
              var2 => (x->200x), renamecols=false)
transform(df, var1 => (x->100x) => var1*"_cleaned",
              var2 => (x->200x) => var2*"_cleaned")
So clear and consistent.
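For anyone who wants to try the => form, here is a small self-contained sketch. It assumes var1 and var2 are passed as strings (so that * concatenation works) and uses a made-up two-column DataFrame. The wrapper name clean_data is just borrowed from the examples above. The first lines show that => builds ordinary nested Pairs, which is all the minilanguage is at the syntax level.

```julia
using DataFrames

# `=>` is right-associative, so this is source => (function => target).
p = "x1" => (x -> 100x) => "x1_cleaned"
p.first          # "x1"
p.second.second  # "x1_cleaned"

# Hypothetical wrapper mirroring the clean_data examples above.
# Assumes var1 and var2 are strings; returns a new DataFrame, df is untouched.
function clean_data(df, var1, var2)
    transform(df,
        var1 => (x -> 100x) => var1 * "_cleaned",
        var2 => (x -> 200x) => var2 * "_cleaned")
end

# Toy data, made up for illustration.
df = DataFrame(x1 = [1, 2], x2 = [3, 4])
out = clean_data(df, "x1", "x2")
```

Here out keeps x1 and x2 and gains x1_cleaned and x2_cleaned. Swapping transform for transform! would modify df in place, like the clean_data! versions above.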