I have a dataset with a bunch of columns containing string data in the format “(1.0, 2.0)”. All of these columns end with the suffix “CI”. I’d like to apply a function that splits each string into its low and high values and parses them as floats. Then I’d like to replace each original column with $(original_name)_low and $(original_name)_high.
I know how I would do this with the DataFrames.jl DSL, but I’d really like to do it with TidierData.jl because it makes my code more accessible to my R peers.
Any chance it’s possible without hard-coding all the column names?
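Whatever tool does the reshaping, the per-cell splitting and parsing only needs Base Julia. A minimal sketch, assuming every cell holds exactly two comma-separated numbers wrapped in parentheses; `parse_ci` is a hypothetical helper name, not part of any package:

```julia
"""
    parse_ci(s) -> Tuple{Float64,Float64}

Hypothetical helper: parse a string such as "(1.0, 2.0)" into a `(low, high)`
tuple of Float64. Assumes two comma-separated numbers inside parentheses.
"""
function parse_ci(s::AbstractString)
    inner = strip(s, ['(', ')', ' '])        # "(1.0, 2.0)" -> "1.0, 2.0"
    parts = split(inner, ',')                # -> ["1.0", " 2.0"]
    length(parts) == 2 || throw(ArgumentError("expected two values, got: $s"))
    low, high = parse.(Float64, strip.(parts))  # parse each bound as Float64
    return (low, high)
end

parse_ci("(1.0, 2.0)")
```

Because it only splits and parses, this does not evaluate the cell text as code.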
Here is one possible solution, where you turn the string into a tuple or an array and then use `@unnest_wider`. @kdpsingh may be able to offer other, better ones too.
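The two steps of that approach (string to tuple, then spread the tuple into separate columns) can be sketched in plain DataFrames.jl. This is not TidierData's API and not the original solution; it only writes out by hand the widening that `@unnest_wider` is used for there. `to_tuple` is a hypothetical helper and assumes the “(low, high)” layout:

```julia
using DataFrames

df = DataFrame(a_CI = ["(1.0, 2.0)", "(2.5, 3.5)", "(3.2, 4.1)"])

# Step 1 (hypothetical helper): "(1.0, 2.0)" -> (1.0, 2.0)
# strip the parentheses, split on the comma, parse both parts as Float64
to_tuple(s) = Tuple(parse.(Float64, strip.(split(strip(s, ['(', ')']), ','))))
df.a_CI = to_tuple.(df.a_CI)

# Step 2: widen by hand, one new column per tuple element,
# then drop the nested column so it is replaced rather than kept
df.a_CI_low  = first.(df.a_CI)
df.a_CI_high = last.(df.a_CI)
select!(df, Not(:a_CI))
```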
Now that you have a solution with TidierData, I hope it’s not annoying to post one with DataFrameMacros (disclaimer: my package). It’s a lot of fun to try to solve these data-wrangling “code golf” problems, and I was happy to find a one-liner for this:
The `string.({}, ["_low", "_high"])` part expands to `[["a_CI_low", "a_CI_high"], ["b_CI_low", "b_CI_high"]]`, so in the DataFrames.jl minilanguage it’s like writing `[:a_CI, :b_CI] .=> the_function .=> [["a_CI_low", "a_CI_high"], ["b_CI_low", "b_CI_high"]]`.
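That expansion can be reproduced in Base Julia to see exactly which operation specifications get built. A small sketch; `the_function` is a placeholder for whatever parsing function is used, and the column list is written out only for illustration:

```julia
# Placeholder for the real parsing function (hypothetical).
the_function(x) = x

cols = [:a_CI, :b_CI]

# For each column, build its two output names: :a_CI -> ["a_CI_low", "a_CI_high"]
targets = [string.(c, ["_low", "_high"]) for c in cols]

# `.=>` is right-associative like `=>`, so this is cols .=> (the_function .=> targets).
# Functions broadcast as scalars, so every pair gets the same function.
specs = cols .=> the_function .=> targets
```

Each element of `specs` has the shape `source => function => [low_name, high_name]`.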
Thanks for the question and for all the different ways people have shared to do this.
Here’s one way to do this using TidierData without hard-coding column names.
```julia
using TidierData, TidierStrings

# define the DataFrame
df = DataFrame(
    a_CI = ["(1.0, 2.0)", "(2.5, 3.5)", "(3.2, 4.1)"],
    b_CI = ["(0.1, 1.1)", "(0.5, 1.5)", "(1.2, 2.1)"],
)

# fixes a bug that will be addressed in the next version of TidierData
push!(TidierData.not_escaped[], :Meta)

# the code
@chain df begin
    # turn each "(low, high)" string into a tuple (eval runs the cell text as Julia code)
    @summarize(across(everything(), x -> eval.(Meta.parse.(x))))
    # apply the named functions low and high to every tuple column
    @summarize(across(everything(), (function low(x) minimum.(x) end,
                                     function high(x) maximum.(x) end)))
    # remove the "_function" fragment from the generated column names
    @rename_with(x -> str_remove_all(x, "_function"), everything())
end
```
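For comparison, a plain DataFrames.jl sketch (not TidierData) that avoids `eval` and still hard-codes no column names: columns are selected with a regex on the “CI” suffix, and each one is replaced by its `_low` and `_high` columns. `parse_ci` is a hypothetical helper that assumes every cell has the “(low, high)” layout:

```julia
using DataFrames

df = DataFrame(
    a_CI = ["(1.0, 2.0)", "(2.5, 3.5)", "(3.2, 4.1)"],
    b_CI = ["(0.1, 1.1)", "(0.5, 1.5)", "(1.2, 2.1)"],
)

# Hypothetical helper: "(1.0, 2.0)" -> (1.0, 2.0), parsed without eval/Meta.parse
parse_ci(s) = Tuple(parse.(Float64, strip.(split(strip(s, ['(', ')']), ','))))

# names(df, r"CI$") returns every column name ending in "CI" as a Vector{String}
for col in names(df, r"CI$")
    bounds = parse_ci.(df[!, col])
    df[!, col * "_low"]  = first.(bounds)
    df[!, col * "_high"] = last.(bounds)
    select!(df, Not(col))  # drop the original string column
end
```

The name list is computed once before the loop, so adding and dropping columns inside it is safe.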