From a DataFrame, how do I create multiple new columns that each take multiple columns as input in a set name pattern?

There’s a similar question about R on here.

However, I can’t find the solution in Julia.

Imagine I have a DataFrame

using DataFrames

df = DataFrame(
  group = rand(1:3, 100), 
  a1 = 1:100, a2=1:100, a3=1:100, 
  b1 = 1:100, b2=1:100, b3=1:100)

I want to create new columns ci for i from 1 to 3, where each ci is computed from the corresponding ai and bi.

using TidierData
using Chain: @chain
df = @chain df begin
 @group_by group
 @summarize(c1 = fn(a1, b1), c2 = fn(a2, b2), c3 = fn(a3, b3))
end

It’s ok to write it out for just 3 such cs but what if there are hundreds with more complex rules for finding the right variables?

I was hoping to be able to do something like this

using TidierData
using Chain: @chain
df = @chain df begin
 @group_by group
 # hypothetical syntax: `assign` does not exist, this is only what I would like to write
 @summarize(
    assign("c" .* string.(1:3), fn(@starts_with("a") |> sort,  @starts_with("b") |> sort))
  )
end

What’s the most elegant way to do this in Julia? The only way I can think of is a macro that expands into the written-out form, but I was hoping for a solution where I don’t have to roll my own macro.

Edit: made it into a group-by example instead of just a mutate.

Your first example can just be a loop?

for i in 1:3  # the example df only has a1..a3 and b1..b3; widen the range for more columns
    df[!, "c$i"] = fn(df[!, "a$i"], df[!, "b$i"])
end
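
If the rule for finding the matching columns is more involved than a fixed range, the suffixes can be derived from the column names first. This is only a sketch: the small df, the regex, and the element-wise fn below are stand-ins I made up for illustration, not something from the question.

```julia
using DataFrames

# Small stand-in table (hypothetical data, not the df from the question).
df = DataFrame(a1 = 1:4, a2 = 5:8, b1 = 1:4, b2 = 1:4)

# Hypothetical stand-in for `fn`: element-wise sum of the two input columns.
fn(a, b) = a .+ b

# Collect every numeric suffix for which both an "a" and a "b" column exist.
suffixes = String[]
for name in names(df)
    m = match(r"^a(\d+)$", name)
    if m !== nothing && "b$(m[1])" in names(df)
        push!(suffixes, m[1])
    end
end

# Same loop as above, driven by the discovered suffixes instead of a fixed range.
for s in suffixes
    df[!, "c$s"] = fn(df[!, "a$s"], df[!, "b$s"])
end
```

Any other matching rule can replace the regex, as long as it ends up producing the list of suffixes (or of column-name pairs) to loop over.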

Bad example by me; I actually need it inside a group-by. Let me change the example.

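Inside combine you can pass a vector of source => function => destination pairs, one per i:

gdf = groupby(df, :group)
combine(gdf, [["a$i", "b$i"] => fn => "c$i" for i in 1:3])  # widen the range for more columns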
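
To see the pattern run end to end, here is a minimal sketch on a tiny table. The data and the reducing fn (a sum of element-wise products) are assumptions made up for the example; any function taking two column vectors and returning one value per group would slot in the same way.

```julia
using DataFrames

# Tiny stand-in table with two groups of three rows (hypothetical data).
df = DataFrame(
    group = repeat(1:2, inner = 3),
    a1 = 1:6, a2 = 7:12,
    b1 = 1:6, b2 = 1:6,
)

# Hypothetical stand-in for `fn`: reduces two columns to one number per group.
fn(a, b) = sum(a .* b)

gdf = groupby(df, :group)

# One `[sources] => function => destination` pair per index i,
# built with a comprehension and passed to combine as a vector.
result = combine(gdf, [["a$i", "b$i"] => fn => "c$i" for i in 1:2])
```

Here result has one row per group with columns group, c1 and c2. Because the pairs are ordinary Julia values, the comprehension can be swapped for any code that builds the list of column names, so no macro is needed.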
