Help generating an Expression

I have some wide-formatted data that I need to group and summarize. An MWE is below:

using DataFramesMeta

df = DataFrame(
    id = repeat(1:5, inner=3),
    a_1 = rand(15),
    a_2 = rand(15),
    a_3 = rand(15),
    b_1 = rand(15),
    b_2 = rand(15),
    b_3 = rand(15)
)

summary = @linq df |>
    by(
        :id,
        a_1 = sum(:a_1),
        a_2 = sum(:a_2),
        a_3 = sum(:a_3),
        b_1 = sum(:b_1),
        b_2 = sum(:b_2),
        b_3 = sum(:b_3),
    )

julia> summary
5×7 DataFrame
│ Row │ id    │ a_1     │ a_2      │ a_3      │ b_1      │ b_2     │ b_3     │
│     │ Int64 │ Float64 │ Float64  │ Float64  │ Float64  │ Float64 │ Float64 │
├─────┼───────┼─────────┼──────────┼──────────┼──────────┼─────────┼─────────┤
│ 1   │ 1     │ 2.34505 │ 0.881786 │ 0.969993 │ 0.635778 │ 1.69327 │ 1.22705 │
│ 2   │ 2     │ 1.24684 │ 0.488665 │ 1.65159  │ 1.85681  │ 1.63358 │ 1.05604 │
│ 3   │ 3     │ 2.05449 │ 0.928822 │ 1.53021  │ 1.66948  │ 1.8532  │ 1.69901 │
│ 4   │ 4     │ 1.77823 │ 1.19143  │ 1.36264  │ 1.13891  │ 1.90184 │ 1.46904 │
│ 5   │ 5     │ 1.00121 │ 1.91585  │ 1.11492  │ 1.69419  │ 1.58653 │ 1.98337 │

This is fine, but in my real example I have hundreds of columns, so I don't want to write out

    by(
        :id,
        a_1 = sum(:a_1),
        a_2 = sum(:a_2),
        a_3 = sum(:a_3),
        b_1 = sum(:b_1),
        b_2 = sum(:b_2),
        b_3 = sum(:b_3),
    )

this portion of the code. The column names follow a pattern, so I was hoping to generate an Expr that I could eval inside of by, but I cannot figure out how to do this.

Something like:

julia> for l in (:a,:b), i in 1:3
           println("$(l)_$i = sum(:$(l)_$i),")
       end
a_1 = sum(:a_1),
a_2 = sum(:a_2),
a_3 = sum(:a_3),
b_1 = sum(:b_1),
b_2 = sum(:b_2),
b_3 = sum(:b_3),

but creating an Expr that I can inject into the by function above and eval.

Is such a thing possible or is there a better way to do this? Or am I stuck writing it all out?

Don't use DataFramesMeta for this, just use DataFrames!

combine(groupby(df, :id), names(df, r"^a_") .=> sum, names(df, r"^b_") .=> sum)
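
For reference, here is a self-contained sketch of the same idea. One thing worth knowing: with the one-liner above the output columns come out as a_1_sum, a_2_sum and so on, not a_1, a_2. Assuming your DataFrames version has the renamecols keyword, passing renamecols=false keeps the original names. The smaller df and the cols variable are just illustrative.

```julia
using DataFrames

# Smaller stand-in for the wide table from the question.
df = DataFrame(
    id = repeat(1:5, inner=3),
    a_1 = rand(15),
    a_2 = rand(15),
    b_1 = rand(15),
)

# Select every column whose name starts with "a_" or "b_".
cols = names(df, r"^[ab]_")

# Broadcasting `.=>` builds one `column => sum` pair per selected column.
# renamecols=false keeps a_1, a_2, ... instead of a_1_sum, a_2_sum, ...
result = combine(groupby(df, :id), cols .=> sum; renamecols=false)
```

A single regex covering both prefixes avoids listing one names(...) call per column family.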

I've been using DataFramesMeta for so long now that I forget how good regular ol' DataFrames is :wink:

Well, this is (relatively) new functionality.

DataFramesMeta master right now uses transform, combine etc. as the back-end for @transform, @by etc.

So it should be possible to prototype something in DataFramesMeta, then translate the expression to something without any literals and write it in DataFrames.
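
As a rough sketch of what "without any literals" could look like, the loop from the original question can build the operations as plain pairs, so no Expr or eval is needed. This assumes a DataFrames version that accepts source => function => target pairs in combine; the name ops is made up for the example.

```julia
using DataFrames

df = DataFrame(
    id = repeat(1:5, inner=3),
    a_1 = rand(15),
    a_2 = rand(15),
    b_1 = rand(15),
    b_2 = rand(15),
)

# Build one `source => function => target` pair per generated column name,
# mirroring the (:a, :b) x 1:n pattern from the question.
ops = [Symbol(l, "_", i) => sum => Symbol(l, "_", i)
       for l in (:a, :b) for i in 1:2]

# combine accepts a vector of such pairs directly.
result = combine(groupby(df, :id), ops)
```

Because ops is an ordinary vector, it scales to hundreds of columns by changing only the ranges in the comprehension.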
