I have some wide-format data that I need to group and summarize. An MWE is below:
using DataFramesMeta
df = DataFrame(
id = repeat(1:5, inner=3),
a_1 = rand(15),
a_2 = rand(15),
a_3 = rand(15),
b_1 = rand(15),
b_2 = rand(15),
b_3 = rand(15)
)
summary = @linq df |>
by(
:id,
a_1 = sum(:a_1),
a_2 = sum(:a_2),
a_3 = sum(:a_3),
b_1 = sum(:b_1),
b_2 = sum(:b_2),
b_3 = sum(:b_3),
)
julia> summary
5×7 DataFrame
│ Row │ id    │ a_1     │ a_2      │ a_3      │ b_1      │ b_2     │ b_3     │
│     │ Int64 │ Float64 │ Float64  │ Float64  │ Float64  │ Float64 │ Float64 │
├─────┼───────┼─────────┼──────────┼──────────┼──────────┼─────────┼─────────┤
│ 1   │ 1     │ 2.34505 │ 0.881786 │ 0.969993 │ 0.635778 │ 1.69327 │ 1.22705 │
│ 2   │ 2     │ 1.24684 │ 0.488665 │ 1.65159  │ 1.85681  │ 1.63358 │ 1.05604 │
│ 3   │ 3     │ 2.05449 │ 0.928822 │ 1.53021  │ 1.66948  │ 1.8532  │ 1.69901 │
│ 4   │ 4     │ 1.77823 │ 1.19143  │ 1.36264  │ 1.13891  │ 1.90184 │ 1.46904 │
│ 5   │ 5     │ 1.00121 │ 1.91585  │ 1.11492  │ 1.69419  │ 1.58653 │ 1.98337 │
This is fine, but in my real example I have hundreds of columns, so I don't want to write out
by(
:id,
a_1 = sum(:a_1),
a_2 = sum(:a_2),
a_3 = sum(:a_3),
b_1 = sum(:b_1),
b_2 = sum(:b_2),
b_3 = sum(:b_3),
)
this portion of the code. The column names follow a pattern, so I was hoping to generate an Expr that I could eval inside the by call, but I cannot figure out how to do this.
Something like:
julia> for l in (:a,:b), i in 1:3
println("$(l)_$i = sum(:$(l)_$i),")
end
a_1 = sum(:a_1),
a_2 = sum(:a_2),
a_3 = sum(:a_3),
b_1 = sum(:b_1),
b_2 = sum(:b_2),
b_3 = sum(:b_3),
but creating an Expr that I can inject into the by call above and eval.
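To make the idea concrete, here is a minimal sketch of building those keyword expressions with plain Base Julia instead of printing strings. The names cols, kwargs, and call are only illustrative, and this stops at constructing the Expr: whether the result can be spliced into @linq or by and evaluated is an open assumption that I have not confirmed.

```julia
# Column names following the pattern a_1 ... a_3, b_1 ... b_3 (illustrative name `cols`).
cols = [Symbol(l, "_", i) for l in (:a, :b) for i in 1:3]

# One keyword-argument expression per column, equivalent to `a_1 = sum(:a_1)`.
# QuoteNode keeps `:a_1` as a quoted symbol inside the generated call.
kwargs = [Expr(:kw, c, :(sum($(QuoteNode(c))))) for c in cols]

# Assemble an unevaluated call expression of the form by(:id, a_1 = sum(:a_1), ...).
# Nothing is evaluated here; this only builds and prints the expression.
call = Expr(:call, :by, QuoteNode(:id), kwargs...)
println(call)
```

This only shows that the pattern itself can be generated as an expression rather than typed out by hand.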
Is such a thing possible or is there a better way to do this? Or am I stuck writing it all out?