# How can I make this DataFramesMeta operation work?

I have a `DataFrame` that has a `PWGTP` column, and then 80 additional columns named `PWGTP1``PWGTP80`. I’m trying to do the following:

``````total = @chain vt18 begin
@rsubset(
:AGEP in 21:64,
:DIS == 1
)
@combine(
:total = sum(:PWGTP),
:variance = (4/80) * sum([(sum(Symbol("PWGTP\$i")) - sum(:PWGTP))^2 for i in 1:80])
)
end
``````

This produces the following error:

``````ERROR: MethodError: no method matching iterate(::Symbol)
``````

I found some old code where I implemented this with JuliaDBMeta like so:

``````variance = (4/80) * sum([(sum(column(_, Symbol("PWGTP\$n"))) - sum(:PWGTP))^2 for n in 1:80]),
``````

I’m struggling to find the DataFramesMeta solution…

Can you use the `names` function to get a vector of the column names that you want to use, then iterate over those?

1 Like

This appears to work:

``````w_cols = vcat("PWGTP", ["PWGTP\$i" for i in 1:80])

@chain vt18 begin
@rsubset(
:AGEP in 21:64,
:DIS == 1
)
@combine(
:total = sum(:PWGTP),
:variance = (4/80) * sum([
(sum(AsTable(w_cols)[i]) - sum(AsTable(w_cols)[1]))^2 for i in 2:80
])
)
end
``````

…I still need to double-check the results though to make sure it’s doing what I think it’s doing…

1. For interpolation, you want to use `\$`, not `Symbol`. See documentation here.
2. In general, this operation isn’t really possible in DataFramesMeta. The issue is that `\$"PWGTP\$i"` can only refer to one column. and `i` has to be defined before the macro is expanded.
``````julia> df = DataFrame(a1 = rand(10), a2 = rand(10), b = rand(10));

julia> @transform df :c = begin
i = 1
\$"a\$i"
end
ERROR: UndefVarError: `i` not defined
``````

Here is one solution that appears to work

``````julia> df = DataFrame(a1 = rand(10), a2 = rand(10), b = rand(10));

julia> function getsum(df)
(newsum = sum(sum(df[!, "a\$i"]) - sum(df[!, "b"]) for i in 1:2),)
end;

julia> @chain df begin
combine(getsum, _)
end
1×1 DataFrame
Row │ newsum
│ Float64
─────┼─────────
1 │ 2.46766
``````

And finally (as you have already written, `AsTable` works well here too.

``````julia> function getsum(nt::NamedTuple)
sum(sum(nt[Symbol("a\$i")]) - sum(nt[Symbol("b")]) for i in 1:2)
end;

julia> cols = [:a1, :a2, :b];

julia> @chain df begin
@combine :newsum = getsum(AsTable(cols))
end
1×1 DataFrame
Row │ newsum
│ Float64
─────┼─────────
1 │ 2.46766
``````

Note: There is a bit of a compile cost when using `AsTable` because it creates a `NamedTuple` (and thus specializes on the length and names of inputs). But 80 isn’t that many columns so it should be fine.

Finally, note that there is an “escape hatch” to work with the whole data frame inside `combine`. In `transform`, if you give just a `function`, on its own, as an argument, the whole `DataFrame` (or `SubDataFrame`, in the case of a grouped operation) is passed. So you can combine both solutions so far

``````julia> @chain df begin
@combine begin
\$(getsum)
:newsum2 = getsum(AsTable(cols))
end
end
1×2 DataFrame
Row │ newsum   newsum2
│ Float64  Float64
─────┼──────────────────
1 │ 2.46766  2.46766
``````
1 Like