I have a `DataFrame`

that has a `PWGTP`

column, and then 80 additional columns named `PWGTP1`

…`PWGTP80`

. I’m trying to do the following:

```
total = @chain vt18 begin
@rsubset(
:AGEP in 21:64,
:DIS == 1
)
@combine(
:total = sum(:PWGTP),
:variance = (4/80) * sum([(sum(Symbol("PWGTP$i")) - sum(:PWGTP))^2 for i in 1:80])
)
end
```

This produces the following error:

```
ERROR: MethodError: no method matching iterate(::Symbol)
```

I found some old code where I implemented this with JuliaDBMeta like so:

```
variance = (4/80) * sum([(sum(column(_, Symbol("PWGTP$n"))) - sum(:PWGTP))^2 for n in 1:80]),
```

I’m struggling to find the DataFramesMeta solution…

Can you use the `names`

function to get a vector of the column names that you want to use, then iterate over those?

1 Like

This appears to work:

```
w_cols = vcat("PWGTP", ["PWGTP$i" for i in 1:80])
@chain vt18 begin
@rsubset(
:AGEP in 21:64,
:DIS == 1
)
@combine(
:total = sum(:PWGTP),
:variance = (4/80) * sum([
(sum(AsTable(w_cols)[i]) - sum(AsTable(w_cols)[1]))^2 for i in 2:80
])
)
end
```

…I still need to double-check the results though to make sure it’s doing what I think it’s doing…

- For interpolation, you want to use
`$`

, *not* `Symbol`

. See documentation here.
- In general, this operation isn’t really possible in DataFramesMeta. The issue is that
`$"PWGTP$i"`

can only refer to one column. and `i`

has to be defined before the macro is expanded.

```
julia> df = DataFrame(a1 = rand(10), a2 = rand(10), b = rand(10));
julia> @transform df :c = begin
i = 1
$"a$i"
end
ERROR: UndefVarError: `i` not defined
```

Here is one solution that appears to work

```
julia> df = DataFrame(a1 = rand(10), a2 = rand(10), b = rand(10));
julia> function getsum(df)
(newsum = sum(sum(df[!, "a$i"]) - sum(df[!, "b"]) for i in 1:2),)
end;
julia> @chain df begin
combine(getsum, _)
end
1×1 DataFrame
Row │ newsum
│ Float64
─────┼─────────
1 │ 2.46766
```

And finally (as you have already written, `AsTable`

works well here too.

```
julia> function getsum(nt::NamedTuple)
sum(sum(nt[Symbol("a$i")]) - sum(nt[Symbol("b")]) for i in 1:2)
end;
julia> cols = [:a1, :a2, :b];
julia> @chain df begin
@combine :newsum = getsum(AsTable(cols))
end
1×1 DataFrame
Row │ newsum
│ Float64
─────┼─────────
1 │ 2.46766
```

Note: There is a bit of a compile cost when using `AsTable`

because it creates a `NamedTuple`

(and thus specializes on the length and names of inputs). But 80 isn’t *that* many columns so it should be fine.

Finally, note that there is an “escape hatch” to work with the whole data frame inside `combine`

. In `transform`

, if you give just a `function`

, on its own, as an argument, the whole `DataFrame`

(or `SubDataFrame`

, in the case of a grouped operation) is passed. So you can combine both solutions so far

```
julia> @chain df begin
@combine begin
$(getsum)
:newsum2 = getsum(AsTable(cols))
end
end
1×2 DataFrame
Row │ newsum newsum2
│ Float64 Float64
─────┼──────────────────
1 │ 2.46766 2.46766
```

1 Like