I have a dataset with a column that contains a formula.
The formula is a recipe for computing a new column from existing columns.
Essentially, the data looks like the following:
df = DataFrame(l=1:8, a=[1, 1, 1, 2, 2, 1, 1, 2],
               b=repeat([2, 1], outer=[4]), c=1:8,
               formula=[repeat([":b+:c"], 3); repeat([":a+:c"], 5)])
8×5 DataFrame
 Row │ l      a      b      c      formula
     │ Int64  Int64  Int64  Int64  String
─────┼─────────────────────────────────────
   1 │     1      1      2      1  :b+:c
   2 │     2      1      1      2  :b+:c
   3 │     3      1      2      3  :b+:c
   4 │     4      2      1      4  :a+:c
   5 │     5      2      2      5  :a+:c
   6 │     6      1      1      6  :a+:c
   7 │     7      1      2      7  :a+:c
   8 │     8      2      1      8  :a+:c
I am trying to create a new column that holds the value of either b+c or a+c, as specified by the column named formula.
This column is a string and, depending on preprocessing, looks like a+b or :a+:b (it could also be :a+:b-:c).
The result I am looking for is something like:
df = DataFrame(l=1:8, a=[1, 1, 1, 2, 2, 1, 1, 2],
               b=repeat([2, 1], outer=[4]), c=1:8,
               formula=[repeat([":b+:c"], 3); repeat([":a+:c"], 5)],
               value=[3, 3, 5, 6, 7, 7, 8, 10])
8×6 DataFrame
 Row │ l      a      b      c      formula  value
     │ Int64  Int64  Int64  Int64  String   Int64
─────┼────────────────────────────────────────────
   1 │     1      1      2      1  :b+:c        3
   2 │     2      1      1      2  :b+:c        3
   3 │     3      1      2      3  :b+:c        5
   4 │     4      2      1      4  :a+:c        6
   5 │     5      2      2      5  :a+:c        7
   6 │     6      1      1      6  :a+:c        7
   7 │     7      1      2      7  :a+:c        8
   8 │     8      2      1      8  :a+:c       10
I have tried to use metaprogramming after splitting the table with groupby(df, :formula), but I run into the issue that eval always works in global scope, so the formula cannot see the local sub-dataframes and is not evaluated properly.
This is related to this SO post.
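To make the scoping problem concrete, here is a minimal sketch of one possible way around it, not a confirmed solution. Instead of evaluating the formula against variables, it rewrites each quoted symbol into a lookup on a row argument and evaluates the result once into an anonymous function, so nothing local needs to be visible to eval. The helper names `rowify` and `compile_formula` are hypothetical, the sketch assumes the `:a+:b` form of the formula, and it assumes the formula strings are trusted, since eval runs arbitrary code.

```julia
using DataFrames

df = DataFrame(l=1:8, a=[1, 1, 1, 2, 2, 1, 1, 2],
               b=repeat([2, 1], outer=4), c=1:8,
               formula=[repeat([":b+:c"], 3); repeat([":a+:c"], 5)])

# Hypothetical helper: replace every quoted symbol such as :b (a QuoteNode)
# with the call getproperty(row, :b); leave operators and literals untouched.
rowify(ex) = ex
rowify(ex::QuoteNode) = ex.value isa Symbol ? Expr(:call, :getproperty, :row, ex) : ex
rowify(ex::Expr) = Expr(ex.head, map(rowify, ex.args)...)

# Hypothetical helper: turn a formula string into a function `row -> value`.
# eval still runs in global scope, but the function body only refers to its
# own argument `row`, so no local variables are required.
compile_formula(s::AbstractString) = eval(Expr(:->, :row, rowify(Meta.parse(s))))

# Compile each distinct formula once, then apply it row by row.
funcs = Dict(f => compile_formula(f) for f in unique(df.formula))

# Base.invokelatest avoids world-age errors if this code is run inside a function.
df.value = [Base.invokelatest(funcs[r.formula], r) for r in eachrow(df)]
```

The bare a+b form would need an extra rule in `rowify` that maps plain symbols outside the operator position to row lookups.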
Thanks for the help!