I’m probably missing something very basic in the syntax. Suppose I have a GroupedDataFrame with n columns (I’ll set n = 4 to illustrate the issue, but for the actual use case suppose n >> 4):
gdf = groupby(DataFrame(id=1:3, A=11:13, B=101:103, C = 25:27, D = 32:34), :id)
GroupedDataFrame with 3 groups based on key: id
First Group (1 row): id = 1
 Row │ id     A      B      C      D
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     1     11    101     25     32
⋮
Last Group (1 row): id = 3
 Row │ id     A      B      C      D
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     3     13    103     27     34
Suppose the function fun runs some calculation on any given pair of columns.
function fun(col1, col2)
    return @. (col1 + 5) * (col2 + 5)
end
What is the best way to run the function fun on all combinations of the column [:id] with the columns [:A, :B, :C, :D] via the DataFrames.jl mini language? For a single pair of columns I could do:
transform!(gdf, [:id,:A] => ((x,y)->fun(x,y))=> :Aid)
But I’m not sure how to iterate this over the remaining pairs of columns [:id, :B], [:id, :C], [:id, :D], because I’m not sure how to convert a String into a DataFrame column.
Creating a set of column names as strings, such as set = [":A", ":B", ":C", ":D"], and then looping over it clearly won’t work:
for I in set
    transform!(gdf, [:id, I] => fun)
end
ERROR: ArgumentError: mixing `Symbol`s with other selectors is not allowed
So I tried
set = Iterators.product(["id"], names(gdf))
But I was unable to figure out how to remove ("id", "id") from a Base.Iterators type (is this possible?). Instead I used
set = collect(Iterators.product(["id"], names(gdf)))
set = filter(x -> x != ("id","id"), set)
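On the "is this possible?" point: a lazy iterator can be filtered without collecting it first by using Iterators.filter from Base. The following is only a minimal sketch; the vector cols is a hypothetical stand-in for names(gdf), so the snippet runs without DataFrames loaded.

```julia
# Hypothetical stand-in for names(gdf), so the snippet is self-contained.
cols = ["id", "A", "B", "C", "D"]

# Lazy product of ("id", column) tuples, with the ("id", "id") tuple filtered out lazily.
lazy_pairs = Iterators.filter(x -> x != ("id", "id"),
                              Iterators.product(["id"], cols))

# Materialize only when a concrete vector of tuples is needed.
pairs_vec = collect(lazy_pairs)
```

Collecting is optional here; a for loop can iterate over lazy_pairs directly.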
However, the following syntax results in a MethodError:
for x in set
    transform!(gdf, names(gdf, x) => fun)
end
So I am curious: what is the correct way to input the iterator into names, or is there a better approach in general?
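One possible approach, sketched here under assumptions rather than offered as the definitive answer: a tuple such as ("id", "A") is not a column selector, whereas a vector of strings such as ["id", "A"] is, so the operations can be built as a vector of source => function => destination pairs and passed to transform! in one call. The name ops and the output names "Aid", "Bid", and so on are illustrative choices (the latter following the :Aid example above).

```julia
using DataFrames

# Same toy data as in the question.
df = DataFrame(id = 1:3, A = 11:13, B = 101:103, C = 25:27, D = 32:34)
gdf = groupby(df, :id)

fun(col1, col2) = @. (col1 + 5) * (col2 + 5)

# Every column except the grouping key, as strings: "A", "B", "C", "D".
others = names(df, Not(:id))

# One operation per pair. All selectors are strings, so no Symbols are mixed in.
ops = [["id", c] => fun => c * "id" for c in others]

# transform! accepts a vector of operations and adds the new columns to the parent df.
transform!(gdf, ops)
```

Because others is computed from the data frame itself, the same code should carry over to n >> 4 columns without listing them by hand.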
As a clumsier option I thought I could name the columns such that
gdf2 = groupby(DataFrame(id1234 = 1:3, A1 = 11:13, B2 = 101:103, C3 = 25:27, D4 = 32:34), :id1234)
then try something like
for I = 1:4
transform!(gdf2, names(gdf2, r"I") => fun)
end
But this also returned a MethodError.
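A likely cause, offered as an assumption rather than a confirmed diagnosis: r"I" is a literal pattern for the capital letter I and does not interpolate the loop variable, so it selects no columns and fun is presumably called with no arguments. A minimal sketch that builds the pattern from the loop value with Regex(string(i)) instead; the output names "pair1" to "pair4" are hypothetical.

```julia
using DataFrames

df2 = DataFrame(id1234 = 1:3, A1 = 11:13, B2 = 101:103, C3 = 25:27, D4 = 32:34)
gdf2 = groupby(df2, :id1234)

fun(col1, col2) = @. (col1 + 5) * (col2 + 5)

# The literal pattern matches no column name in this data frame.
no_match = names(df2, r"I")

# Build each pattern from the integer, e.g. Regex("1") matches "id1234" and "A1".
# All selections are computed before any new column is added.
ops = [names(df2, Regex(string(i))) => fun => "pair$i" for i in 1:4]

transform!(gdf2, ops)
```

Computing all selections up front keeps the newly added columns from being picked up by a later pattern.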
Any tips would be greatly appreciated, thanks!