I think I am misunderstanding column selection in grouped DataFrames.
I would like to select a range of variables in a combine operation,
but I cannot make it work with any of the DataAPI selectors:
using DataFrames, Dates, Statistics
df = DataFrame(
    g = ['a', 'a', 'a', 'a', 'c', 'c', 'c'],
    date = [Date(2021, 1, 1), Date(2021, 1, 2), Date(2021, 1, 2), Date(2021, 1, 4),
            Date(2021, 1, 1), Date(2021, 1, 3), Date(2021, 1, 7)],
    v = rand(7),
    v1 = rand(7),
    v2 = rand(7)
)
df[:, :week_date] = firstdayofweek.(df.date)
gdf = groupby(df, [:g, :week_date])
# Works:
cols = [:v, :v1, :v2]
combine(gdf, cols .=> mean)
combine(gdf, names(gdf)[occursin.(r"^v", names(gdf))] .=> mean)
# Does not work:
combine(gdf, r"^v" .=> mean)
combine(gdf, Between(:v, :v2) .=> mean)
After reading the documentation, it is not clear to me why there should be a difference. Could someone please clear this up for me?
Thanks!
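
For reference, here is a minimal sketch of a form that should avoid the problem. It assumes the installed DataFrames.jl provides the two-argument names(df, selector) method. Julia evaluates the .=> broadcast before combine ever sees the data frame, so a selector such as r"^v" has not yet been resolved to column names at that point. Resolving the selector with names first yields a plain vector of column names, which then broadcasts just like the working cols example above. The variable names by_regex and by_range are only illustrative.

```julia
using DataFrames, Dates, Statistics

# Same data as in the question.
df = DataFrame(
    g = ['a', 'a', 'a', 'a', 'c', 'c', 'c'],
    date = [Date(2021, 1, 1), Date(2021, 1, 2), Date(2021, 1, 2), Date(2021, 1, 4),
            Date(2021, 1, 1), Date(2021, 1, 3), Date(2021, 1, 7)],
    v = rand(7),
    v1 = rand(7),
    v2 = rand(7)
)
df[:, :week_date] = firstdayofweek.(df.date)
gdf = groupby(df, [:g, :week_date])

# Resolve each selector to concrete column names first, then broadcast
# `.=> mean` over that vector of names (one pair per column).
by_regex = combine(gdf, names(df, r"^v") .=> mean)
by_range = combine(gdf, names(df, Between(:v, :v2)) .=> mean)
```

Both calls should give one mean column per matched variable (v_mean, v1_mean, v2_mean) for each group. Newer releases may also accept a broadcasted selector directly, but I have not checked which version introduced that, so treat it as an assumption.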