I think I am misunderstanding the column selection in grouped
DataFrames. I would like to select a range of variables in a
combine operation but I cannot make it work with any of the DataAPI selectors:

    using DataFrames, Dates, Statistics

    df = DataFrame(
        g = ['a', 'a', 'a', 'a', 'c', 'c', 'c'],
        date = [Date(2021,1,1), Date(2021,1,2), Date(2021,1,2), Date(2021,1,4),
                Date(2021,1,1), Date(2021,1,3), Date(2021,1,7)],
        v = rand(7),
        v1 = rand(7),
        v2 = rand(7)
    )
    df[:, :week_date] = firstdayofweek.(df.date)
    gdf = groupby(df, [:g, :week_date])

    # Works:
    cols = [:v, :v1, :v2]
    combine(gdf, cols .=> mean)
    combine(gdf, names(gdf)[occursin.(r"^v", names(gdf))] .=> mean)

    # Does not work:
    combine(gdf, r"^v" .=> mean)
    combine(gdf, Between(:v, :v2) .=> mean)

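For comparison, resolving the selector into column names first with `names(df, selector)` and then broadcasting over the result behaves like the working cases. Below is a minimal sketch on a small stand-in table (the values are made up so the group means are easy to check by hand). It assumes, without confirmation from the documentation, that the difference lies in `.=>` being applied to the selector object itself rather than to the columns it matches:

```julia
using DataFrames, Dates, Statistics

# Small stand-in for the table above; values are arbitrary, not the original data
df = DataFrame(
    g = ['a', 'a', 'c', 'c'],
    date = [Date(2021, 1, 1), Date(2021, 1, 2), Date(2021, 1, 1), Date(2021, 1, 3)],
    v = [1.0, 3.0, 5.0, 7.0],
    v1 = [2.0, 4.0, 6.0, 8.0],
    v2 = [10.0, 20.0, 30.0, 40.0],
)
df.week_date = firstdayofweek.(df.date)
gdf = groupby(df, [:g, :week_date])

# names(df, selector) resolves a selector to a Vector{String} of column names
regex_cols = names(df, r"^v")
between_cols = names(df, Between(:v, :v2))

# Broadcasting over the resolved names builds one `column => mean` pair per column
by_regex = combine(gdf, regex_cols .=> mean)
by_between = combine(gdf, between_cols .=> mean)
```

Both calls resolve the selector against the parent `df`, so the names are ordinary strings by the time `.=>` sees them, which is the same situation as the `cols .=> mean` case.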
After reading the documentation, it is still not clear to me why the two sets of calls should behave differently. Could someone please clear this up for me?