Grouping a DataFrame by something other than an existing column

data

#1

Is it possible to group a DataFrame by something other than an existing column? Yes, I could add a column to the DataFrame that contains the computed values and then group on that new column. But this may clutter the DataFrame with columns that I use only once. Here’s some pseudo-syntax for what I would like to do if I were using the RDataset iris dataset and wanted to compute the mean petal width for irises depending on whether their sepal length was greater than 5.

 by(iris,iris[:SepalLength].>5.,df->mean(df[:PetalWidth]))

I have the sense that there has to be easy way to do this and I am just missing something.

Obviously I could do this

 iris[big_length]=iris[:SepalLength].>5.
 by(iris,:big_length,df->mean(df[:PetalWidth]))

But I am trying to avoid adding a new column to iris.


#2

You can do this with this Query.jl query:

df = @from i in iris begin
    @group i.PetalWidth by i.SepalLength > 5 into g
    @select {big_length = g.key, PetalWidth = mean(g)}
    @collect DataFrame
end

#3

Edit: nevermind, a new variable is indeed required…

Or LazyQuery:

using RDatasets
using LazyQuery

@new_environment

@chain @evaluate begin
    using LazyQuery
    using RDatasets

    dataset("datasets", "iris")
    @add_to it BigLength = ~SepalLength > 5
    @group it by BigLength
    @make_from it PetalWidth = mean(PetalWidth)
    @ungroup it
end