Normalized nrow in a GroupedDataFrame

What is the best way to compute a “normalized” nrow (each group’s share of the total row count) in a GroupedDataFrame?

using DataFrames
using Chain
df = DataFrame(id=1:6,
               name=["Aaron Aardvark", "Belen Barboza",
                     "春 陈", "Даниил Дубов",
                     "Elżbieta Elbląg", "Felipe Fittipaldi"],
               age=[50, 45, 40, 35, 30, 25],
               eye=["blue", "brown", "hazel", "blue", "green", "brown"],
               grade_1=[95, 90, 85, 90, 95, 90],
               grade_2=[75, 80, 65, 90, 75, 95],
               grade_3=[85, 85, 90, 85, 80, 85])
@chain df begin
    groupby(:eye)
    combine(nrow => :n,  x -> nrow(x) / nrow(df))
end

That outputs:

4×3 DataFrame
 Row │ eye     n      x1       
     │ String  Int64  Float64  
─────┼─────────────────────────
   1 │ blue        2  0.333333
   2 │ brown       2  0.333333
   3 │ hazel       1  0.166667
   4 │ green       1  0.166667

But if I try to rename the x1 column, I get something strange:

julia> @chain df begin
           groupby(:eye)
           combine(nrow => :n,  x -> nrow(x) / nrow(df) => :perc)
       end
4×3 DataFrame
 Row │ eye     n      x1              
     │ String  Int64  Pair…           
─────┼────────────────────────────────
   1 │ blue        2  0.333333=>:perc
   2 │ brown       2  0.333333=>:perc
   3 │ hazel       1  0.166667=>:perc
   4 │ green       1  0.166667=>:perc

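That’s operator precedence for you. The => binds tighter than ->, so => :perc ended up inside the anonymous function’s body. This works: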

julia> @chain df begin
           groupby(:eye)
           combine(nrow => :n,  :name => (x -> length(x) / nrow(df)) => :perc)
       end
4×3 DataFrame
 Row │ eye     n      perc     
     │ String  Int64  Float64  
─────┼─────────────────────────
   1 │ blue        2  0.333333
   2 │ brown       2  0.333333
   3 │ hazel       1  0.166667
   4 │ green       1  0.166667

(Note the parentheses around the anonymous function.)
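
To see the parsing difference in isolation, here is a small sketch in plain Julia, with no DataFrames involved. The names f, g, r and spec are made up for illustration, and x / 2 just stands in for the real computation.

```julia
# Without parentheses, `=> :perc` is parsed as part of the function body,
# so the function returns a Pair instead of a number.
f = x -> x / 2 => :perc
r = f(1)                 # r is the Pair 0.5 => :perc

# With parentheses, the function body ends at the closing parenthesis.
g = (x -> x / 2)

# `=>` is right-associative, so this is :name => (g => :perc),
# i.e. the cols => function => newcol shape that combine expects.
spec = :name => g => :perc
```

That Pair return value is what showed up in the Pair… column above.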

Thank you!

Why does the first case work? I don’t see anything in the docs for passing just a function like that, only cols => function or cols => function => newcols. And why is that column called x1?

julia> @chain df begin
           groupby(:eye)
           combine(y -> 1)
       end
4×2 DataFrame
 Row │ eye     x1    
     │ String  Int64 
─────┼───────────────
   1 │ blue        1
   2 │ brown       1
   3 │ hazel       1
   4 │ green       1

julia> @chain df begin
           groupby(:eye)
           combine(y -> 1, x -> 2)
       end
ERROR: ArgumentError: duplicate output column name: :x1

It’s list item 7 here. You can pass a function that accepts a SubDataFrame. But I guess the automatic naming isn’t perfect: each bare function seems to get the same default name, so you get an error when it tries to create :x1 twice.
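
One way around the name clash is to give each output an explicit name with the cols => function => newcol form, like in the answer above. A minimal sketch, assuming a cut-down version of the example data. The column names :a and :b are arbitrary, and :id is passed only because that form needs some source column.

```julia
using DataFrames

# Cut-down version of the example data (only the columns needed here).
df = DataFrame(id=1:6,
               eye=["blue", "brown", "hazel", "blue", "green", "brown"])
gdf = groupby(df, :eye)

# A bare function receives a SubDataFrame per group; its output column
# gets the auto-generated name x1.
auto = combine(gdf, sdf -> nrow(sdf))

# Naming each output explicitly avoids the duplicate :x1 error.
named = combine(gdf,
                :id => (_ -> 1) => :a,
                :id => (_ -> 2) => :b)
```

Here auto has the columns eye and x1, while named has eye, a and b.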