Iβm trying to understand how to do data manipulations in DataFrames (+ Query or + DataFramesMeta or +LINQ), in particular today I am looking to get rownumber in a groupby query
For MWE consider
df = DataFrame(a=repeat([1,2,3], inner=[2]), b=randn(6))
Lets say I want to group by a
, get mean of b
within a
, and return something like:
df |>
@groupby(_.a) |>
@map({rows=_, m=mean(_.b), t=1:length(_)}) |>
@mapmany(_.t, {r=__, getindex(_.rows,__).a, getindex(_.rows,__).b, _.m}) |>
DataFrame
6Γ4 DataFrame
Row β r a b measure
β Int64 Int64 Float64 Float64
ββββββΌββββββββββββββββββββββββββββββββββββ
1 β 1 1 0.137441 -0.347454
2 β 2 1 -0.832349 -0.347454
3 β 1 2 0.775431 0.68382
4 β 2 2 0.592209 0.68382
5 β 1 3 -1.17639 -0.324448
6 β 2 3 0.527495 -0.324448
Is there a better way to do this? in particular, my real world example I have too many variables for this getindex
approach to work.
In R I would do something like this
> data.frame(a=rep(1:3,each=2), b=rnorm(2*3)) %>% group_by(a) %>% mutate(m=mean(b), r=row_number())
# A tibble: 6 x 4
# Groups: a [3]
a b m r
<int> <dbl> <dbl> <int>
1 1 0.474 0.391 1
2 1 0.307 0.391 2
3 2 0.500 -0.370 1
4 2 -1.24 -0.370 2
5 3 -1.33 -0.862 1
6 3 -0.398 -0.862 2
The use of _
and __
is almost ineligible to me.