I’ve been going through some of the DataFrames documentation and I feel I haven’t fully grokked the split-apply-combine strategy. Concretely, what I’d like to do is:

- Group a DataFrame by one column
- Normalize the data contained in another column within each group
- Recombine the grouped DataFrame into one of the same size as the original

Say I have `df = DataFrame(a=[[1,2,3],[4,5,6],[7,8,9],[9,10,11],[10,11,12]],b=[1,1,2,2,3])` and I’d like to group by values of `b`, so I do `dfg = groupby(df,:b)`. So far, so good.

What I’d like to do next is: for each vector element `x` in column `a`, calculate the mean `m` and standard deviation `s` over all vector elements within the group, apply the transformation `x -> (x-m)/s`, and finally reconstitute the original dataframe.

For instance, for the group given by b=1, we get `m=mean([1,2,3,4,5,6])=3.5` and similarly `s = sqrt(3.5)`, which would transform the `[1,2,3]` vector into `[-1.34 -0.80 -0.27]` and the vector `[4,5,6]` into `[0.27 0.80 1.34]`.
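To make the b=1 arithmetic concrete, here is a minimal sketch in plain Julia using only the `Statistics` standard library. The variable names (`group`, `pooled`, `normalized`) are just for illustration, and `std` is taken to be the default sample standard deviation (dividing by n-1), which is what gives `sqrt(3.5)` here.

```julia
using Statistics

# The vectors found in column a for the rows where b == 1.
group = [[1, 2, 3], [4, 5, 6]]

# Pool every vector element of the group into one flat vector.
pooled = reduce(vcat, group)

m = mean(pooled)   # mean over all six elements
s = std(pooled)    # sample standard deviation (n-1 in the denominator)

# Apply x -> (x - m) / s elementwise to each vector of the group.
normalized = [(v .- m) ./ s for v in group]

println(round.(normalized[1]; digits=2))
println(round.(normalized[2]; digits=2))
```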

My solution so far is a for loop over the subdataframes of `dfg`, but I feel like there must be a better way using the split-apply-combine strategy?
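The kind of formulation I am hoping for might look something like the sketch below. It assumes DataFrames.jl 1.x, where `transform` on a grouped DataFrame passes each group's column to the function and returns a DataFrame with the same rows as the original. The helper `normalize_group` and the output column name `a_norm` are hypothetical names introduced here for illustration.

```julia
using DataFrames, Statistics

df = DataFrame(a=[[1,2,3],[4,5,6],[7,8,9],[9,10,11],[10,11,12]], b=[1,1,2,2,3])

# Hypothetical helper: receives the vectors of column a for one group and
# returns one normalized vector per row of that group.
function normalize_group(vs)
    pooled = reduce(vcat, vs)          # all vector elements in the group
    m, s = mean(pooled), std(pooled)   # pooled mean and sample std
    return [(v .- m) ./ s for v in vs]
end

# Split by b, apply the helper to column a, combine back into one DataFrame.
out = transform(groupby(df, :b), :a => normalize_group => :a_norm)
```

This keeps the original columns and adds `a_norm`; writing `=> :a` as the target instead would overwrite the original column.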

Any advice would be much appreciated.