Normalizing DataFrame column by group

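What you are asking for is usually called standardization: subtract the mean and divide by the standard deviation, here computed over all elements pooled within each group.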

Is this what you want?

julia> using DataFrames, Statistics

julia> df = DataFrame(a=[[1,2,3],[4,5,6],[7,8,9],[9,10,11],[10,11,12]],b=[1,1,2,2,3])
5×2 DataFrame
 Row │ a             b
     │ Array…        Int64
─────┼─────────────────────
   1 │ [1, 2, 3]         1
   2 │ [4, 5, 6]         1
   3 │ [7, 8, 9]         2
   4 │ [9, 10, 11]       2
   5 │ [10, 11, 12]      3

julia> function standardize(vx)
           m = mean(Iterators.flatten(vx))
           s = std(Iterators.flatten(vx))
           return [(v .- m) ./ s for v in vx]
       end
standardize (generic function with 1 method)

julia> combine(groupby(df, :b), :a => standardize)
5×2 DataFrame
 Row │ b      a_standardize
     │ Int64  Array…
─────┼─────────────────────────────────────────
   1 │     1  [-1.33631, -0.801784, -0.267261]
   2 │     1  [0.267261, 0.801784, 1.33631]
   3 │     2  [-1.41421, -0.707107, 0.0]
   4 │     2  [0.0, 0.707107, 1.41421]
   5 │     3  [-1.0, 0.0, 1.0]
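
If you would rather keep the original column next to the standardized one, a variant along these lines might help. It is only a sketch: it swaps `combine` for `transform`, and the column names `:a_std`, `:pooled_mean` and `:pooled_std` are names I made up for illustration. The second call is a sanity check that the pooled values in each group end up with mean close to 0 and standard deviation close to 1.

```julia
using DataFrames, Statistics

df = DataFrame(a=[[1,2,3],[4,5,6],[7,8,9],[9,10,11],[10,11,12]], b=[1,1,2,2,3])

# Same function as in the session above: pool every element of the group
# to get one mean and one std, then rescale each vector with them.
function standardize(vx)
    m = mean(Iterators.flatten(vx))
    s = std(Iterators.flatten(vx))
    return [(v .- m) ./ s for v in vx]
end

# transform keeps the original columns and row order;
# :a_std is a hypothetical name for the new column.
res = transform(groupby(df, :b), :a => standardize => :a_std)

# Sanity check per group: pooled mean should be ≈ 0 and pooled std ≈ 1.
check = combine(groupby(res, :b),
    :a_std => (v -> mean(Iterators.flatten(v))) => :pooled_mean,
    :a_std => (v -> std(Iterators.flatten(v))) => :pooled_std)
```

Note that `std` uses the corrected (n-1) estimator by default, so a group whose pooled data has only a single element would give `NaN`; the example data does not hit that case.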