Creating a new column that itself contains DataFrames (e.g. from "complex" function output)

You ask several questions, so let me show what I think you want (please comment if I missed something).
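The variants below operate on a data frame `df` that is not defined in this post. As a minimal sketch for reproducing them, I assume a `Char` grouping column `gr` with four groups and three `Float64` columns `x1`, `x2`, `x3` (these match the printed outputs); the row count, the seed, and the random values are my own placeholders, so the numbers you get will differ from the ones shown.

```julia
using DataFrames, Statistics, Random

Random.seed!(1234)  # arbitrary seed, only so that reruns give the same numbers

n_per_group = 50    # hypothetical group size, not taken from the original data

# Assumed layout: one Char grouping column and three numeric columns.
df = DataFrame(gr = repeat('A':'D', inner = n_per_group),
               x1 = rand(4 * n_per_group),
               x2 = rand(4 * n_per_group),
               x3 = rand(4 * n_per_group))

# The columns selected by `names(df, Number)` in the variants.
numeric_cols = names(df, Number)
```

`Statistics` is loaded because the variants call `quantile`.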

Variant 1: for each variable and each group, get a data frame with the result:

julia> combine(groupby(df, :gr), names(df, Number) .=> (x -> Ref(DataFrame(q=0.0:0.1:1.0, v=quantile(x, 0.0:0.1:1.0)))) => x -> x * "_DataFrame")
4×4 DataFrame
 Row │ gr    x1_DataFrame    x2_DataFrame    x3_DataFrame
     │ Char  DataFrame       DataFrame       DataFrame
─────┼──────────────────────────────────────────────────────
   1 │ A     11×2 DataFrame  11×2 DataFrame  11×2 DataFrame
   2 │ B     11×2 DataFrame  11×2 DataFrame  11×2 DataFrame
   3 │ C     11×2 DataFrame  11×2 DataFrame  11×2 DataFrame
   4 │ D     11×2 DataFrame  11×2 DataFrame  11×2 DataFrame

(Instead of Ref you could also wrap the value in [...], but Ref is the standard way in Base Julia broadcasting to make any value behave as a scalar, so it is easier to remember.)
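To illustrate the Ref versus [...] remark outside of DataFrames.jl, here is a small Base-only sketch (the variable names are made up for the example). In both forms the wrapped value is treated as one item instead of being iterated element by element, which is the same reason the wrapped data frame in Variant 1 ends up in a single cell.

```julia
haystack = [1, 2, 3]

# Ref protects `haystack` from being iterated by broadcasting:
# each needle is tested against the whole vector.
with_ref = in.([1, 5], Ref(haystack))

# Wrapping in a one-element vector has the same effect.
with_vec = in.([1, 5], [haystack])

# Both give the same answer: 1 is in the vector, 5 is not.
with_ref == with_vec == [true, false]
```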

Variant 2: expand the data frames into columns, while still keeping the number of rows equal to the number of groups:

julia> combine(groupby(df, :gr), names(df, Number) .=> (x -> Ref(DataFrame(q=0.0:0.1:1.0, v=quantile(x, 0.0:0.1:1.0)))) => x -> x .* ["_q", "_v"])
4×7 DataFrame
 Row │ gr    x1_q                               x1_v                               x2_q                               x2_v                               x3_q            ⋯
     │ Char  Array…                             Array…                             Array…                             Array…                             Array…          ⋯
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ A     [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0…  [0.0100206, 0.105031, 0.146141, …  [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0…  [0.026531, 0.106821, 0.25079, 0.…  [0.0, 0.1, 0.2, ⋯
   2 │ B     [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0…  [0.0201708, 0.14565, 0.178781, 0…  [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0…  [0.0379883, 0.0499197, 0.163857,…  [0.0, 0.1, 0.2,
   3 │ C     [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0…  [0.0320945, 0.12889, 0.188354, 0…  [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0…  [0.0560529, 0.166095, 0.221287, …  [0.0, 0.1, 0.2,
   4 │ D     [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0…  [0.0154812, 0.046749, 0.167502, …  [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0…  [0.0325466, 0.15281, 0.280586, 0…  [0.0, 0.1, 0.2,

Variant 3: as variant 2, but expand to as many rows as there are quantiles (each variable keeps its own quantile column, since in general the quantiles could differ between variables):

julia> combine(groupby(df, :gr), names(df, Number) .=> (x -> DataFrame(q=0.0:0.1:1.0, v=quantile(x, 0.0:0.1:1.0))) => x -> x .* ["_q", "_v"])
44×7 DataFrame
 Row │ gr    x1_q     x1_v       x2_q     x2_v       x3_q     x3_v
     │ Char  Float64  Float64    Float64  Float64    Float64  Float64
─────┼──────────────────────────────────────────────────────────────────
   1 │ A         0.0  0.0100206      0.0  0.026531       0.0  0.0304922
   2 │ A         0.1  0.105031       0.1  0.106821       0.1  0.116344
   3 │ A         0.2  0.146141       0.2  0.25079        0.2  0.160595
   4 │ A         0.3  0.239598       0.3  0.275699       0.3  0.223479
   5 │ A         0.4  0.418623       0.4  0.391514       0.4  0.283464
   6 │ A         0.5  0.479909       0.5  0.463614       0.5  0.350202
   7 │ A         0.6  0.661491       0.6  0.478091       0.6  0.421991
   8 │ A         0.7  0.709587       0.7  0.626841       0.7  0.464356
   9 │ A         0.8  0.778766       0.8  0.721748       0.8  0.581408
  10 │ A         0.9  0.922159       0.9  0.941598       0.9  0.762324
  11 │ A         1.0  0.986275       1.0  0.995137       1.0  0.923933
  12 │ B         0.0  0.0201708      0.0  0.0379883      0.0  0.0213256
  13 │ B         0.1  0.14565        0.1  0.0499197      0.1  0.163012
  ⋮  │  ⋮       ⋮         ⋮         ⋮         ⋮         ⋮         ⋮
  33 │ C         1.0  0.976539       1.0  0.902687       1.0  0.838742
  34 │ D         0.0  0.0154812      0.0  0.0325466      0.0  0.0327547
  35 │ D         0.1  0.046749       0.1  0.15281        0.1  0.187093
  36 │ D         0.2  0.167502       0.2  0.280586       0.2  0.31834
  37 │ D         0.3  0.236399       0.3  0.328495       0.3  0.389418
  38 │ D         0.4  0.31478        0.4  0.452312       0.4  0.428514
  39 │ D         0.5  0.325001       0.5  0.527466       0.5  0.526287
  40 │ D         0.6  0.448259       0.6  0.591354       0.6  0.546437
  41 │ D         0.7  0.573627       0.7  0.672257       0.7  0.613559
  42 │ D         0.8  0.802225       0.8  0.720313       0.8  0.730664
  43 │ D         0.9  0.893947       0.9  0.890409       0.9  0.908996
  44 │ D         1.0  0.931859       1.0  0.949966       1.0  0.977479
                                                         19 rows omitted

Variant 4: as variant 3, but with a single quantile column:

julia> combine(groupby(df, :gr), names(df, Number) .=> (x -> quantile(x, 0.0:0.1:1.0)) => x -> x .* "_v", Returns((q=0.0:0.1:1.0,)))
44×5 DataFrame
 Row │ gr    x1_v       x2_v       x3_v       q
     │ Char  Float64    Float64    Float64    Float64
─────┼────────────────────────────────────────────────
   1 │ A     0.0100206  0.026531   0.0304922      0.0
   2 │ A     0.105031   0.106821   0.116344       0.1
   3 │ A     0.146141   0.25079    0.160595       0.2
   4 │ A     0.239598   0.275699   0.223479       0.3
   5 │ A     0.418623   0.391514   0.283464       0.4
   6 │ A     0.479909   0.463614   0.350202       0.5
   7 │ A     0.661491   0.478091   0.421991       0.6
   8 │ A     0.709587   0.626841   0.464356       0.7
   9 │ A     0.778766   0.721748   0.581408       0.8
  10 │ A     0.922159   0.941598   0.762324       0.9
  11 │ A     0.986275   0.995137   0.923933       1.0
  12 │ B     0.0201708  0.0379883  0.0213256      0.0
  13 │ B     0.14565    0.0499197  0.163012       0.1
  ⋮  │  ⋮        ⋮          ⋮          ⋮         ⋮
  33 │ C     0.976539   0.902687   0.838742       1.0
  34 │ D     0.0154812  0.0325466  0.0327547      0.0
  35 │ D     0.046749   0.15281    0.187093       0.1
  36 │ D     0.167502   0.280586   0.31834        0.2
  37 │ D     0.236399   0.328495   0.389418       0.3
  38 │ D     0.31478    0.452312   0.428514       0.4
  39 │ D     0.325001   0.527466   0.526287       0.5
  40 │ D     0.448259   0.591354   0.546437       0.6
  41 │ D     0.573627   0.672257   0.613559       0.7
  42 │ D     0.802225   0.720313   0.730664       0.8
  43 │ D     0.893947   0.890409   0.908996       0.9
  44 │ D     0.931859   0.949966   0.977479       1.0
                                       19 rows omitted

(Note here the way to return a constant column that does not depend on anything: you create a named tuple with the column name you want and just wrap it in Returns.)
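A Base-only sketch of what Returns does, in case it is new to you (it needs Julia 1.7 or later; the names `const_col` and `same` are just for illustration):

```julia
# Returns(value) builds a callable that ignores its arguments.
const_col = Returns((q = 0.0:0.1:1.0,))

const_col()               # gives back the named tuple
const_col(1, "anything")  # same named tuple, the arguments are ignored

# A roughly equivalent anonymous function:
same = (args...) -> (q = 0.0:0.1:1.0,)
```

In Variant 4 the callable is invoked once per group and always hands back the same `q` range, which is why the quantile levels repeat for every group.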
