DataFrame how to `groupby` then index with unspecified keys (merge them)

For general Julia collections and for many table types, see `group` + `addmargins` from FlexiGroups.jl:

using FlexiGroups

tbl = ...
gm = group(x -> (;x.gender, x.nationality, x.hair_color), tbl) |> addmargins
# these should work - use `total` to select
# the group containing all values of the corresponding parameter:
gm[(; gender = :X, nationality = :US, hair_color = :blue)]
gm[(; gender = :X, nationality = :US, hair_color = total)]
gm[(; gender = total, nationality = :US, hair_color = total)]

For dense multidimensional grouping (as your case seems to be), keyed arrays can be more convenient than dictionaries (the default). See the docs; FlexiGroups supports this as well.

I don't think DataFrames and FlexiGroups work together directly, though, so since you start from a DataFrame, the above applies only indirectly in your case.
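
If you would rather stay within DataFrames, here is a rough sketch of the same idea: keys you leave out are simply not constrained, which plays the role of `total` above. The sample data and the `select_group` helper are made up for illustration and are not part of DataFrames or FlexiGroups. Note that this rescans the table on every lookup rather than precomputing the groups.

```julia
using DataFrames

# Hypothetical sample data; column names follow the example above.
df = DataFrame(
    gender      = [:X, :X, :Y, :X],
    nationality = [:US, :US, :US, :FR],
    hair_color  = [:blue, :red, :blue, :blue],
)

# Hypothetical helper: keep the rows that match every key that is given.
# Keys that are left out are unconstrained, so they act like `total`.
select_group(df; conds...) =
    filter(row -> all(row[k] == v for (k, v) in conds), df)

a = select_group(df; gender = :X, nationality = :US, hair_color = :blue)
b = select_group(df; gender = :X, nationality = :US)   # any hair_color
c = select_group(df; nationality = :US)                # any gender, any hair_color
```

Each call returns a new DataFrame with the matching rows, so you can apply whatever aggregation you need to the result.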
