Yes, I recall @bkamins answering this before as well. Unfortunately, the Discourse search is hopeless, so I can’t find the thread (or maybe it was just on Slack).
Then, if you find a case where grouping several times on different columns is inconvenient, please let me know and we will consider adding what you ask for (note, though, that it will not be fast: it will have O(number of groups) cost, as opposed to the O(1) cost of the current lookup).
julia> function regroup(gd; kwargs...)
           new_keys = Any[]
           for k in keys(gd)
               # keep a group only if every requested column matches its key
               if all(isequal(k[c], v) for (c, v) in pairs(kwargs))
                   push!(new_keys, k)
               end
           end
           # select the matching groups; this scan is O(number of groups)
           gd[new_keys]
       end
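For comparison, here is a minimal sketch of the approach mentioned above: grouping the same data frame several times, once per set of columns you want to look up by. The data and column names are made up for illustration (borrowed from the example further down), so adapt them to your table.

```julia
using DataFrames

# Hypothetical data, only for illustration.
df = DataFrame(gender = [:X, :X, :Y, :Y],
               nationality = [:US, :UK, :US, :US],
               hair_color = [:blue, :red, :blue, :red],
               value = 1:4)

# One GroupedDataFrame per set of columns you want to look up by.
gd_full = groupby(df, [:gender, :nationality, :hair_color])
gd_nat = groupby(df, :nationality)

# Look up a single group by a NamedTuple of key values.
one_group = gd_full[(gender = :X, nationality = :US, hair_color = :blue)]
all_us = gd_nat[(nationality = :US,)]
```

Each lookup uses the regular group index, at the price of keeping one grouped object per combination of columns.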
For general Julia collections and for many table types, see group + addmargins from FlexiGroups.jl:
using FlexiGroups
tbl = ...
gm = group(x -> (;x.gender, x.nationality, x.hair_color), tbl) |> addmargins
# these should work - use `total` to select
# the group containing all values of the corresponding parameter:
gm[(; gender = :X, nationality = :US, hair_color = :blue)]
gm[(; gender = :X, nationality = :US, hair_color = total)]
gm[(; gender = total, nationality = :US, hair_color = total)]
For dense multidimensional grouping (as your case seems to be), keyed arrays can be more convenient than dictionaries (the default). See the docs; FlexiGroups supports this as well.
I don’t think DataFrames and FlexiGroups work together though, so the above can only indirectly be applied in your specific case as you start from a DataFrame.
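One possible way to bridge the two, as a rough sketch: turn the DataFrame into a plain vector of NamedTuples, which is an ordinary Julia collection. The data below is hypothetical, and I have not checked how FlexiGroups handles the result, so treat the last comment as an assumption.

```julia
using DataFrames

# Hypothetical data, only for illustration.
df = DataFrame(gender = [:X, :Y], nationality = [:US, :UK])

# Each DataFrameRow converts to a NamedTuple, giving a Vector of NamedTuples.
rows = NamedTuple.(eachrow(df))

# Assumption: `rows` could then play the role of `tbl` in the snippet above.
```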