In a DataFrame with columns :A, :B, :C, where :B contains categorical data, I would like to count the number of distinct categories present in each of the sub-dataframes obtained through groupby(df, :A), and finally get the list of codes in column :A for which more than one category is present.
Here is an example, starting with the following data:
df = DataFrame(A = ["a","a","a","b","b","b","c","c"],
               B = ["X","Y","Z","Y","Y","Z","X","X"],
               C = [2,3,5,2,10,7,5,1])
For each value in column A, I would like to count the number of distinct values in column B. The result would be something like:
a 3
b 2
c 1
The result could be tuples, arrays, a dictionary…
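To make the target concrete, here is a minimal sketch of one possible way to compute it, assuming DataFrames.jl 1.x and its `combine`/`groupby` functions. The column name `n_categories` and the variable names `counts` and `multi` are made up for illustration, and this is not necessarily the most idiomatic or efficient approach.

```julia
using DataFrames

df = DataFrame(A = ["a","a","a","b","b","b","c","c"],
               B = ["X","Y","Z","Y","Y","Z","X","X"],
               C = [2,3,5,2,10,7,5,1])

# Count the distinct values of :B within each group of :A.
# `n_categories` is a hypothetical name for the resulting column.
counts = combine(groupby(df, :A),
                 :B => (b -> length(unique(b))) => :n_categories)

# Keep only the :A codes that have more than one distinct :B value.
multi = counts[counts.n_categories .> 1, :A]
```

With the example data, `counts` should have one row per value of :A, and `multi` should hold the codes "a" and "b". If a dictionary is preferred, something like `Dict(counts.A .=> counts.n_categories)` would convert the result.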