How to count the number of categories present in a column of a DataFrame

Hello,

In a dataframe with columns :A, :B, :C, where :B contains categorical data, I would like to count the number of categories present in each of the sub-dataframes obtained through groupby(df, :A), and finally get the list of codes in column :A where more than one category is present.

Thank you very much for any advice,
Philippe

Welcome! Can you post a sample dataframe? It’ll be easier to help you then.

Here is an example:
Starting with the following data
df = DataFrame(A = ["a","a","a","b","b","b","c","c"],
               B = ["X","Y","Z","Y","Y","Z","X","X"],
               C = [2,3,5,2,10,7,5,1])
For each value of column A, I would like to count the number of distinct values in column B. The result would be something like:
a 3
b 2
c 1
The result could be tuples, arrays, a dictionary…

Hope this is clearer!
Thanks for your attention.

Try something like

julia> combine(groupby(df, :A), :B=>length∘unique)
3×2 DataFrame
│ Row │ A      │ B_function │
│     │ String │ Int64      │
├─────┼────────┼────────────┤
│ 1   │ a      │ 3          │
│ 2   │ b      │ 2          │
│ 3   │ c      │ 1          │
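
To get from those per-group counts to the list of codes in :A with more than one category (the last part of the original question), something along these lines might work. This is only a sketch: the column name :n_categories and the variable names are made up for illustration, and it assumes a DataFrames.jl version that supports the source => function => target form in combine.

```julia
using DataFrames

df = DataFrame(A = ["a","a","a","b","b","b","c","c"],
               B = ["X","Y","Z","Y","Y","Z","X","X"],
               C = [2,3,5,2,10,7,5,1])

# Count the distinct values of :B within each group of :A.
# :n_categories is a hypothetical name chosen for the result column.
counts = combine(groupby(df, :A), :B => (length ∘ unique) => :n_categories)

# Keep only the codes in :A whose group holds more than one distinct category.
multi = counts[counts.n_categories .> 1, :A]   # ["a", "b"]
```

Naming the result column explicitly avoids depending on the auto-generated name (B_function in the output above), which can differ between DataFrames.jl versions.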

Edit: sorry, I was thinking of a different API.

You could also try a countmap (https://joshday.github.io/OnlineStats.jl/stable/api/#OnlineStatsBase.CountMap) or, if the data is small, call "unique" on the requisite column and take the length of the result.
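
For small data, the "unique then length" idea can also be sketched in plain Julia without any packages, which gives the dictionary-style result mentioned in the question. This is not the CountMap API from the link, just a hand-rolled stand-in; the names seen, ncat and multi are made up for illustration, and the two vectors stand in for columns A and B of the example.

```julia
A = ["a","a","a","b","b","b","c","c"]
B = ["X","Y","Z","Y","Y","Z","X","X"]

# For each code in A, collect the set of distinct B values seen with it.
seen = Dict{String, Set{String}}()
for (a, b) in zip(A, B)
    # get! creates an empty set the first time a code is encountered.
    push!(get!(() -> Set{String}(), seen, a), b)
end

# Number of distinct categories per code.
ncat = Dict(a => length(s) for (a, s) in seen)

# Codes with more than one category; sorted because Dict iteration order is not guaranteed.
multi = sort([a for (a, n) in ncat if n > 1])   # ["a", "b"]
```

For a single code, the most literal reading of the suggestion is length(unique(B[A .== "a"])), which gives 3 for the example data.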

Thanks a lot, simple, elegant, efficient!