How to count the number of categories present in a column of a DataFrame

philc · January 11, 2020, 5:24pm

Hello,

In a DataFrame with columns :A, :B, :C, where :B contains categorical data, I would like to count the number of categories present in each of the sub-dataframes obtained through groupby(df, :A), and finally get the list of codes in column :A where more than one category is present.

Thank you very much for any advice,
Philippe

bashonubuntu · January 11, 2020, 6:16pm

Welcome! Can you post a sample dataframe? It’ll be easier to help you then.

philc · January 11, 2020, 7:38pm

Here is an example:
Starting with the following data
using DataFrames
df = DataFrame(A = ["a","a","a","b","b","b","c","c"],
               B = ["X","Y","Z","Y","Y","Z","X","X"],
               C = [2,3,5,2,10,7,5,1])
I would like to count, for each value of the A column, the number of different values in column B. The result would be something like:
a 3
b 2
c 1
could be tuples, arrays, a dictionary…

Hope this is clearer!
Thanks for your attention.

dawbarton · January 11, 2020, 7:55pm

Try something like

julia> combine(groupby(df, :A), :B=>length∘unique)
3×2 DataFrame
│ Row │ A      │ B_function │
│     │ String │ Int64      │
├─────┼────────┼────────────┤
│ 1   │ a      │ 3          │
│ 2   │ b      │ 2          │
│ 3   │ c      │ 1          │
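
To get from these counts to the list of codes in :A with more than one category (the second half of the original question), one possible sketch follows. It assumes a recent DataFrames.jl (1.x); the column name n_categories and the variable names counts and multi are chosen here for illustration and are not part of the original answer.

```julia
using DataFrames

df = DataFrame(A = ["a","a","a","b","b","b","c","c"],
               B = ["X","Y","Z","Y","Y","Z","X","X"],
               C = [2,3,5,2,10,7,5,1])

# Count the distinct B values in each group of A.
# The trailing `=> :n_categories` names the output column explicitly
# (an illustrative name) instead of relying on the auto-generated one.
counts = combine(groupby(df, :A), :B => (length ∘ unique) => :n_categories)

# Keep only the groups with more than one category and pull out their A codes.
multi = counts[counts.n_categories .> 1, :A]
```

With the sample data, multi should contain the codes "a" and "b", since group "c" only has the single category "X".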

anon92994695 · January 11, 2020, 9:09pm

edit - sorry was thinking of a different API

Could also try a countmap (API · OnlineStats Docs), or, if the data is small, call "unique" on the requisite column and take the length.
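
For small data, the "unique then length" idea can also be sketched with Base Julia only, without any package. This is just one way to do it; the names A, B, seen and ncat are made up for this example, and the vectors are the :A and :B columns from the sample data above.

```julia
# Plain vectors standing in for columns :A and :B of the example DataFrame.
A = ["a","a","a","b","b","b","c","c"]
B = ["X","Y","Z","Y","Y","Z","X","X"]

# For each code in A, collect the set of distinct B values seen with it.
seen = Dict{String, Set{String}}()
for (a, b) in zip(A, B)
    # get! creates an empty set the first time a code is encountered.
    push!(get!(() -> Set{String}(), seen, a), b)
end

# Number of distinct categories per code, as a dictionary.
ncat = Dict(a => length(s) for (a, s) in seen)
```

The result is a dictionary mapping each code to its number of categories, which matches the "could be ... a dictionary" shape mentioned in the question.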

philc · January 13, 2020, 4:55pm

Thanks a lot, simple, elegant, efficient!

merlin · July 28, 2021, 10:20pm

this is beautiful

akshdfyehd · September 18, 2022, 3:03pm

Hi, what is the dot between length and unique? How do I type it?
Thanks

nilshg · September 18, 2022, 4:01pm

It's function composition; type it with \circ<tab>. You can always copy-paste a Unicode symbol into the REPL help mode to find out how to type it.
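
A small sketch of what the composition does, using only Base Julia (the names x and count_unique are made up for this example): length ∘ unique builds a new function that first applies unique and then applies length to the result.

```julia
x = ["Y", "Y", "Z"]

# Compose the two functions: unique runs first, then length.
count_unique = length ∘ unique

# The composed function gives the same answer as nesting the calls by hand.
same_result = count_unique(x) == length(unique(x))

# An ASCII-only alternative is an anonymous function.
count_unique_ascii = v -> length(unique(v))
```

If typing the symbol is inconvenient, the anonymous function form can be passed to combine in the same position as length∘unique.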
