How to check if a column is categorical in a DataFrame

How can I check whether a DataFrame column is categorical?
│ Row │ A    │ B      │
│     │ Cat… │ String │
├─────┼──────┼────────┤
│ 1   │ A    │ X      │
│ 2   │ B    │ X      │
│ 3   │ C    │ X      │
│ 4   │ D    │ Y      │
│ 5   │ D    │ Y      │
│ 6   │ A    │ Y      │

Let's say that in the above DataFrame, column A is categorical.

Now if I call eltype(df[!, :A]), it returns

CategoricalValue{String, UInt8}

But eltype can also return CategoricalValue{Int}, for example when the categorical column holds integers.
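If you do want to test the element type, note that `CategoricalValue` on its own is a `UnionAll` type, so a subtype check matches every parameterization, whatever the value type and reference type are. A minimal sketch, assuming CategoricalArrays.jl 0.8 or later is installed; `has_categorical_eltype` is a hypothetical helper name, and Base's `nonmissingtype` strips `Missing` from columns that allow missing values:

```julia
using CategoricalArrays

# Hypothetical helper: true if the element type (ignoring `missing`) is any
# CategoricalValue, regardless of its type parameters.
has_categorical_eltype(col::AbstractVector) =
    nonmissingtype(eltype(col)) <: CategoricalValue

strings = categorical(["A", "B", "C", "D", "D", "A"])
ints = categorical([1, 2, 2, 3])
with_missing = categorical(["X", missing, "Y"])

has_categorical_eltype(strings)       # true
has_categorical_eltype(ints)          # true
has_categorical_eltype(with_missing)  # true
has_categorical_eltype(["X", "Y"])    # false
```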

What is the best way to check whether a column is categorical or not?

I would think that checking whether the column is a CategoricalArray would be most direct:

julia> df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
4×2 DataFrame
│ Row │ A     │ B      │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 1     │ M      │
│ 2   │ 2     │ F      │
│ 3   │ 3     │ F      │
│ 4   │ 4     │ M      │

julia> typeof(df.B)
Array{String,1}

julia> categorical!(df, :B)
4×2 DataFrame
│ Row │ A     │ B    │
│     │ Int64 │ Cat… │
├─────┼───────┼──────┤
│ 1   │ 1     │ M    │
│ 2   │ 2     │ F    │
│ 3   │ 3     │ F    │
│ 4   │ 4     │ M    │

julia> typeof(df.B)
CategoricalArray{String,1,UInt32,String,CategoricalValue{String,UInt32},Union{}}

julia> isa(df.B, CategoricalArray)
true
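The same `isa` test can be applied to every column to list the categorical ones. A small sketch, assuming DataFrames.jl and CategoricalArrays.jl are installed; it converts the column with `df.B = categorical(df.B)` rather than `categorical!`, which was deprecated in later DataFrames.jl releases, and `cat_cols` is just an illustrative variable name:

```julia
using DataFrames, CategoricalArrays

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

# Replace column B with a categorical copy of itself.
df.B = categorical(df.B)

# Collect the names of all columns stored as a CategoricalArray
# (columns that allow `missing` are CategoricalArrays too).
cat_cols = [name for name in names(df) if df[!, name] isa CategoricalArray]
```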

describe(df) will also show this information, in the eltype column of its output.
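If you only care about the types, `describe` can be asked for just that statistic. A sketch assuming a recent DataFrames.jl release (roughly 1.x), where statistics are passed positionally; `info` is an illustrative variable name:

```julia
using DataFrames, CategoricalArrays

df = DataFrame(A = 1:4, B = categorical(["M", "F", "F", "M"]))

# Request only the :eltype statistic: one row per column, with the column
# name in :variable and its element type in :eltype.
info = describe(df, :eltype)
```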


This works. All this time I was confused about how to check the “CategoricalValue” type, but checking whether the column is a CategoricalArray is indeed the straightforward approach.

Thanks @pdeffebach and @hendri54.

Is there a way to check if it is ordered?

As (almost) always, we can learn a lot from looking at the documentation:

https://categoricalarrays.juliadata.org/stable/apiindex/#CategoricalArrays.isordered-Tuple{CategoricalArray}

isordered(A::CategoricalArray)

Test whether entries in A can be compared using <, > and similar operators, using the ordering of levels.
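A short sketch of `isordered` in practice, assuming CategoricalArrays.jl is installed. It also uses `levels!` and `ordered!` from the same package to set the level order and mark the array as ordered, and the `ordered = true` keyword of `categorical` to create an ordered array directly:

```julia
using CategoricalArrays

x = categorical(["low", "high", "medium", "low"])
isordered(x)            # false: `categorical` creates unordered arrays by default

# Set the level order explicitly, then mark the array as ordered in place.
levels!(x, ["low", "medium", "high"])
ordered!(x, true)
isordered(x)            # true

x[1] < x[2]             # true: "low" precedes "high" in the level order

# Alternatively, request an ordered array when creating it.
y = categorical([3, 1, 2]; ordered = true)
isordered(y)            # true
```

Comparing values from an unordered array with `<` throws an error, so checking `isordered` first can be useful before relying on level order.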