I was reading https://github.com/JuliaComputing/PooledArrays.jl, which describes a PooledArray. I tried to follow the source code and looked at the constructor. From what I gather, the pool corresponds to the levels in R/stats terminology, which are the same thing as categories in SAS/stats terminology.
In university stats courses these are called factor levels: the unique possible values of a categorical variable. A PooledArray also has a refs array (of type RA) holding integers that act as pointers (indices) into the pool of possible values, very much like how R’s factors work.
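To make the pool/refs idea concrete, here is a minimal, language-agnostic sketch in Python. It is not PooledArrays’ actual implementation or API; the function names pool_encode and pool_decode are hypothetical. The pool stores each distinct value once, and refs stores one integer index per element. Indices are 0-based here because this is Python, whereas Julia indexing is 1-based.

```python
from typing import Hashable, List, Sequence, Tuple


def pool_encode(values: Sequence[Hashable]) -> Tuple[List[int], List[Hashable]]:
    """Hypothetical helper: split values into (refs, pool).

    pool holds each distinct value once, in order of first appearance;
    refs holds, for every element, its index into pool.
    """
    pool: List[Hashable] = []
    value_to_ref: dict = {}  # reverse lookup so each value is found in O(1)
    refs: List[int] = []
    for v in values:
        if v not in value_to_ref:
            value_to_ref[v] = len(pool)  # new level gets the next index
            pool.append(v)
        refs.append(value_to_ref[v])
    return refs, pool


def pool_decode(refs: Sequence[int], pool: Sequence[Hashable]) -> List[Hashable]:
    """Hypothetical helper: rebuild the original values by indexing into pool."""
    return [pool[r] for r in refs]


refs, pool = pool_encode(["low", "high", "low", "mid", "high"])
print(refs)  # [0, 1, 0, 2, 1]
print(pool)  # ['low', 'high', 'mid']
```

Reading element i is just pool[refs[i]], which is the same lookup an R factor does with its integer codes and levels.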
I think that whenever possible, it is advantageous to separate data structures (“this is what it does”) from applications (“this is what you use it for”), since there may be multiple uses that make sense. This keeps things modular.
PooledArrays’ main purpose is compression, so it doesn’t offer any functionality specific to categorical data. Which package is more appropriate depends on what you need to do.
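Why pooling compresses: each repeated value is stored once in the pool, and every element costs only one integer reference, which can be narrow when there are few distinct values. The Python sketch below illustrates that idea by storing refs in the narrowest unsigned array type that covers the pool size. The helper compact_refs is hypothetical and only loosely analogous to choosing a small integer type for the refs; it does not describe how PooledArrays picks its ref type.

```python
from array import array
from typing import Sequence


def compact_refs(refs: Sequence[int], pool_size: int) -> array:
    """Hypothetical helper: store refs in the narrowest unsigned array type
    whose range covers every index into a pool of pool_size values."""
    for code in ("B", "H", "I", "L", "Q"):
        # Largest value representable by this unsigned type on this platform.
        max_value = 2 ** (8 * array(code).itemsize) - 1
        if pool_size - 1 <= max_value:
            return array(code, refs)
    raise OverflowError("pool too large for any unsigned array type")


# Three distinct levels fit in one byte per element, however long the column is.
small = compact_refs([0, 1, 0, 2, 1], pool_size=3)
print(small.typecode, small.itemsize)  # B 1

# 301 levels no longer fit in one byte, so a wider type is chosen.
wider = compact_refs([0, 300], pool_size=301)
print(wider.typecode)  # H
```

With a column of, say, a million short strings drawn from three levels, the refs then take about a million bytes plus a three-entry pool, instead of a million separate values.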