What is a Pooled Array? And my answer

I was reading https://github.com/JuliaComputing/PooledArrays.jl, which describes a PooledArray. I tried to follow the source code and looked at the constructor. I gather that a pool is the same as a level in R/stats-speak, which is the same as a category in SAS/stats-speak.

In uni stats courses these are referred to as factor levels, which are basically the unique possible values of a categorical variable. In a PooledArray representation there is also a ref, of integer type RA, which acts as a set of pointers into the list of possible values, very much like how R’s factors work.
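To make the pool/ref idea concrete, here is a minimal conceptual sketch in Python. It is not the PooledArrays.jl implementation; the function names `pool_encode` and `pool_decode` are hypothetical, and the indices are 0-based as usual in Python (Julia would use 1-based indices).

```python
def pool_encode(values):
    """Split values into a pool of unique values and integer refs into it."""
    pool = []    # unique values, in order of first appearance
    index = {}   # value -> position in pool
    refs = []    # one integer per original element
    for v in values:
        if v not in index:
            index[v] = len(pool)
            pool.append(v)
        refs.append(index[v])
    return pool, refs


def pool_decode(pool, refs):
    """Rebuild the original sequence by looking each ref up in the pool."""
    return [pool[r] for r in refs]


data = ["low", "high", "low", "mid", "high", "low"]
pool, refs = pool_encode(data)
restored = pool_decode(pool, refs)
```

Here `pool` plays the role of the levels and `refs` plays the role of the integer codes behind an R factor.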

That (ordered levels) is one application, but the other main feature is simply compression. For unordered levels, see
https://github.com/JuliaArrays/IndirectArrays.jl

I think that whenever possible, it is advantageous to separate data structures (“this is what it does”) from applications (“this is what you use it for”), since there may be multiple uses that make sense. This keeps things modular.


If you’re looking for an equivalent of R factors, rather use CategoricalArrays:
https://github.com/JuliaData/CategoricalArrays.jl

PooledArrays’ main purpose is compression, so it doesn’t have anything specific for categorical data. Which package is more appropriate depends on what you need to do.
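As a rough illustration of why pooling compresses, the sketch below compares the byte count of storing every string in full against storing one small integer per element plus a single copy of each unique string. It is a back-of-the-envelope estimate in Python, not a measurement of PooledArrays.jl; the function names are hypothetical, and it assumes at most 256 unique values so that each ref fits in one byte.

```python
from array import array


def plain_size_bytes(values):
    """Bytes needed to store every string in full (UTF-8 payload only)."""
    return sum(len(v.encode("utf-8")) for v in values)


def pooled_size_bytes(values):
    """Bytes for one-byte refs plus a single copy of each unique string."""
    index = {}
    for v in values:
        index.setdefault(v, len(index))
    if len(index) > 256:
        raise ValueError("this sketch assumes at most 256 unique values")
    refs = array("B", (index[v] for v in values))  # unsigned 1-byte integers
    pool_bytes = sum(len(v.encode("utf-8")) for v in index)
    return refs.itemsize * len(refs) + pool_bytes


data = ["California", "Texas", "California", "New York"] * 1000
plain = plain_size_bytes(data)
pooled = pooled_size_bytes(data)
```

The saving grows with the number of repeats and the length of the values; with mostly unique values, the refs are pure overhead.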


Is the distinction between those three packages documented somewhere?

Not that I know of.
