Add metadata to categorical array

DataFrames.jl has a nice feature for adding metadata to a table:
Metadata · DataFrames.jl (juliadata.org)

This is useful because you often want shorter variable names when coding, but you also want to be able to conveniently look up what each variable represents (or print more verbose variable names or descriptions when creating a table).

Is there something similar for CategoricalArrays.jl? That is, if I want to use relatively short names in the levels of the categorical array, but also want longer names or descriptions for each of the categorical values, is there a package that allows me to conveniently add this metadata to my categorical arrays and conveniently exploit it?

Thanks!

There was some work on this in JuliaArrays/MetadataArrays.jl (on GitHub), but I do not know its current status.

This is the kind of feature which could make sense to have in CategoricalArrays, and I’ve thought about it in the past. But that would also make the package more complex so I’m somewhat hesitant. Storing a set of descriptions is easy, but you’d have to handle the (default) case where no descriptions are provided, decide what to do when concatenating two pools with identical levels but different descriptions, etc.
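A minimal sketch of those two design questions, in plain Julia and kept outside CategoricalArrays.jl. The names `LevelDescriptions`, `describe_level`, and `merge_descriptions` are hypothetical and are not part of any package; raising an error on conflicting descriptions is only one possible policy.

```julia
# Hypothetical sketch: descriptions are stored beside the levels,
# not inside CategoricalArrays.jl.
struct LevelDescriptions
    descriptions::Dict{String,String}
end

# Default case: when no description is stored, fall back to the level name.
describe_level(d::LevelDescriptions, level::AbstractString) =
    get(d.descriptions, level, String(level))

# One possible policy for combining two sets of descriptions:
# identical entries are accepted, conflicting ones raise an error.
function merge_descriptions(a::LevelDescriptions, b::LevelDescriptions)
    merged = copy(a.descriptions)
    for (level, text) in b.descriptions
        if haskey(merged, level) && merged[level] != text
            throw(ArgumentError("conflicting descriptions for level \"$level\""))
        end
        merged[level] = text
    end
    return LevelDescriptions(merged)
end

# Usage with made-up levels
d = LevelDescriptions(Dict("hs" => "High school diploma"))
describe_level(d, "hs")  # stored description
describe_level(d, "ba")  # no description stored, falls back to "ba"
```

Other policies (keep the left-hand description, or drop conflicting ones) would be just as easy to write; choosing a default is the hard part.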

This discussion is similar to the one we had about supporting the same features as `LabeledArray` (which I did not implement in the end): `LabeledArray` and `CategoricalArray` · Issue #4 · junyuan-chen/ReadStatTables.jl · GitHub

The MetadataArrays approach does look cleaner and more composable: metadata can be useful on other array types too, not just categorical arrays. The one significant friction point is the general “multiple nested wrappers” problem.