Arrow's DictEncode to CategoricalArray?

As I understand it, you are doing something like

julia> using Arrow, CategoricalArrays, DataFrames

julia> df = DataFrame(a = 1:4, b = string.('a':'d'), c = categorical(["x", "x", "y", "y"]))
4×3 DataFrame
 Row │ a      b       c    
     │ Int64  String  Cat… 
─────┼─────────────────────
   1 │     1  a       x
   2 │     2  b       x
   3 │     3  c       y
   4 │     4  d       y

julia> afn = Arrow.write("./df.arrow", df)
"./df.arrow"

julia> df1 = DataFrame(Arrow.Table(afn))
4×3 DataFrame
 Row │ a      b       c      
     │ Int64  String  String 
─────┼───────────────────────
   1 │     1  a       x
   2 │     2  b       x
   3 │     3  c       y
   4 │     4  d       y

julia> typeof(df1.c)
Arrow.DictEncoded{String, Int8, Arrow.List{String, Int32, Vector{UInt8}}}

In general you can't “round trip” DataFrame → Arrow → DataFrame and get the same column types back: here the categorical column c comes back as an Arrow.DictEncoded vector. Is there a reason that you need a CategoricalArray instead of the Arrow.DictEncoded result? The Arrow.DictEncoded result can in some circumstances take up less storage than the CategoricalArray, because it uses the smallest signed integer type available for the reference array (Int8 in this case).
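If you do need a CategoricalArray, one option is to rebuild the column from its values after reading. The following is only a sketch, assuming that `categorical` accepts the read-back column as an ordinary `AbstractVector` of strings; the temporary file path is just for illustration.

```julia
using Arrow, CategoricalArrays, DataFrames

df = DataFrame(a = 1:4, c = categorical(["x", "x", "y", "y"]))

# Write to a temporary file and read it back, as in the session above.
afn = Arrow.write(tempname() * ".arrow", df)
df1 = DataFrame(Arrow.Table(afn))

# Replace the read-back column with a CategoricalArray built from its values.
# compress=true asks CategoricalArrays for the smallest reference type that fits.
df1.c = categorical(df1.c; compress = true)
```

This makes a copy of the column, so it gives up the memory-mapped, read-only storage that Arrow.Table provides for that column.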

Arrow.DictEncoded is more like a PooledArray than a CategoricalArray, but often the distinction is not important. It can be important for ordered categorical arrays. I think it is still the case that the Arrow.Table function ignores whether DictEncoded arrays in the Arrow file have ordered categories.
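If the ordering matters, a workaround is to restore the levels and the ordered flag by hand after reading. This is a sketch under the assumption that you know the intended level order yourself; the column name `size` and its levels are made up for the example.

```julia
using Arrow, CategoricalArrays, DataFrames

lvls = ["low", "medium", "high"]   # intended order, known to the caller
df = DataFrame(size = categorical(["low", "high", "medium", "low"];
                                  levels = lvls, ordered = true))

afn = Arrow.write(tempname() * ".arrow", df)
df1 = DataFrame(Arrow.Table(afn))

# Rebuild the categorical column, then set the level order and the ordered
# flag explicitly rather than relying on anything stored in the file.
c = categorical(df1.size)
levels!(c, lvls)
ordered!(c, true)
df1.size = c
```

After this, comparisons such as `df1.size[1] < df1.size[2]` follow the level order ("low" before "high") rather than alphabetical order.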