Concise encoding of `nothing` in Arrow.jl

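Arrow.jl encodes `Union{Missing, T}` columns concisely, but an otherwise identical `Union{Nothing, T}` column takes more than twice the space on disk and comes back with a wider element type: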

```julia
using Arrow

for (label, col) in [
    "Union{Missing,String}" => Union{Missing,String}["a", missing, "b"],
    "Union{Nothing,String}" => Union{Nothing,String}["a", nothing, "b"],
]
    Arrow.write("/tmp/t.arrow", (; x = col))
    t = Arrow.Table("/tmp/t.arrow")
    println("$label  =>  eltype=$(eltype(t.x)), $(filesize("/tmp/t.arrow")) bytes")
end
```
```
% jl /tmp/arrow_nulls.jl
Union{Missing,String}  =>  eltype=Union{Missing, String}, 474 bytes
Union{Nothing,String}  =>  eltype=Union{Missing, Nothing, String}, 1082 bytes
```
  • At the moment the `nothing` column round-trips as a 3-way `Union{Missing, Nothing, String}` even though no `missing` value is present. Is that intended?
  • If I want to use `nothing` rather than `missing` in my data, is there a way to get the concise encoding other than writing `missing` values and manually converting them back to `nothing` after load?
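
For reference, the manual workaround mentioned in the second question would look roughly like the sketch below. The helper names `to_missing` and `to_nothing` are hypothetical (they are not part of Arrow.jl), and the sketch assumes that Base's `replace` accepts the loaded Arrow column like any other `AbstractVector`:

```julia
using Arrow

# Hypothetical helpers (not part of Arrow.jl): swap the sentinel value
# before writing and swap it back after loading.
to_missing(col) = replace(col, nothing => missing)   # Union{Nothing,T} -> Union{Missing,T}
to_nothing(col) = replace(col, missing => nothing)   # Union{Missing,T} -> Union{Nothing,T}

col = Union{Nothing,String}["a", nothing, "b"]

path = tempname() * ".arrow"
Arrow.write(path, (; x = to_missing(col)))   # written as a Union{Missing,String} column

t = Arrow.Table(path)
x = to_nothing(t.x)                          # copies the column into a new vector

println("eltype=$(eltype(x)), $(filesize(path)) bytes")
```

This gets the smaller file, but it costs an extra copy of every affected column on both write and read, which is what I would like to avoid.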

cc @quinnj