Type inference of tables with missing cells

This is a feature suggestion. I’m modeling database tables as a Vector of NamedTuple. I’m wondering if Julia could more smartly infer the type of these structures when values are missing. For starters, I note that Julia is already great at inferring vectors that have missing values; it finds elements of type Union{Missing, T}. Further, Julia is very good at inferring types of vectors of tuples and even vectors of named tuples, so long as they don’t have missing values.

julia> typeof([3, missing])
Array{Union{Missing, Int64},1}

julia> typeof([("A", 3), ("B", 4)])
Array{Tuple{String,Int64},1}

julia> typeof([(k="A", v=3), (k="B", v=4)])
Array{NamedTuple{(:k, :v),Tuple{String,Int64}},1}

The combination, a vector of tuples with missing values, falls short.

julia> typeof([("A", 3), ("B", missing)])

julia> typeof([(k="A", v=3), (k="B", v=missing)])
Array{NamedTuple{(:k, :v),T} where T<:Tuple,1}

What I’d love to see instead…

# a future version of Julia
julia> typeof([("A", 3), ("B", missing)])
Array{Tuple{String,Union{Missing, Int64}},1}

julia> typeof([(k="A", v=3), (k="B", v=missing)])
Array{NamedTuple{(:k, :v),Tuple{String,Union{Missing, Int64}}},1}

Is this at all related to https://github.com/JuliaLang/julia/issues/24614
or https://github.com/JuliaLang/julia/issues/25925?
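
Until something like this exists, the desired element type can be computed by hand. The following is only a minimal sketch: `fieldwise_eltype` and `tighten` are hypothetical helper names, not part of Base, and the code assumes a non-empty vector whose rows all share the same field names.

```julia
# Hypothetical helper (not part of Base): for each field, take the Union of
# the value types actually observed in that field across all rows.
function fieldwise_eltype(rows)
    names = keys(first(rows))  # assumes non-empty input with identical field names
    types = map(names) do name
        mapreduce(r -> typeof(getfield(r, name)), (a, b) -> Union{a, b}, rows)
    end
    return NamedTuple{names, Tuple{types...}}
end

# Hypothetical helper: rebuild the vector with the tighter element type.
function tighten(rows)
    T = fieldwise_eltype(rows)
    return T[row for row in rows]  # each row is converted to T when stored
end

table = [(k="A", v=3), (k="B", v=missing)]
tight = tighten(table)

println(typeof(table))  # the loosely typed vector from the literal
println(typeof(tight))  # the field-wise Union element type
```

This does a full pass over the data and allocates a new vector, so it is a workaround for small tables rather than a substitute for the literal producing the tighter type directly.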


For reference, inference isn’t involved at all in the examples posted above. Everything is determined by promote_type.
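
To see the promotion step in isolation, the element type of an array literal can be reproduced by calling `promote_type` on the types of the elements. This is a small sketch; `typejoin` is included only for comparison, on the assumption that it is the fallback used when no specific promotion rule applies.

```julia
# Scalars: there is a promotion rule involving Missing, so a Union comes out.
scalar = promote_type(Int, Missing)

# Tuples: promotion is applied to the two tuple types as a whole,
# not field by field.
tuple_promoted = promote_type(Tuple{String,Int}, Tuple{String,Missing})

# For comparison: the plain type join of the same two tuple types.
tuple_joined = typejoin(Tuple{String,Int}, Tuple{String,Missing})

println(scalar)
println(tuple_promoted)
println(tuple_joined)
```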

I have an implementation of deterministically typed tables that works when the tables are initialized with well-typed values.

However, something like [("A", 3), ("B", missing)] shouldn’t be treated as Array{Tuple{String,Union{Missing, Int64}},1}; I think Array{Tuple{String,Any},1} is the better default in general. Without manual annotations, there are simply too many plausible types for [("A", 3), ("B", missing)].

What do you mean? The only possible types for elements in that array are Tuple{String,Int} and Tuple{String,Missing}.
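
To make the candidates concrete, here is a sketch of three element types that can all hold the same two rows when the annotation is written out by hand with a typed array literal. The variable names are illustrative only.

```julia
rows = (("A", 3), ("B", missing))

# Loosest: the second field may hold anything.
loose = Tuple{String,Any}[rows...]

# Field-wise Union: the type requested in the original post.
fieldwise = Tuple{String,Union{Missing,Int}}[rows...]

# Row-wise Union: exactly the two concrete tuple types that occur.
rowwise = Union{Tuple{String,Int},Tuple{String,Missing}}[rows...]

for v in (loose, fieldwise, rowwise)
    println(typeof(v))
end
```

All three vectors store the same values; they differ only in what else may later be pushed into them and in how much the compiler knows about each element.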

If you use LightQuery, you can keep the underlying data stored column-wise as a named tuple of vectors, but use rows to lazily view the data as rows.

    k = ["A", "B"],
    v = [3, missing]
) |>