That’s a serious problem, which has been discussed previously in this issue. Thanks for bringing it up again and proposing solutions! The NullableNamedTuples idea sounds clever. However, I would really like these to work by default with plain NamedTuple. Your post prompted me to start a discussion again with some of the core developers, and it looks like we agree on a solution which involves changing how collect and map compute the element type of the returned array.
The idea is that instead of using the “raw” element type Union{NamedTuple{(:x, :y), Tuple{Int64, Int64}}, NamedTuple{(:x, :y), Tuple{Missing, Int64}}, NamedTuple{(:x, :y), Tuple{Int64, Missing}}, NamedTuple{(:x, :y), Tuple{Missing, Missing}}}, these functions would detect that this type is a Union of NamedTuple types with different type parameters, and would move the Union to the type parameters themselves, giving NamedTuple{(:x, :y), Tuple{Union{Int64, Missing}, Union{Int64, Missing}}}. Like your NullableNamedTuples approach, a column would only allow for missing values if some missing values are actually present (this is how map works for all types so this cannot really be changed).
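To make the idea concrete, here is a minimal sketch of moving the Union into the type parameters. The helper merge_fieldtypes is a hypothetical name invented for this illustration; it is not part of Base and not necessarily how collect and map would implement it.

```julia
# Hypothetical helper (not part of Base): merge two NamedTuple types that
# share the same field names by taking the Union of their field types,
# position by position.
function merge_fieldtypes(::Type{NamedTuple{names, A}},
                          ::Type{NamedTuple{names, B}}) where {names, A<:Tuple, B<:Tuple}
    merged = map((a, b) -> Union{a, b}, fieldtypes(A), fieldtypes(B))
    return NamedTuple{names, Tuple{merged...}}
end

# The four combinations of present and missing values from the example above.
rows = ((x = 1, y = 2), (x = missing, y = 2),
        (x = 1, y = missing), (x = missing, y = missing))

# Fold the concrete row types into a single NamedTuple type whose fields
# are Unions, instead of a Union of four NamedTuple types.
merged_type = reduce(merge_fieldtypes, map(typeof, rows))
```

With these rows, merged_type should be a single NamedTuple type whose x and y fields are each Union{Int, Missing}. If no row had a missing x, that field would stay a plain Int, which matches the "only if missing values are actually present" behavior described above.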
This is actually related to the currently open PR 24332, which is essential to get map to work with missing (even apart from issues related to tuples). But it appears it would make sense to go even further and use a mechanism almost identical to promote to choose the element type. The only change compared with promote is that the computed type must always be able to represent all the values exactly, which is not currently the case, e.g. for promote(1.0, typemax(Int64)), where the conversion to Float64 loses precision. So we agreed that we would need a separate mechanism, tentatively called promote_strict, for that. promote could actually automatically fall back on that function, so that most types only need to implement the former.
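The precision problem with promote can be seen in a short sketch. The Union element type at the end is only an illustration of a type that represents every value exactly; it is an assumption about what a strict mechanism might pick, not the agreed design of promote_strict.

```julia
# promote converts both arguments to a common type, here Float64.
a, b = promote(1.0, typemax(Int64))

# Float64 has a 53-bit significand, so 2^63 - 1 cannot be stored exactly:
# the converted value no longer compares equal to the original integer.
lossy = b != typemax(Int64)

# A Union element type, by contrast, stores each value unchanged
# (illustrative choice only, see the note above).
exact = Union{Float64, Int64}[1.0, typemax(Int64)]
preserved = exact[2] === typemax(Int64)
```

Here lossy and preserved should both be true: the promoted pair has lost the exact integer, while the Union-typed array still holds it.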
Help would be welcome to experiment with this if somebody is interested.