Missing data and NamedTuple compatibility

nalimilan · January 4, 2018, 10:04am

That’s a serious problem, which has been discussed previously in this issue. Thanks for bringing it up again and proposing solutions! The NullableNamedTuples idea sounds clever. However, I would really like this to work by default with plain NamedTuple. Your post prompted me to restart the discussion with some of the core developers, and it looks like we agree on a solution that involves changing how collect and map compute the element type of the returned array.

The idea is that instead of using the “raw” element type Union{NamedTuple{(:x, :y), Tuple{Int64, Int64}}, NamedTuple{(:x, :y), Tuple{Missing, Int64}}, NamedTuple{(:x, :y), Tuple{Int64, Missing}}, NamedTuple{(:x, :y), Tuple{Missing, Missing}}}, these functions would detect that this type is a Union of NamedTuple types with the same field names but different field types, and would move the Union into the type parameters themselves, giving NamedTuple{(:x, :y), Tuple{Union{Int64, Missing}, Union{Int64, Missing}}}. Like your NullableNamedTuples approach, a column would only allow missing values if some missing values are actually present (this is how map works for all types, so it cannot really be changed).
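
A minimal sketch of that element type computation, written as plain Julia outside of collect and map. The helper names union_fields and collect_namedtuples are hypothetical (they are not part of Base); the sketch assumes a non-empty input whose rows all share the same field names, and relies on fieldtypes, which is available in Julia 1.1 and later.

```julia
# Hypothetical helper (not in Base): merge two NamedTuple types that share
# the same field names N by taking the Union of each field's type.
function union_fields(::Type{NamedTuple{N, S}},
                      ::Type{NamedTuple{N, T}}) where {N, S<:Tuple, T<:Tuple}
    fts = map((s, t) -> Union{s, t}, fieldtypes(S), fieldtypes(T))
    return NamedTuple{N, Tuple{fts...}}
end

# Hypothetical collect-like function: fold union_fields over the concrete
# types of the rows actually present, then convert the rows to that type.
# Assumes `rows` is non-empty and every row has the same field names.
function collect_namedtuples(rows::AbstractVector)
    T = reduce(union_fields, map(typeof, rows))
    return convert(Vector{T}, rows)
end

rows = Any[(x = 1, y = 2), (x = missing, y = 3), (x = 4, y = missing)]
v = collect_namedtuples(rows)
eltype(v) == NamedTuple{(:x, :y), Tuple{Union{Int, Missing}, Union{Int, Missing}}}  # true

# Only the column that actually contains missing gets a Union field type.
w = collect_namedtuples(Any[(x = 1, y = 2), (x = missing, y = 3)])
eltype(w) == NamedTuple{(:x, :y), Tuple{Union{Int, Missing}, Int}}  # true
```

The second call shows the point made above: y never holds missing there, so its field type stays Int.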

This is actually related to the currently open PR 24332, which is essential to get map to work with missing (even apart from issues related to tuples). But it appears it would make sense to go even further and use a mechanism almost identical to promote to choose the element type. The only difference from promote is that the computed type must always be able to represent all the values exactly, which is currently not guaranteed: for example, promote(1.0, typemax(Int64)) converts typemax(Int64) to a Float64, which cannot represent it exactly. So we agreed that we would need a separate mechanism for that, tentatively called promote_strict. promote_strict could automatically fall back on promote, so that most types would only need to implement promote.
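
The exactness requirement can be illustrated with a small sketch. strict_eltype is a hypothetical function, not the tentative promote_strict itself: it asks promote_type for a candidate type, accepts it only if converting each value to it is lossless, and otherwise falls back on a Union of the two input types. That Union fallback is an assumption made for illustration.

```julia
# Hypothetical sketch (not part of Base): choose an element type that can
# hold both x and y exactly, starting from the usual promote_type result.
function strict_eltype(x, y)
    T = promote_type(typeof(x), typeof(y))
    # A value survives if converting it to T does not throw InexactError
    # and the converted value is still equal to the original.
    survives(v) = try
        isequal(convert(T, v), v)
    catch err
        err isa InexactError || rethrow()
        false
    end
    return survives(x) && survives(y) ? T : Union{typeof(x), typeof(y)}
end

# promote converts typemax(Int64) to Float64, which rounds it up to 2^63.
promote(1.0, typemax(Int64))[2] == typemax(Int64)            # false

strict_eltype(1.0, 2) == Float64                              # true
strict_eltype(1.0, typemax(Int64)) == Union{Float64, Int64}   # true
strict_eltype(Int64(-1), UInt64(1)) == Union{Int64, UInt64}   # true
strict_eltype(1, missing) == Union{Int, Missing}              # true
```

The last line is the case that matters for map with missing: the ordinary promotion result is already exact there, so the sketch keeps it.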

Help experimenting with this would be welcome if somebody is interested.
