Missing data and NamedTuple compatibility

piever · January 4, 2018, 3:57am

My idea was to take care of that when collecting. When the iterable of NullableNamedTuples (a better name is needed…) is collected into, say, a DataFrame, the columns that contain no missing values are converted to a regular Vector{T}, whereas the others stay as Vector{Union{T, Missing}}. This conversion might be possible without copying, though I’m not sure about that.
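To make the collection step concrete, here is a minimal sketch of the narrowing I have in mind. The helper name `narrow_column` is hypothetical (it is not part of DataFrames or Query), and this version copies the data, so it says nothing about whether a copy-free conversion is possible. It assumes a Julia version where `nonmissingtype` is available in Base.

```julia
# Hypothetical helper: if a column contains no missing values, convert it
# to a plain Vector{T}; otherwise return it unchanged.
function narrow_column(col::AbstractVector)
    T = nonmissingtype(eltype(col))   # strip Missing from the element type
    # Note: convert allocates a new array here (this sketch does copy).
    return any(ismissing, col) ? col : convert(Vector{T}, col)
end

clean = narrow_column(Union{Int, Missing}[1, 2, 3])        # narrowed to Vector{Int}
dirty = narrow_column(Union{Int, Missing}[1, missing, 3])  # left as is
```

A collect routine could apply something like this to each column once the whole iterable has been consumed.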

Inside the query, it is true that all columns would accept missing values, but that shouldn’t be a concern: a key advantage of the Missing approach (over a container approach) is that code doesn’t need to change when a column allows missing data but contains none (whereas DataValue would require the occasional get as soon as it encounters a function that is not “whitelisted”).
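As a small illustration of that point (a sketch, with a made-up function `half` standing in for any user function that knows nothing about missing data):

```julia
# `half` is a hypothetical user function with no method for Missing.
half(x::Float64) = x / 2

# The column type allows missing values, but none are actually present.
col = Union{Float64, Missing}[2.0, 4.0]

# Each element is a plain Float64 at run time, so `half` applies unchanged,
# with no unwrapping step.
result = [half(x) for x in col]
```

If the column did contain a `missing`, this call would fail with a MethodError, which is the point where the user would have to decide how to handle it.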

One might argue that this solution is still not ideal: in the final output, some columns that “should be nullable”, in the sense that they are a function of nullable columns, would not be nullable if there is no missing data in the input. I’m not sure whether this is a problem in practice (though I don’t think it should be). I also don’t think there is a way to avoid this behavior with a Union approach to missing data without relying on Base._return_type.
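The same data-dependent behavior can be seen with plain `map` in Base, which I use here only as a stand-in for what a query’s map step would do (an assumption on my part, not a description of Query’s internals):

```julia
# Two columns with the same declared type; only one actually holds a missing.
with_missing    = Union{Int, Missing}[1, missing, 3]
without_missing = Union{Int, Missing}[1, 2, 3]

add_one(x) = x + 1   # propagates missing: missing + 1 is missing

# For non-empty input, the element type of the result is computed from the
# values actually produced, not from the declared type of the input column.
out_a = map(add_one, with_missing)
out_b = map(add_one, without_missing)

allows_missing(v) = Missing <: eltype(v)
```

Here `allows_missing(out_a)` is true while `allows_missing(out_b)` is false, even though both inputs were declared as `Vector{Union{Int, Missing}}`.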

What drove my curiosity was this comment about JuliaDB. I was trying to understand what solution JuliaDB’s developers had in mind as it seems to me that the same issues that affect Query would also affect JuliaDB (JuliaDB’s map being very similar to Query’s @map).
