My idea was to take care of that when collecting. When the iterable of NullableNamedTuples (a better name is needed…) is collected into, say, a DataFrame, the columns that do not contain any missing values are converted to regular Vector{T}, whereas the others stay as Vector{Union{T, Missing}}. This conversion might be possible without copying, though I’m not sure about that.
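To make the collection step concrete, here is a minimal sketch of the narrowing I have in mind. The helper name `narrow_column` is hypothetical (it is not part of Query or DataFrames), it assumes Julia 1.3 or later for `nonmissingtype`, and this version does copy the data, so it says nothing about whether a copy-free conversion is possible.

```julia
# Hypothetical helper (not part of Query.jl or DataFrames.jl): narrow a
# column's element type when it turns out to hold no missing values.
function narrow_column(col::AbstractVector)
    any(ismissing, col) && return col    # keep Union{T, Missing} as is
    T = nonmissingtype(eltype(col))      # strip Missing from the element type
    return convert(Vector{T}, col)       # copies unless col is already a Vector{T}
end

ages   = narrow_column(Union{Int, Missing}[31, 45, 27])
labels = narrow_column(Union{String, Missing}["a", missing, "c"])
```

With this sketch, `ages` should come back as a `Vector{Int}`, while `labels` keeps its `Union{String, Missing}` element type because it really contains a missing value.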
Inside the query, it is true that all columns would accept Missings, but that shouldn’t be a concern: a key advantage of the Missing approach (over a container approach) is that code doesn’t need to change when a column allows missing data but actually contains none (whereas DataValue would require the occasional get as soon as it encounters a function that is not “whitelisted”).
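A small illustration of that point, using only Base and made-up names (`double`, `col`): a function written with no knowledge of missing data still works on a column whose type allows missing, as long as no value is actually missing. It only fails once a real `missing` shows up.

```julia
# A function that knows nothing about missing data.
double(x::Int) = 2x

# The element type allows missing, but no value is actually missing,
# so the function can be applied without any unwrapping.
col = Union{Int, Missing}[1, 2, 3]
doubled = map(double, col)

# With a real missing value there is no matching method, so map throws.
threw = try
    map(double, Union{Int, Missing}[1, missing])
    false
catch err
    err isa MethodError
end
```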
One might argue that this solution is still not ideal: in the final output, some columns that “should be nullable”, in the sense that they are a function of nullable columns, would not be nullable if there is no missing data in the input. I’m not sure whether this is a problem in practice (though I don’t think it should be). I also don’t think there is a way to avoid this behavior with a Union approach to missing data without relying on Base._return_type.
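The data dependence described here can be seen with plain `map` in Base, which (as far as I understand it) picks the output element type from the values it actually produces when inference does not give a concrete type. The names below are made up for the example, and this is only meant as a sketch of the behavior, not of how Query or JuliaDB collect results.

```julia
# Propagates missing, since missing + 1 is missing.
add_one(x) = x + 1

# Both inputs have the same declared element type.
no_missing   = Union{Int, Missing}[1, 2, 3]
with_missing = Union{Int, Missing}[1, missing, 3]

# The output element type follows the data, not the declared input type.
out_a = map(add_one, no_missing)
out_b = map(add_one, with_missing)
```

Here I would expect `eltype(out_a)` to be `Int` and `eltype(out_b)` to be `Union{Int, Missing}`, even though both inputs are declared as nullable columns.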
What drove my curiosity was this comment about JuliaDB. I was trying to understand what solution JuliaDB’s developers had in mind, as it seems to me that the same issues that affect Query would also affect JuliaDB (JuliaDB’s map being very similar to Query’s @map).