Representing Nullable Values

jameson · July 7, 2017, 10:06pm

I assume I was one of those “parties”…

I would like to clarify that I think this is the correct plan for return values only, but that it would potentially be a mistake to think that it should be generalized to all usages. I don’t think the use of Array{T?} should be generalized to appearing as any type parameter. I think that typically arbitrary types should be parameterized by the leaf type of their contents (and not be “tricked” into being made nullable by using a parameter type of T?). Only the field types should be declared as nullable (since that determines the return type of getfield), although I don’t think this usage should actually come up very often in practice (but for example, it occurs in linked list / tree data structures where the parent / child pointers could be nullable). Along those lines, I think that using NullableArray{T} may indeed be better than Array{T?} (but with a much simplified implementation, from being able to use Array{T?} internally for storage layout and optimization). For all other cases (i.e. anything other than a return value), I think the nullable should have to be wrapped into a struct with a nullable field (essentially, DataValues.jl) in order to be lifted. This ensures the user takes responsibility for the handling of the null value (which I think should also address malmaud’s concern).
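A minimal sketch of the linked-list case mentioned above, written for current Julia. The T? syntax in this thread was a proposal, so Union{T, Nothing} stands in for it here as an assumption. ListNode and list_length are hypothetical names used only for illustration. The type is parameterized by the leaf type T, and only the next field is declared nullable.

```julia
# Hypothetical linked-list node: parameterized by the leaf type T,
# with only the `next` field declared as nullable.
# Union{ListNode{T}, Nothing} is an assumed stand-in for the proposed `T?`.
mutable struct ListNode{T}
    value::T
    next::Union{ListNode{T}, Nothing}
end

# Convenience constructor for a node with no successor.
ListNode(value::T) where {T} = ListNode{T}(value, nothing)

# Walk the list. Reading `node.next` returns a nullable value,
# so the caller has to check for `nothing` explicitly.
function list_length(node::Union{ListNode, Nothing})
    n = 0
    while node !== nothing
        n += 1
        node = node.next
    end
    return n
end

tail = ListNode(2)
head = ListNode{Int}(1, tail)
println(list_length(head))
```

Note that the nullability shows up only in the field declaration and in the return type of the field access, not in the type parameter of ListNode itself.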

Additionally, I think with this approach we might even choose not to change eltype, but instead unify it with nullable-get semantics for dictionaries, such as a get? or []? function. I think that this approach may also let us make null-handling very explicit (with the extra ? in the function name), gaining back even more of malmaud’s forced-acknowledgement (and at the syntax level, for reviewer convenience), without needing the additional overhead of nullable unwrapping.
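One way to picture this, as a rough sketch under stated assumptions: a container whose eltype stays T, which stores Union{T, Nothing} internally, and which exposes a separate nullable accessor. Julia does not allow ? in a function name, so tryget below is a hypothetical spelling of the get? / []? idea. NullableVector is likewise a hypothetical name and not the NullableArrays.jl implementation.

```julia
# Hypothetical container: eltype stays T, storage is nullable internally.
# Union{T, Nothing} is an assumed stand-in for the proposed `T?`.
struct NullableVector{T}
    data::Vector{Union{T, Nothing}}
end

Base.length(v::NullableVector) = length(v.data)
Base.eltype(::Type{NullableVector{T}}) where {T} = T

# Plain indexing promises a T, so it throws on a null slot.
function Base.getindex(v::NullableVector{T}, i::Int) where {T}
    x = v.data[i]
    x === nothing && throw(ArgumentError("element $i is null; use tryget"))
    return x::T
end

# Hypothetical spelling of `get?` / `[]?`: the return type is nullable,
# and the call site makes the null handling visible.
tryget(v::NullableVector, i::Int) = v.data[i]

# The same accessor for dictionaries: a missing key yields `nothing`.
# (This cannot distinguish a missing key from a stored `nothing`.)
tryget(d::AbstractDict, key) = get(d, key, nothing)

v = NullableVector{Int}([1, nothing, 3])
ages = Dict("ana" => 34)
```

With these definitions, v[1] returns an Int, tryget(v, 2) returns nothing, and tryget(ages, "bob") returns nothing instead of throwing a KeyError.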

The primary technical concern…

It’s not primarily just a technical concern of mine. Attempting to treat a NamedTuple field as nullable also violates all of my assertions in the previous paragraph. Since I think null data should only occur as a function return value and not as a function argument, a nullable-unaware construct such as NamedTuples will require the use of an additional wrapper (e.g. DataValue) that is null-aware.
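As an illustration of the wrapper idea, here is a minimal sketch of a null-aware value type placed inside an ordinary named tuple. MaybeValue, hasvalue, and unwrap are hypothetical names chosen for this example. They are not the DataValues.jl API, and Union{T, Nothing} is again an assumed stand-in for T?.

```julia
# Hypothetical null-aware wrapper: a struct with a single nullable field.
struct MaybeValue{T}
    value::Union{T, Nothing}
end

hasvalue(x::MaybeValue) = x.value !== nothing

# Unwrapping requires the caller to say what happens in the null case.
unwrap(x::MaybeValue, default) = hasvalue(x) ? x.value : default

# The named tuple itself knows nothing about nulls; the nullable
# columns are marked by their wrapper type.
row = (id = 1, name = "Ana",
       gender = MaybeValue{String}("F"),
       age = MaybeValue{Int}(nothing))

println(unwrap(row.age, -1))
println(unwrap(row.gender, "unknown"))
```

The point of the sketch is that typeof(row.age) is MaybeValue{Int}, so the nullability is visible in the type and the value cannot be used as an Int without an explicit unwrap.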

As a concrete example, for a mapping of (id, name) => (gender, age), a JuliaDB table (a sparse table) might be created to have a type of Table{Columns{IdType, NameType, DataValue{GenderType}, DataValue{AgeType}}}. Alternatively, the equivalent dataframe (a null-aware table) might have the type Table{IdType, NameType, Columns{GenderType, AgeType}}. Note that in both of these cases, the nullability is explicitly indicated and handled by a type. I don’t see any reason to make NamedTuple an exception here. Since constructing it requires calling a function and parameterizing a type, either of those would require that any known-nullable value get wrapped first.

Ideally, I think I would like for a null-aware type like DataFrames to be able to return a null-aware type like NullableTuple{T...} or NullableNamedTuple{T...} (with an eltype of T?) such that the type unwrapping must be acknowledged by the user. If NamedTuple were implemented to act exactly like a Dict does now (e.g. be indexed like a regular tuple with [:name], and iterate as name=>value pairs), I think using alternative implementations would be as easy as making an alternative Dict. Currently, the proposed API is struct-like and not Dict-like, so the proposed usage examples appear to be biased towards exploiting implementation details, rather than towards encouraging the experimentation, duck typing, and type-based dispatch for which Julia is justifiably famous.

So, no, this is mostly not a technical concern, because we can certainly define a mapping of corresponding APIs, such as having get-nullable-field-of-namedtuple for get?, and getfield for getindex, and nonnull-keys for keys. But, um, that sounds very unpleasant to me. By the way, have I mentioned yet that I’m in favor of just defining getfield(dict<:Associative, s::Symbol) = dict[s] (aka dict.s = dict[:s], aka implement getfield-overloading) so that we only have one API for this indexing-type operation, but so that the syntax-demanding folks will be satisfied?
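A sketch of what that single-API idea could look like in current Julia, where dot-access overloading is spelled Base.getproperty rather than getfield. DotDict is a hypothetical wrapper type introduced only for this example, so that the sketch does not redefine behavior for Dict itself.

```julia
# Hypothetical Dict wrapper where `d.name` and `d[:name]` are the same operation.
struct DotDict{V}
    data::Dict{Symbol, V}
end

# `getfield` reaches the real field, because `getproperty` is overloaded below.
Base.getindex(d::DotDict, s::Symbol) = getfield(d, :data)[s]
Base.getproperty(d::DotDict, s::Symbol) = d[s]
Base.propertynames(d::DotDict) = collect(keys(getfield(d, :data)))
Base.keys(d::DotDict) = keys(getfield(d, :data))

row = DotDict(Dict(:gender => "F", :age => 34))

println(row.age)
println(row[:age])
```

Here row.age and row[:age] go through the same getindex method, so an alternative implementation only needs to provide the Dict-like interface to support both syntaxes.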
