I assume I was one of those “parties”…
I would like to clarify that I think this is the correct plan for return values only, and that it would potentially be a mistake to generalize it to all usages. I don’t think the use of Array{T?} should be generalized to T? appearing as any type parameter. I think that arbitrary types should typically be parameterized by the leaf type of their contents (and not be “tricked” into being nullable by a parameter type of T?). Only the field types should be declared as nullable (since that determines the return type of getfield), although I don’t think this usage should come up very often in practice (it does occur, for example, in linked list / tree data structures, where the parent / child pointers could be nullable). Along those lines, I think that using NullableArray{T} may indeed be better than Array{T?} (but with a much simplified implementation, since it could use Array{T?} internally for storage layout and optimization). For all other cases (i.e. anything other than a return value), I think the nullable should have to be wrapped into a struct with a nullable field (essentially, DataValues.jl) in order to be lifted. This ensures the user takes responsibility for handling the null value (which I think should also address malmaud’s concern).
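To make the distinction between a nullable field and a nullable type parameter concrete, here is a minimal sketch. It assumes Union{T, Nothing} as a stand-in for the proposed T? syntax, and the names ListNode and SketchNullableArray are hypothetical, chosen only for illustration.

```julia
# Assumption: `Union{T, Nothing}` stands in for the proposed `T?`.

# Only the field is declared nullable; the type parameter stays the leaf type T.
mutable struct ListNode{T}
    value::T
    next::Union{ListNode{T}, Nothing}   # nullable link, as in a linked list
end

# A null-aware container parameterized by T (not by `T?`) that uses an array
# of nullable elements purely as its internal storage. Hypothetical type.
struct SketchNullableArray{T}
    data::Vector{Union{T, Nothing}}
end

Base.length(a::SketchNullableArray) = length(a.data)
Base.getindex(a::SketchNullableArray, i::Int) = a.data[i]

tail = ListNode{Int}(2, nothing)
head = ListNode{Int}(1, tail)

xs = SketchNullableArray{Int}([1, nothing, 3])
```

Here the nullability shows up only in the declared field types, so getfield on next can return nothing, while the outer types are still spelled ListNode{Int} and SketchNullableArray{Int}.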
Additionally, I think with this approach we might even choose not to change eltype, but instead unify it with nullable-get semantics for dictionaries, such as a get? or []? function. I think that this approach may also let us make null-handling very explicit (with the extra ? in the function name), gaining back even more of malmaud’s forced acknowledgement (and at the syntax level, for reviewer convenience), without needing the additional overhead of nullable unwrapping.
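A rough sketch of what such a nullable-get could look like for dictionaries follows. Julia identifiers cannot contain ?, so the hypothetical name get_nullable stands in for the get? spelling, and Union{V, Nothing} again stands in for the proposed V?.

```julia
# Hypothetical name: `get_nullable` stands in for `get?` / `[]?`.
# It returns either the stored value or `nothing`, never throwing on a miss.
get_nullable(d::AbstractDict, key) = get(d, key, nothing)

ages = Dict("alice" => 31, "bob" => 27)

a = get_nullable(ages, "alice")   # the stored value
c = get_nullable(ages, "carol")   # `nothing`; the caller must check explicitly

# The caller acknowledges the null case at the call site.
describe(x) = x === nothing ? "unknown" : string(x)
```

Note that the value type of the dictionary itself is left unchanged in this sketch; only the return type of the explicitly named lookup is nullable.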
The primary technical concern…
It’s not primarily a technical concern of mine. Attempting to treat a NamedTuple field as nullable also violates all of my assertions in the previous paragraph. Since I think null data should only occur as a function return value and not as a function argument, a nullable-unaware construct such as NamedTuple will require the use of an additional, null-aware wrapper (e.g. DataValue).
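As an illustration of wrapping before construction, the sketch below stores possibly-null fields in a NamedTuple through an explicit wrapper. MaybeValue and unwrap are hypothetical stand-ins for a null-aware wrapper such as DataValue; this is not the DataValues.jl API.

```julia
# Hypothetical stand-in for a null-aware wrapper such as DataValue.
struct MaybeValue{T}
    hasvalue::Bool
    value::Union{T, Nothing}
end
MaybeValue(x::T) where {T} = MaybeValue{T}(true, x)        # wrap a present value
MaybeValue{T}() where {T} = MaybeValue{T}(false, nothing)  # typed null

# Unwrapping forces the caller to supply a fallback for the null case.
unwrap(m::MaybeValue, default) = m.hasvalue ? m.value : default

# The NamedTuple itself stays null-unaware; nullability lives in the field types.
row = (id = 1, name = "alice", gender = MaybeValue{String}(), age = MaybeValue(31))
```

With this shape, the type of row records which fields may be null, and reading them requires going through unwrap.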
As a concrete example, for a mapping of (id, name) => (gender, age), a JuliaDB table (a sparse table) might be created with a type of Table{Columns{IdType, NameType, DataValue{GenderType}, DataValue{AgeType}}}. Alternatively, the equivalent dataframe (a null-aware table) might have the type Table{IdType, NameType, Columns{GenderType, AgeType}}. Note that in both of these cases, the nullability is explicitly indicated and handled by a type. I don’t see any reason to make NamedTuple an exception here. Since constructing it requires calling a function and parameterizing a type, either of those would require that any known-nullable value get wrapped first. Ideally, I think I would like a null-aware type like DataFrames to be able to return a null-aware type like NullableTuple{T...} or NullableNamedTuple{T...} (with an eltype of T?), such that the type unwrapping must be acknowledged by the user.
If NamedTuple were implemented to act exactly like a Dict does now (e.g. be indexed like a regular tuple with [:name], and iterate as name=>value pairs), I think using alternative implementations would be as easy as making an alternative Dict. Currently, the proposed API is struct-like and not Dict-like, so the proposed usage examples appear to be biased towards exploiting implementation details, rather than encouraging the experimentation, duck typing, and type-based dispatch for which Julia is justifiably famous. So, no, this is mostly not a technical concern, because we can certainly define a mapping of corresponding APIs, such as having get-nullable-field-of-namedtuple for get?, getfield for getindex, and nonnull-keys for keys. But, um, that sounds very unpleasant to me.
By the way, have I mentioned yet that I’m in favor of just defining getfield(dict<:Associative, s::Symbol) = dict[s] (aka dict.s = dict[:s], aka implementing getfield-overloading), so that we only have one API for this indexing-type operation, while the syntax-demanding folks will still be satisfied?
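For what the single-API idea might look like, here is a small sketch using Base.getproperty, the hook through which current Julia versions allow dot-access to be customized. The wrapper type PropDict is hypothetical and is used so that the sketch does not redefine behaviour for every dictionary type.

```julia
# Hypothetical wrapper type around a Symbol-keyed dictionary.
struct PropDict{V}
    data::Dict{Symbol, V}
end

# Keep one indexing API; `getfield` is used internally to reach the real
# field, because dot-access on a PropDict is redirected below.
Base.getindex(d::PropDict, s::Symbol) = getfield(d, :data)[s]

# Forward dot-access to indexing, so `d.s` means `d[:s]`.
Base.getproperty(d::PropDict, s::Symbol) = d[s]

person = PropDict(Dict(:name => "alice", :age => 31))
```

In this sketch person.name and person[:name] go through the same getindex method, so an alternative implementation only has to provide the Dict-like interface.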