"Nullable" fields: Current recommendation?

In the style guide for 0.6, one is advised to use Nullable{T} rather than Union{Void, T} for fields. In 0.7, Nullable{T} is deprecated, and one is advised to use Union{T, Nothing} or Union{Some{T}, Nothing} instead, where Nothing is the new name for Void. At least superficially, it would seem like these pieces of advice are contradictory?

In the new style guide, the part about Nullable{T} has simply been removed. Does this mean that the current recommendation is to avoid this kind of thing entirely (despite optimizations of small unions)? The text primarily discusses the «union-with-Nothing»-case, while the heading advises against unions in general. What is the impetus/motivation for the recommendation? (Not primarily performance, I assume?)

What I’m after is the «Julian» way of special-casing the leaves in tree structures, or the ends of linked lists, or the like (e.g., for teaching basic algorithms/data structures). It’s certainly possible to indicate such things with «self-pointers», for example – I just sort of assumed that the new use of Union{T, Nothing} rather than Nullable{T} would mean that it’d be a natural fit (for example with a no-op implementation for Nothing of a recursive function on subtrees, or the like).
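To make the question concrete, here is a minimal sketch of the pattern I have in mind, written for Julia 0.7 or later. The names TreeNode, treesize, and inorder! are made up for illustration and do not come from any package; the point is only that the methods for Nothing end the recursion.

```julia
# Hypothetical binary-tree node; an absent child is represented by `nothing`.
mutable struct TreeNode{T}
    value::T
    left::Union{TreeNode{T}, Nothing}
    right::Union{TreeNode{T}, Nothing}
end

# Convenience constructor for a leaf (both children absent).
TreeNode(value::T) where {T} = TreeNode{T}(value, nothing, nothing)

# Recursive size: the method for `Nothing` is the base case.
treesize(::Nothing) = 0
treesize(node::TreeNode) = 1 + treesize(node.left) + treesize(node.right)

# In-order traversal collecting values; the `Nothing` method is a no-op.
inorder!(out, ::Nothing) = out
function inorder!(out, node::TreeNode)
    inorder!(out, node.left)
    push!(out, node.value)
    inorder!(out, node.right)
    return out
end

root = TreeNode{Int}(2, TreeNode(1), TreeNode(3))
```

With this, treesize(root) and inorder!(Int[], root) need no explicit checks for leaves; dispatch handles the absent children.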

Or perhaps this use-case is one of the few that don’t fall under the general recommendations of the style guide? (The recommendations do seem to point to other use-cases.)

Any thoughts?

See the latest documentation on Missing Values.
Arrays containing missing values have type Array{Union{Missing,T}, 1}, with T the type of the non-missing values.

I couldn’t find the documentation for it, but it stands to reason that fields whose value may be missing would have type Union{Missing,T}, with T the type of the non-missing value.

That wasn’t really my issue :slight_smile: As I see it, Missing is not appropriate in this case, as it’s not a matter of missing data, but of something that isn’t there, so Nothing is the correct type. (For a long discussion of this topic, see, for example, this issue.) The question was whether using a union for fields is still frowned upon, now that we’ve moved from Nullables to unions in general.

To summarize/clarify: nothing means that there’s nothing there, while missing means that there’s something there, but you don’t know what it is. The latter is for missing data, etc., and uses three-valued logic, propagation, and the like, and is a bit like NA in R. The former, however, doesn’t really implement much, and corresponds more to the null/NIL/None of other languages – and is what I’m talking about here. (“Null-pointers,” essentially.)
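A small sketch of the difference, assuming Julia 0.7 or later (the variable names are just for illustration): missing propagates through arithmetic and comparisons and follows three-valued logic, while nothing has no such methods defined.

```julia
# `missing` propagates through arithmetic and comparisons.
propagated = missing + 1        # missing
compared   = missing == 1       # missing

# Three-valued logic: the result is known only when `missing` cannot change it.
or_true   = true | missing      # true
or_false  = false | missing     # missing
and_false = false & missing     # false

# `nothing` does not propagate: arithmetic on it is simply not defined.
nothing_error = try
    nothing + 1
    false
catch err
    err isa MethodError
end

# Comparing with `nothing` gives an ordinary Bool.
nothing_equal = (nothing == 1)  # false
```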

Oh my. One could write a thesis on this. It would be good if someone would sort this out and write the definitive Julia guidance on this topic. For some inspiration see:

The DataStructures package has

abstract type LinkedList{T} end

mutable struct Nil{T} <: LinkedList{T}
end

mutable struct Cons{T} <: LinkedList{T}
    head::T
    tail::LinkedList{T}
end

which fits the “Introduce another type that lacks x” clause in the Style Guide.
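As a rough sketch of how that design is used, the types from the snippet above are repeated here so the example stands alone. The helpers listlength and listsum are hypothetical names for illustration, not part of the DataStructures API; they just dispatch on the node type so that Nil ends the recursion.

```julia
abstract type LinkedList{T} end

mutable struct Nil{T} <: LinkedList{T}
end

mutable struct Cons{T} <: LinkedList{T}
    head::T
    tail::LinkedList{T}
end

# Hypothetical helpers: the `Nil` methods are the base cases.
listlength(::Nil) = 0
listlength(l::Cons) = 1 + listlength(l.tail)

listsum(::Nil{T}) where {T} = zero(T)
listsum(l::Cons) = l.head + listsum(l.tail)

# The list 1 -> 2 -> 3, terminated by an explicit Nil node.
l = Cons{Int}(1, Cons{Int}(2, Cons{Int}(3, Nil{Int}())))
```

Note that the tail field has an abstract type here, so this is not the same thing as a small union of concrete types.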

Nevertheless, it would be nice if the guide explained why unions are to be avoided. Suggesting undefined values instead seems reckless, a recipe for disaster.

Just make it itself as a sentinel for nothing?

Don’t hold your breath :smile: While some general classes of semantics can be established (as in the topics you linked), in practice there are so many minor variations that I imagine discussion and experimentation will be required from time to time.

The great thing about small unions becoming faster is that custom missing types can be introduced for variations on the semantics.

Union{T,Nothing} or Union{Some{T},Nothing} is the recommended replacement for Nullable{T}. The section of the style guide “Avoid type Unions in fields” is still right in that you’d better avoid type unions if you can, but it should be updated to mention that it’s OK if you need them and that small type unions are now fast.

We need to document the Union{T,Nothing} pattern somewhere in the manual, but it’s not clear in what section it should go. Nullable used to be documented in the “Types” section, which is clearly not the right place for this.
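As a hedged illustration of when the Some{T} variant matters: if nothing is itself a legitimate value, wrapping results in Some lets you tell “no value” apart from “the value is nothing”. The settings dictionary and the lookup function below are invented for this sketch; Some and something are the Base names in Julia 0.7 and later.

```julia
# Hypothetical settings table in which `nothing` is a valid stored value.
settings = Dict{String, Union{Int, Nothing}}("timeout" => 30, "limit" => nothing)

# Hypothetical lookup: `nothing` means "key absent",
# `Some(value)` means "key present", even when the value is `nothing`.
lookup(d::Dict, key) = haskey(d, key) ? Some(d[key]) : nothing

present_nothing = lookup(settings, "limit")    # Some(nothing)
absent          = lookup(settings, "retries")  # nothing

# `something` unwraps a `Some` and skips bare `nothing` arguments.
timeout = something(lookup(settings, "timeout"))  # 30
retries = something(absent, 5)                    # 5 (fallback)
```

When nothing can never be a valid value, a plain Union{T, Nothing} is enough and the wrapper adds nothing.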

Ralph_Smith and ChrisRackauckas: Yes, those are certainly two options, and “[making] it itself as a sentinel for nothing” is what I meant by “self-pointers”. This latter option has the advantage that you can use a single concrete non-union type. I guess the argument for defining your own nil-type is that you can dispatch on it, without considering possible existing implementations for Nothing, perhaps – and using an abstract type means that people can mix in their own node types in your list or tree or whatever. (Not sure if it’ll be more or less efficient than a small union of concrete types, though?)
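For reference, a minimal sketch of the self-pointer variant. SelfNode, islast, and collectvalues are made-up names for illustration; the final node simply refers to itself, so the next field can have a single concrete type.

```julia
# Hypothetical singly linked node; the last node points to itself.
mutable struct SelfNode{T}
    value::T
    next::SelfNode{T}

    # Constructor for the final node: `next` refers to the node itself.
    function SelfNode{T}(value) where {T}
        node = new{T}(value)   # `next` is left undefined for a moment
        node.next = node
        return node
    end

    # Constructor for a node followed by an existing node.
    SelfNode{T}(value, next::SelfNode{T}) where {T} = new{T}(value, next)
end

# The end of the list is detected by identity, not by a separate type.
islast(node::SelfNode) = node.next === node

function collectvalues(node::SelfNode{T}) where {T}
    out = T[]
    while true
        push!(out, node.value)
        islast(node) && break
        node = node.next
    end
    return out
end

chain = SelfNode{Int}(1, SelfNode{Int}(2, SelfNode{Int}(3)))
```

One limitation of this sketch is that it cannot represent an empty list, since every node carries a value.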

One of the reasons I’ve been following the discussion leading to Nothing and Missing reasonably closely is that I’ve been wanting a mechanism for this sort of stuff; I’ve been using Nullable so far, and it’s a bit awkward. The alternatives suggested here are fine, of course. For my use-cases it seemed natural to indicate that there is “nothing there,” in the sense that Nothing/nothing is meant to convey – which is why I was wondering if this type was not intended for use in fields at all, or if this was just a legacy issue in the documentation. It seems like it is at least in part the latter (i.e., the old Nullable clause could be reintroduced/modified based on the new recommendation for that specific use-case).

It would be great to add at least a little bit of motivation for the warning against these unions, though, as Ralph_Smith suggests. That might make it easier to determine if the recommendation covers your use-case or not. I.e., if it was originally primarily driven by performance concerns (not implausible, I guess) then it doesn’t really apply at all to small unions now, does it? (I.e., you’d be free to use small unions in general, not just if you “absolutely must” :D) But perhaps there are other, design-related/readability issues as well? (If we eventually get special syntax for Union{T, Nothing}, which has been suggested, using it also for fields does seem quite natural, though.)

Maybe there should be a section on missing values and nothingness, including the main lines of reasoning behind the Missing/Nothing divide (sort of like what jandehaan suggests)? One might not necessarily know which of these one needs, and having them discussed in the same general part of the docs might be useful?

As @jandehaan pointed out, codifying different levels of something being not applicable, not yet defined, immeasurable (+Inf, -Inf), not asked so unknown, asked but answer unknown, derivable from other data, invalid, or other cases is a rather complicated issue.
Julia now can deal directly with a number of these that often get confused in other languages (although not quite as fine grained as HL7’s distinctions), and I think it’s pretty good.
Documentation, with good examples of when you want one or the other, would be great, because it is easy to get confused (even by professionals in areas where it’s important to know the difference - one guy I know thought it was fine to mix up the empty or zero-length string with undefined and with SQL NULL, and he was a SQL expert!)
As an example: in Spain, almost everybody has two last names (based on the father’s first last name and the mother’s first last name), but middle names are rare. That causes no end of problems when Spaniards try to deal with English/American web sites that treat a missing middle name or initial as an error and balk at two last names, and I always ran into problems in Spain, not having a second last name.
Searching is a pain also, if you don’t correctly distinguish between “No middle name” or “No second last name” (i.e. an empty string) and missing.

I think Julia being able to separately represent things like NaN, “”, missing (SQL null, 3VL), and nothing (programmer’s void, not set, not applicable) is great.

I agree whole-heartedly. Very happy with this redesign :slight_smile:

From what I understood, the idea is to simplify some of it with special syntax (i.e. T? == Union{T, Nothing}, or a ?? b, and so on). Maybe the introduction of that syntax would be a good time to update the documentation, so that users reading it already have access to a concise and elegant API. Otherwise it’d be annoying for new users to learn Union{T, Nothing}, coalesce, and so on, and then relearn that there are easier ways. The dedicated syntax would also help make it clear that this is a recommended pattern.

Yep. It’s still changing very quickly; I do hope they get the T?, ??, etc. syntax for nothing added before v1.0.
coalesce was just split in two: something is now the function for nothing, and coalesce is for missing.
I think the remaining question is if ? is used for nothing, what short syntax could be used for missing?
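For anyone following along, this is roughly what the something/coalesce split looks like in practice, assuming Julia 0.7 or later (the variable names are only for illustration). Each function skips its own kind of absent value and leaves the other kind alone.

```julia
# `something` returns the first argument that is not `nothing`.
a = something(nothing, 1)     # 1

# `coalesce` returns the first argument that is not `missing`.
b = coalesce(missing, 2)      # 2

# The two do not overlap: each one passes the other's sentinel through.
c = something(missing, 1)     # missing
d = coalesce(nothing, 2)      # nothing
```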

Maybe a Spanish question mark ¿? It’s generally fairly easy to type (Option Shift ? on the Mac).

Given that it’s not in the alpha, it doesn’t seem that likely that we’ll get it for this release (0.7/1.0), I guess?

As for syntax/alternate question marks, there’s always the Greek one… xD

See https://github.com/JuliaLang/julia/pull/27382 for small improvements to the docs.
