Representing Nullable Values

From talking with various parties about the Union{T, Null} approach, the overall plan is the following:

  1. Represent nullable data with T? defined as Union{T, Null}.
  2. The representation of small unions is optimized to keep data inline like C unions with extra type indicator bits.
  3. Represent arrays of nullable data simply as Array{T?}.
  4. The representation of arrays of unions is optimized to keep data contiguous inline with extra type indicator bits after the main array data.
  5. For nullable values where null is included in the domain of valid values, use Value{T}? where Value is a simple wrapper type like Ref but immutable.
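
To make the five points concrete, here is a minimal sketch. It makes several assumptions that are not part of the plan itself: `Null` is defined locally as a stand-in singleton type, the proposed `T?` syntax is spelled out as a hypothetical alias `Maybe{T}`, and `lookup` is an invented helper that exists only to show where `Value{T}?` is needed.

```julia
# Stand-in for the proposed null type (defined locally for illustration).
struct Null end
const null = Null()

# Point 1: `T?` written out as an alias, since the `?` syntax is only proposed.
const Maybe{T} = Union{T, Null}

# Point 3: an array of nullable data is just an Array of the union type.
xs = Maybe{Int}[1, null, 3]

# Point 5: a simple immutable wrapper, for when null is itself a valid value.
struct Value{T}
    x::T
end

# Hypothetical helper: the stored value may itself be null, so "key absent"
# and "key present with a null value" have to be told apart.
lookup(d::AbstractDict, k) = haskey(d, k) ? Value(d[k]) : null

d = Dict{String, Maybe{Int}}("a" => 1, "b" => null)

lookup(d, "a")  # Value(1): present, non-null
lookup(d, "b")  # Value(null): present, but the stored value is null
lookup(d, "c")  # null: key is absent
```

Without the wrapper, the last two cases would both come back as a bare `null` and the caller could not distinguish them.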

This is a simple, composable approach and addresses all concerns as far as I can tell. It’s not annoying to use since there’s generally no explicit wrapping, unwrapping, or “lifting” required. Functions that don’t have methods for null arguments simply raise method errors. Functions that need to handle nulls simply add methods that do the right thing. Recent compiler optimizations make all of this as efficient as the current Nullable approach, if not more so. Moreover, if someone wants to have seven different kinds of nullability for their data, they can do so just by implementing their own null-like types and using unions of them: the compiler optimizations are generic and decoupled from the specifics of the null type. All the scalar optimizations are already implemented and merged on Julia master. I’m not sure about the status of the array representation, but perhaps @quinnj or @jameson can fill us in on the current status.
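
A small sketch of what "no lifting" looks like in practice, under stated assumptions: `Null` is again a locally defined stand-in, and `double`, `NotApplicable`, and `na` are made-up names used only for illustration.

```julia
# Stand-in for the proposed null type (defined locally for illustration).
struct Null end
const null = Null()

# An ordinary function with no knowledge of nulls.
double(x::Number) = 2x

# With no method for Null, calling it on null raises a MethodError.
threw_method_error = try
    double(null)
    false
catch err
    err isa MethodError
end

# A function that should handle nulls opts in by adding a method.
double(::Null) = null

# A second, user-defined kind of nullability works the same way.
struct NotApplicable end
const na = NotApplicable()
double(::NotApplicable) = na

readings = Union{Float64, Null, NotApplicable}[1.5, null, na]
results = map(double, readings)
```

Nothing here is specific to `Null`: the second null-like type gets the same treatment just by being another member of the union.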

The primary technical concern @jameson has with this whole approach is that if we represent rows of tables with many nullable columns using covariant named tuples, then there is a potentially exponential number of concrete types of rows in terms of the number of nullable columns, which could – absent compiler changes – lead to an exponential number of code specializations and bad performance due to excessive amounts of dynamic dispatch. However, @jeff.bezanson is confident that we can handle this with better specialization heuristics (we deal with potentially exponential code specializations all the time), and has pointed out that the existing NamedTuples package has the same problem and yet already works on Julia as is. Of course, the existing NamedTuples package isn’t as widely used as built-in named tuples would be.
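
To illustrate where the exponential count comes from, here is a rough sketch. It assumes built-in named tuples and a locally defined stand-in `Null`; `row_types` is a hypothetical helper that merely enumerates the possible concrete row types and says nothing about how the compiler would actually specialize.

```julia
# Stand-in for the proposed null type (defined locally for illustration).
struct Null end
const null = Null()

# Two nullable columns already give four distinct concrete row types.
rows = Any[
    (a = 1,    b = 2.0),
    (a = null, b = 2.0),
    (a = 1,    b = null),
    (a = null, b = null),
]
concrete_types = unique(map(typeof, rows))

# Hypothetical helper: every concrete row type for the given column names,
# where each column holds either its own type or Null.
function row_types(names::Tuple, types::Tuple)
    combos = Iterators.product(((T, Null) for T in types)...)
    return vec([NamedTuple{names, Tuple{c...}} for c in combos])
end

n_two   = length(row_types((:a, :b), (Int, Float64)))             # 2^2 types
n_three = length(row_types((:a, :b, :c), (Int, Float64, String))) # 2^3 types
```

Each additional nullable column doubles the count, so a function specialized on the concrete row type could in principle be compiled once per combination.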
