Representing Nullable Values

StefanKarpinski · July 7, 2017, 6:30pm

From talking with various parties about the Union{T, Null} approach, the overall plan is the following:

1. Represent nullable data with T?, defined as Union{T, Null}.
   - The representation of small unions is optimized to keep data inline, like C unions with extra type indicator bits.
2. Represent arrays of nullable data simply as Array{T?}.
   - The representation of arrays of unions is optimized to keep data contiguous and inline, with extra type indicator bits after the main array data.
3. For nullable values where null is included in the domain of valid values, use Value{T}?, where Value is a simple wrapper type like Ref but immutable.
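To make the plan concrete, here is a minimal sketch in Julia. It assumes nothing beyond the description above: Null, null, Maybe and Value are hypothetical names defined locally for illustration, and the T? syntax is spelled out as an ordinary type alias.

```julia
# Hypothetical stand-ins: `Null`, `null`, `Maybe`, and `Value` are defined here
# for illustration only; they are not taken from Base or any package.
struct Null end
const null = Null()

# The post writes `T?` for this union; here it is spelled as a plain alias.
const Maybe{T} = Union{T, Null}

# An immutable one-field wrapper, for domains where `null` is itself a valid value.
struct Value{T}
    x::T
end

# An array of nullable data is just an Array whose element type is the union.
xs = Maybe{Int}[1, null, 3]

# Outer null: no value is present.
absent = null
# Wrapped null: a value is present, and that value happens to be `null`.
present = Value(null)
```

Both absent and present are instances of Maybe{Value{Null}}, yet they remain distinguishable, which is the point of the wrapper.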

This is a simple, composable approach and addresses all concerns as far as I can tell. It’s not annoying to use since there’s generally no explicit wrapping, unwrapping, or “lifting” required. Functions that don’t have methods for null arguments simply raise method errors. Functions that need to handle nulls simply add methods that do the right thing. Recent compiler optimizations make all of this as efficient as the current Nullable approach, if not more so. Moreover, if someone wants to have seven different kinds of nullability for their data, they can do so just by implementing their own null-like types and using unions of them: the compiler optimizations are generic and decoupled from the specifics of the null type. All the scalar optimizations are already implemented and merged on Julia master. I’m not sure about the status of the array representation, but perhaps @quinnj or @jameson can fill us in.
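As a rough illustration of the dispatch behavior described here, the sketch below uses a locally defined Null type plus a second, made-up null-like type (NotApplicable). The function names increment and double are also hypothetical.

```julia
# Hypothetical null type, defined locally for illustration.
struct Null end
const null = Null()

# A function with no method for Null: calling it on `null` raises a MethodError.
increment(x::Number) = x + 1

# A function that opts in to null handling by adding one extra method.
double(x::Number) = 2x
double(::Null) = null

# A second, user-defined null-like type is handled in exactly the same way.
struct NotApplicable end
const na = NotApplicable()
double(::NotApplicable) = na

# No wrapping or unwrapping: plain values and null-like values share one array.
data = Union{Int, Null, NotApplicable}[1, null, na]
doubled = map(double, data)

# Record whether calling `increment` on `null` fails with a MethodError.
threw = try
    increment(null)
    false
catch err
    err isa MethodError
end
```

Nothing in double is specific to Null; adding another kind of nullability only means adding another type and the methods that should accept it.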

The primary technical concern @jameson has with this whole approach concerns rows of tables with many nullable columns. If we represent such rows using covariant named tuples, the number of concrete row types is potentially exponential in the number of nullable columns. Absent compiler changes, that could lead to an exponential number of code specializations and bad performance due to excessive dynamic dispatch. However, @jeff.bezanson is confident that we can handle this with better specialization heuristics (we deal with potentially exponential code specializations all the time), and has pointed out that the existing NamedTuples package has the same problem and yet already works on Julia as is. Of course, the existing NamedTuples package isn’t as widely used as built-in named tuples would be.
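The growth in row types can be seen by enumerating them. The sketch below assumes the built-in NamedTuple type found in later Julia versions, a locally defined Null type, and columns that all hold Int when not null. The helper concrete_row_types is hypothetical.

```julia
# Hypothetical null type, defined locally for illustration.
struct Null end

# Enumerate every concrete named-tuple type a row can have when each named
# column is either an Int or a Null. Each column contributes two choices,
# so n columns give 2^n concrete row types.
function concrete_row_types(names::Tuple{Vararg{Symbol}})
    choices = ntuple(_ -> (Int, Null), length(names))
    return vec([NamedTuple{names, Tuple{ts...}}
                for ts in Iterators.product(choices...)])
end

two_columns   = concrete_row_types((:a, :b))
three_columns = concrete_row_types((:a, :b, :c))
```

A function specialized on the concrete row type could therefore be compiled once per element of such a list, which is the specialization blow-up the heuristics would need to avoid.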
