Representing Nullable Values

quinnj · July 7, 2017, 2:01pm

The title of the post is “An Update on DataFrames Future Plans”, and the approach it describes, using the Nulls.jl package, does indeed affect the future of a number of other related packages, so I’d say it’s fair to characterize this as a roadmap for data-related ecosystem packages. I’d also point out that the title is at least as over-generalized as naming a package DataVerse.jl.

It’s unfortunate that you seem to be backtracking from our discussions at JuliaCon where, at least IMO, we were all able to come together with the right core devs to discuss potential blocking issues and come up with plans to resolve them. I understand there is still work to do, but at least in my mind, we had a consensus about the advantages of Union{T, Null} and agreed to move forward with it.

It’s also worth reiterating that this isn’t so much an “experimentation” as an optimization exercise. DataFrames/DataArrays have already been using the Union{T, NAtype} approach for what, 4 or 5 years now? Indeed, creating the Nulls.jl package basically consisted of splitting the NAtype out of the DataArrays.jl package. The “experimentation” of porting packages over (which in most cases really means porting the packages back to a union approach) has also advanced quite far: there are corresponding branches across a number of repos that now use Union{T, Null}, and none of them has run into any blocking issues. I understand that you voiced concerns that there will be issues with Query.jl using this approach, but I also don’t think I’ve seen any attempt to port code over, which would help pinpoint the exact issues and which I’ve mentioned I’m happy to help with. My point is that while the specific implementation of Query.jl may indeed need additional language improvements, there are plenty of other packages, workflows, and implementations of similar data-processing code that have been or will be fully functional without any additional language improvements (functionally at least; of course every package wants performance improvements).

That’s not even mentioning the numerous open issues with the Nullables/DataValues-type approach (which I also discussed in detail during my JuliaCon talk). Indeed, I’d argue there are as many open technical questions and issues with using Nullables/DataValues for data analysis as there are with Union{T, Null}. There are certainly reasons why a number of data-related packages that had ported completely over to the Nullables-based approach have now decided to move to Union{T, Null}.
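For anyone less familiar with the two representations being compared, here is a minimal sketch of the difference. The `Null` singleton and the `Wrapped` type below are hypothetical stand-ins defined locally so the snippet runs on its own; they are not the actual Nulls.jl, Nullable, or DataValues.jl implementations, and the real packages differ in detail.

```julia
# Hypothetical stand-in for the sentinel type (NOT the real Nulls.jl definition).
struct Null end
const null = Null()

# Union approach: each element is either a plain Int or the `null` sentinel.
union_col = Union{Int, Null}[1, null, 3]

# Wrapper approach: every element is wrapped together with a "has value" flag.
# `Wrapped` is an illustrative name, not the Nullable or DataValue type.
struct Wrapped{T}
    hasvalue::Bool
    value::T
end
Wrapped(x::T) where {T} = Wrapped{T}(true, x)          # present value
Wrapped{T}() where {T} = Wrapped{T}(false, zero(T))    # missing value (numeric T only)

wrapped_col = [Wrapped(1), Wrapped{Int}(), Wrapped(3)]

# Summing the non-missing entries under each representation.
sum_union   = sum(x for x in union_col if !(x isa Null))
sum_wrapped = sum(w.value for w in wrapped_col if w.hasvalue)

@assert sum_union == sum_wrapped == 4
```

In the union version, present values are ordinary `Int`s and existing functions apply to them directly; in the wrapper version, each value has to be unwrapped (or functions lifted) before use. That trade-off is roughly what the discussion above is about.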

At the end of the day, perhaps the implementation that Query.jl has taken is better suited to Nullables/DataValues, and the approaches taken by other data-related packages are better suited to Union{T, Null}. If so, it may be more useful to explore ways to ensure all these packages continue to interoperate using either approach.
