Nullables - why? and how?

nilshg · December 19, 2017, 4:26pm

Apologies for what might be a daft question, but I’ve lost touch with the developments in the Julia data wrangling ecosystem over the last year or so and found myself a little puzzled when trying to use DataFrames today.

Going through some online resources it seemed to me that CSV.jl is the preferred I/O method for DataFrames these days, so I went ahead and did:

using CSV, DataFrames

df = CSV.read("mydata.csv")

After which I found myself with a DataFrame populated by objects of varying interesting types, including Nullable{WeakRefString{UInt8}}, Nullable{Float64}, and others.

I have since read up on Nullable.jl and WeakRefStrings.jl and have some understanding of the motivation behind these types, but I’m still asking myself why this is the default behaviour and how to actually work with it.

Reading through the latest DataFrames.jl docs and the DataFrames section of the Introducing Julia wikibook, I can’t find the Nullable (or WeakRefString) type discussed, and hence struggle to understand what the intended workflow is given that a lot of operations don’t seem to be defined on these types.

amellnik · December 19, 2017, 4:29pm

What version of DataFrames are you using? You may need to update it. Prior to v0.11 you can use readtable to get an older-style DataFrame that uses DataArrays as columns.

Tamas_Papp · December 19, 2017, 4:34pm

You may find this useful:

nilshg · December 19, 2017, 4:49pm

I’m on v0.10.1 (I did Pkg.update(), but it seems that some other packages I have installed require 0.10.1 as per the release announcement linked by Tamas?).

I’ve worked out with the help of other forum posts that I can pass weakrefstrings=false and nullable=false to get “normal” data in my DataFrame after reading from CSV, but that doesn’t really answer my question. Presumably there’s a reason why these options are the default behaviour; I’m just struggling to understand the benefits at this point.

Tamas, thanks for linking the release announcement, which explains a bit of the background. It does not, however, mention Nullable (unless there’s a connection between Nullable and NA / missing which I’m missing!) - is there a reason why this isn’t discussed at all in the docs?

Tamas_Papp · December 19, 2017, 4:54pm

Nullable is the earlier attempt that has been phased out. Search the forums for history.

I would suggest that you start using v0.11.2 of DataFrames, which is much nicer. See the topic I linked for removing packages that hold it at v0.10.1.
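
For a rough idea of what working with the newer missing-based columns looks like, here is a minimal sketch. It assumes a Julia version where `missing`, `skipmissing`, `coalesce` and `ismissing` are available in Base (Julia 1.0 or later; on Julia 0.6 the `missing` value was supplied by the Missings.jl package). The vector `x` is made up for illustration and stands in for a DataFrame column.

```julia
# Hypothetical column with one missing entry (illustrative data only)
x = [1.0, missing, 3.0]

# The element type is a Union of Float64 and Missing, not a wrapper type
T = eltype(x)

# Arithmetic propagates missing instead of throwing an error
y = x .+ 1                      # second entry stays missing

# Skip missing entries when reducing
total = sum(skipmissing(x))

# Replace missing entries with a default value
filled = coalesce.(x, 0.0)

# Count how many entries are missing
nmissing = count(ismissing, x)
```

Because the values are stored as plain `Float64` or `missing` rather than wrapped in another type, ordinary functions apply to the non-missing entries directly, which is the main practical difference from the Nullable-based columns described above.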

nalimilan · December 19, 2017, 6:05pm

…or use readtable until you’re able to move to DataFrames 0.11 (but be careful to read the docs for the version you are using).

davidanthoff · December 19, 2017, 7:24pm

You can also use https://github.com/davidanthoff/CSVFiles.jl. It will return the “right” missing values in a DataFrame no matter which version of DataFrames.jl you are on. So if you are on DataFrames v0.10 it will use the old-style DataArray for missing values, and on DataFrames v0.11 it will create a DataFrame that uses the new Missings story. If you load data into some other structure, it will also give you the right missing values story for that structure (for example DataValue in an IndexedTable, etc.).
