DataFrames for non-null data


#1

Is there a “DataFrames” for data that you know will not be null? I have used both DataTables and DataFrames and like their syntactic sugar, but I find myself fussing with nullable arrays or Missing/missing issues too much.

I am familiar with the dropna functionality, but is there a way to set a flag once (on data import, for instance) and move on?


#2

It isn’t clear to me what you mean. For example, each column of a DataFrame has its own type, which is essentially Vector{T}. If a column contains only floating-point numbers, its type will be Vector{Float64}. If it also contains missing values, the type will be Vector{Union{Float64, Missing}}. If instead of missing it contains NA, the type will be DataVector{Float64}, which is very similar to the previous case. Note that this is all at the column level, not the DataFrame level. Additionally, the element type is generally inferred from the data, so if no missing or NA values ever appear in the data or in your code, the columns should be correctly typed anyway.

Are you just asking how to convert a column without missing values to a concretely typed object (e.g., Vector{Union{Float64, Missing}} to Vector{Float64})?
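If that is the question, one approach is a minimal sketch like the following, assuming a DataFrames version that provides `disallowmissing` (it originates in Missings.jl) and the `df.x` column accessor:

```julia
using DataFrames

# A column typed Union{Float64, Missing} even though it holds no missing values
df = DataFrame(x = Vector{Union{Float64, Missing}}([1.0, 2.0, 3.0]))

# disallowmissing returns a concretely typed copy (here Vector{Float64});
# it throws an error if an actual missing value is present
df.x = disallowmissing(df.x)

eltype(df.x)  # Float64
```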


#3

If your file contains no missing data, you can set an option like nullable=false in CSV.jl when reading it.
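As a sketch (the file path is hypothetical; the nullable keyword existed in CSV.jl releases from this thread’s era and has since been removed, as current CSV.jl infers non-missing columns automatically):

```julia
using CSV, DataFrames

# Hypothetical file with no missing values
path = tempname() * ".csv"
write(path, "a,b\n1.0,2.0\n3.0,4.0\n")

# On CSV.jl versions from this thread's era the call was:
#   df = CSV.read(path, nullable=false)
# Current CSV.jl drops the keyword and infers concretely typed columns:
df = CSV.read(path, DataFrame)

eltype(df.a)  # Float64, not Union{Float64, Missing}
```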


#4

Note that DataFrames used to convert columns to DataArray automatically in versions before 0.11, so make sure you aren’t using an old version. In particular, remove DataTables, or DataFrames will remain stuck at 0.10.1. See How to upgrade from DataFrames 0.10.1 to 0.11.3?
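With the Julia 0.6-era package manager, the upgrade described in that link amounts to something like the following sketch (exact steps depend on your package state):

```julia
# Julia 0.6-era Pkg API (assumed from the linked upgrade discussion)
Pkg.rm("DataTables")      # DataTables pins DataFrames at 0.10.1
Pkg.update()              # lets DataFrames move to 0.11+
Pkg.status("DataFrames")  # confirm the installed version
```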


#5

Thanks for the great replies.

I ran into these problems after I did a package update for the first time in a few months. I will check out @nalimilan’s posted link to make sure I have DataFrames updated properly.

When I read in a file, the DataFrame uses the Union{Float64, Missing} element type even though there is no missing data. Since I observed this behavior with no missing data present, it looked (to me at least) like this was DataFrames’ new default. UPDATE: I observed this behavior with readtable, not CSV.read(). CSV.read appears to work as described by @tbeason.

When I get back to work on Monday, I will try setting the nullable=false flag in CSV.read to see whether that is enough for the parser to choose the correct type.

UPDATE 2:
I set the nullable flag in CSV.read and everything is working fine.

Thanks again everyone.