Which should I use, nothing or NA for DataFrames?

krenova · December 28, 2018, 9:04am

I’m using Julia 1.0.3 and have loaded a CSV file, but found that some fields are not of the appropriate type.

Therefore, I’ve changed some columns of integer data type into strings, as an example

map(x -> ismissing(x) ? NA : convert(String, x), df[:Column1])

But when I tried to parse strings into Float64 and, for the sake of consistency, change the default nothing into NA using

map(x-> (v = tryparse(Float64,x); v == nothing ? NA : v), csv[:recency])

I get the error UndefVarError: NA not defined

However, if I stick to nothing, I feel uncomfortable knowing that my column is of type Array{Union{Nothing, Float64},1}, a mix of two data types. I fear that the mixture of data types may lead to issues further down in my programme. At the same time, I am unable to change nothing to NA.

Any advice?

Tamas_Papp · December 28, 2018, 9:09am

Neither, use missing:
https://docs.julialang.org/en/v1/manual/missing/

The worry about a column mixing two data types is unwarranted: small unions such as Union{Missing, Float64} are now supported.
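
As a minimal sketch of what this looks like for the parsing step in the question (the `raw` vector is hypothetical sample data standing in for the `recency` column, and it assumes the column holds only strings):

```julia
# Hypothetical sample data standing in for the strings in the recency column
raw = ["1.5", "abc", "3"]

# tryparse returns nothing when a string cannot be parsed;
# replace that nothing with missing instead of the old NA
parsed = map(raw) do x
    v = tryparse(Float64, x)
    v === nothing ? missing : v
end

# The element type is a small union of Missing and Float64
column_type = eltype(parsed)

# skipmissing lets reductions ignore the missing entries
total = sum(skipmissing(parsed))
```

Here `parsed` ends up holding two numbers and one `missing`, and reductions such as `sum` work once the missing entries are skipped.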

ValdarT · December 28, 2018, 11:29am

There is also a very nice blog post describing the reasoning behind missing in Julia: First-Class Statistical Missing Values Support in Julia 0.7

krenova · December 30, 2018, 10:06am

Thanks @Tamas_Papp, for directly answering the question and also addressing the concern about data type incompatibility.

krenova · December 30, 2018, 10:07am

Thanks ValdarT, this is indeed a good summary. If anyone’s interested, some of the key points are that:

1. missing is analogous to NULL in SQL and NA in R
2. missing is similar to its predecessor NA (in Julia)
3. missing makes it easy to generate SQL requests in Julia and interoperate with R
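
To illustrate the NULL-like behaviour mentioned in the first point, here is a small sketch using only Base Julia (the variable names are just for illustration):

```julia
# missing propagates through arithmetic and comparisons, much like NULL in SQL
sum_with_missing = 1 + missing           # missing
compare_missing  = missing == missing    # missing, not true

# Logical operators follow three-valued logic
or_result  = true | missing              # true
and_result = false & missing             # false

# Use ismissing to test for missing, and coalesce to supply a default value
is_it_missing = ismissing(sum_with_missing)   # true
with_default  = coalesce(missing, 0)          # 0
```

Because `==` itself returns `missing`, checks for missing values should go through `ismissing` rather than a comparison.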
