How to test for a missing value with the CSV package?

package
data

#1

Hi,

I am trying to test whether there is a missing value at a specific position of a DataFrame obtained with the CSV package, but I could not find the function to do that. Thank you for your help.

julia> x = CSV.read("data/data.csv"; delim = '\t', null="NA")
4×5 DataFrames.DataFrame
│ Row │ id │ s1       │ s2       │ s3       │ s4       │
├─────┼────┼──────────┼──────────┼──────────┼──────────┤
│ 1   │ g1 │ 0.134978 │ 0.231912 │ 0.479582 │ 0.134978 │
│ 2   │ g2 │ 0.972158 │ 0.437821 │ missing  │ 0.848548 │
│ 3   │ g3 │ 0.152925 │ missing  │ 0.848548 │ 0.152925 │
│ 4   │ g4 │ 0.813864 │ 0.972158 │ 0.917429 │ 0.813864 │

julia> x[4][2] == missing
ERROR: UndefVarError: missing not defined

julia> isnan(x[4][2])
missing

julia> isnull(x[4][2])
false

julia> eltype(x[4][2])
Any

julia> isnumber(x[4][2])
ERROR: MethodError: no method matching isnumber(::Missings.Missing)
Closest candidates are:
  isnumber(::Char) at strings/utf8proc.jl:268
  isnumber(::AbstractString) at deprecated.jl:56

#2

You can simply do

ismissing(x[2, 4])  # row 2, column 4 (same cell as x[4][2]); or
ismissing(x[2, :s3])
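
For context, comparing with == does not work for this check because the comparison propagates missingness instead of returning a Bool, which is why ismissing is needed. Here is a minimal sketch on a plain vector standing in for column s3. It assumes Julia 1.0 or later, where missing and ismissing live in Base (on Julia 0.6 they come from the Missings package); the names s3 and result are only for illustration.

```julia
# Stand-in for column s3 of the table above.
# Assumption: Julia >= 1.0, where `missing` and `ismissing` are in Base.
s3 = [0.479582, missing, 0.848548, 0.917429]

# == propagates missingness: the result is `missing`, not true or false,
# so it cannot be used directly in an `if` condition.
result = (s3[2] == missing)
println(result)                  # prints "missing"

# ismissing always returns a Bool.
println(ismissing(s3[2]))        # prints "true"
println(ismissing(s3[1]))        # prints "false"

# Positions of all missing entries in the column.
println(findall(ismissing, s3))  # prints "[2]"
```

The same ismissing call applies to a single cell of a DataFrame once the cell has been extracted by indexing.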

#3

Thank you @ExpandingMan, but in Julia v0.6.1 this function does not exist. Do I have to load a package other than CSV?

julia> ismissing(x[4, 2])
ERROR: UndefVarError: ismissing not defined

#4

You need using Missings.


#5

ismissing ought to be exported from DataFrames. If that’s not happening, you are probably using an out-of-date version of DataFrames. (What do you get when you do Pkg.status("DataFrames")? It should say 0.11.1).

Unfortunately, in its current form the package manager is rather problematic, and sometimes something weird happens that causes it to refuse to update. If this is the case, it should at least tell you what’s causing the problem when you do Pkg.update("DataFrames"). In the worst-case scenario you can use git to pull the updated version manually.

@nalimilan, right now Missings seems to be re-exported from DataFrames. I do not need using Missings on DataFrames 0.11.1.


#6

Thanks again @ExpandingMan

I did not load the DataFrames package, to save some precompilation time, but if I load it the function ismissing() is available. So it is strongly advisable to load the DataFrames package when using CSV…

julia> using DataFrames

julia> ismissing(x[2, 4])
true

#7

Thanks @nalimilan

The Missings package is indeed enough to make ismissing() available:

julia> using Missings

julia> using CSV

julia> x = CSV.read("data/data.csv"; delim = '\t', null="NA")
4×5 DataFrames.DataFrame
│ Row │ id │ s1       │ s2       │ s3       │ s4       │
├─────┼────┼──────────┼──────────┼──────────┼──────────┤
│ 1   │ g1 │ 0.134978 │ 0.231912 │ 0.479582 │ 0.134978 │
│ 2   │ g2 │ 0.972158 │ 0.437821 │ missing  │ 0.848548 │
│ 3   │ g3 │ 0.152925 │ missing  │ 0.848548 │ 0.152925 │
│ 4   │ g4 │ 0.813864 │ 0.972158 │ 0.917429 │ 0.813864 │

julia> ismissing(x[2, 4])
true

#8

Yes, it’s typically a good idea to load whatever packages you are planning to work with. Note that DataFrames precompiles, so the compilation time should only be noticeable the first time you do using DataFrames. It should load quite fast on all subsequent imports.

Note also that since DataFrames re-exports Missings, if you do using DataFrames you do not need using Missings.
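
In case the re-export mechanism is unclear, here is a rough sketch of the general pattern using two toy modules. InnerPkg, OuterPkg and isblank are hypothetical names made up for this illustration; they stand in for Missings, DataFrames and ismissing respectively, and this is not how DataFrames is actually written. It assumes Julia 1.0 or later.

```julia
# Hypothetical module playing the role of Missings: it defines and exports a function.
module InnerPkg
export isblank
isblank(x) = x === nothing
end

# Hypothetical module playing the role of DataFrames: it loads InnerPkg
# and exports the same name again, so its own users get it for free.
module OuterPkg
using ..InnerPkg   # bring InnerPkg's exported names into scope
export isblank     # re-export the name
end

# Only the outer module is loaded here; InnerPkg is never named directly.
using .OuterPkg

println(isblank(nothing))  # prints "true"
println(isblank(1.0))      # prints "false"
```

This is why loading only the outer package is enough to get the inner package's exported functions.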