Detecting missing values in DataFrame columns

I am still getting used to some of the syntax, so I apologize in advance if this has a simple answer. I have a DataFrame in which some columns contain missing values and others do not. I am looking for a way to detect which columns contain any missing values, and then a way to see whether any cells across the DataFrame are missing.

Here is the code I have. I tried the two lines at the end, and both come back all false even when there are missing values:

using Pkg
using DataFrames

df = DataFrame([5:6, [1, missing], [2, missing]])

colwise(ismissing, df)
colwise(ismissing, df[2,:])

julia> colwise(x -> any(ismissing.(x)), df)

3-element Array{Bool,1}:
 false
  true
  true

Called on a whole column, ismissing just asks whether the column itself is the value missing, which it is not. Instead, you want to broadcast ismissing over the column's elements and return true if any of them are missing.
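
To make the difference concrete, here is a minimal sketch using a plain vector in place of a DataFrame column. The name col and its values are made up for illustration; only Base functions are used.

```julia
# A plain vector standing in for one DataFrame column (hypothetical data).
col = [1, missing, 3]

# Asks whether the vector itself is `missing`; it is a Vector, so this is false.
whole = ismissing(col)

# Broadcasting asks the same question element by element.
each = ismissing.(col)

# Reduce the per-element answers to a single Bool for the column.
has_missing = any(each)

# Same result, but without allocating the intermediate Bool array.
has_missing_lazy = any(ismissing, col)

println(whole)
println(has_missing)
println(has_missing_lazy)
```

The last form, any(ismissing, col), is usually the one to reach for, since it can stop at the first missing value it finds.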

EDIT: for your second question, to get the number of missing values in a column you can use

julia> colwise(x -> sum(ismissing.(x)), df)

3-element Array{Int64,1}:
 0
 1
 1
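
As a rough sketch of the same counting idea without colwise, the snippet below uses a NamedTuple of vectors as a stand-in for the DataFrame's columns (the names cols, counts and total are hypothetical). It relies on count(ismissing, x), which should give the same number as sum(ismissing.(x)) without building the intermediate array.

```julia
# Stand-in for the three columns of the example DataFrame (hypothetical data).
cols = (x1 = [5, 6], x2 = [1, missing], x3 = [2, missing])

# Count missing values per column; map over a NamedTuple keeps the names.
counts = map(c -> count(ismissing, c), cols)

# The broadcast-and-sum form gives the same per-column numbers.
counts_sum = map(c -> sum(ismissing.(c)), cols)

# Total number of missing cells across all columns.
total = sum(values(counts))

println(counts)
println(total)
```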

Ah that makes perfect sense. I need to get used to thinking about whether I want to broadcast a function or not. I love the general flexibility of the language that way, just need to practice more.

I appreciate the help!

describe(df) will also show you whether each column contains missing values, as well as the number of missing values per column. Check out ?describe for the full details.
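
A minimal sketch of that, assuming a reasonably recent DataFrames.jl where describe accepts the names of the statistics to compute and :nmissing is one of them. The DataFrame is rebuilt here with the keyword constructor so the snippet runs on its own; the name stats is made up.

```julia
using DataFrames

# Same data as in the question, built with explicit column names.
df = DataFrame(x1 = 5:6, x2 = [1, missing], x3 = [2, missing])

# Restrict the summary to the per-column count of missing values.
stats = describe(df, :nmissing)

println(stats)
```

The result is itself a DataFrame with one row per column of df, so stats.nmissing can be used like any other column.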

The colwise function is deprecated; see "Future of colwise" (#1595).

The best option is describe(df), or use eachcol, like this:

collect(any(ismissing.(c)) for c in eachcol(df))
or
collect(any(ismissing, c) for c in eachcol(df))
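
Building on the eachcol approach, here is a small sketch that also recovers the names of the affected columns and gives one answer for the whole DataFrame. It assumes a recent DataFrames.jl, where names(df) returns strings; flags, cols_with_missing and any_missing are illustrative names.

```julia
using DataFrames

# Same data as in the question, built with explicit column names.
df = DataFrame(x1 = 5:6, x2 = [1, missing], x3 = [2, missing])

# One Bool per column: does the column contain any missing value?
flags = [any(ismissing, c) for c in eachcol(df)]

# Keep only the names of the columns that do.
cols_with_missing = names(df)[flags]

# Single answer for the whole DataFrame.
any_missing = any(flags)

println(cols_with_missing)
println(any_missing)
```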

Using mapcols is also quite nice, I think:

julia> mapcols(x -> any(ismissing, x), df)
1×3 DataFrame
 Row │ x1     x2    x3   
     │ Bool   Bool  Bool 
─────┼───────────────────
   1 │ false  true  true

or

julia> mapcols(Base.Fix1(any, ismissing), df)
1×3 DataFrame
 Row │ x1     x2    x3   
     │ Bool   Bool  Bool 
─────┼───────────────────
   1 │ false  true  true

Interesting: mapcols returns a DataFrame, like describe does, and that is a nice way to use any.