Counting missing values in a DataFrame

Let's say I have the following DataFrame:

using DataFrames
df = DataFrame(A=[1,2,missing], B=[10,missing,missing])

I need to count the missing values in each column and get a result like this:
A(1), B(2)

I tried:

collect(ismissing(x) for x in eachcol(df))

and it returns 0, 0. I don't know why.

Thanks


julia> describe(df, :nmissing)
2×2 DataFrame
 Row │ variable  nmissing 
     │ Symbol    Int64    
─────┼────────────────────
   1 │ A                1
   2 │ B                2

Thanks.

@izr Is there an easy way to append another column with the size of the df? For example, column A has 1 missing out of 3 values, so something like adding nrow.

The following counts the nonmissing values in each column. Counting all values (the total number of rows) is different.

julia> describe(df, :nmissing, length => :length)
2×3 DataFrame
 Row │ variable  nmissing  length 
     │ Symbol    Int64     Int64  
─────┼────────────────────────────
   1 │ A                1       2
   2 │ B                2       1

I’m not sure why

julia> describe(df, length)

isn’t allowed; it seems like it could be.


That’s because the elements returned by eachcol(df) are whole columns, which are arrays and not the value missing, so ismissing returns false for each of them. Your line is equivalent to

[ismissing(df.A), ismissing(df.B)]

To count the missing values in each column, you can do

[count(ismissing,col) for col in eachcol(df)]
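
To make the difference concrete, here is a small sketch using the df from the question. The variable names (wrong, nmissing, report) are only for illustration, and the last step is just one possible way to get the "A(1), B(2)" style output that was asked for.

```julia
using DataFrames

df = DataFrame(A = [1, 2, missing], B = [10, missing, missing])

# ismissing applied to a whole column asks whether the vector itself
# is the value `missing`, which it never is.
wrong = [ismissing(col) for col in eachcol(df)]

# count(ismissing, col) applies ismissing to every element of the column
# and counts how many times it returns true.
nmissing = [count(ismissing, col) for col in eachcol(df)]

# Pair each count with its column name, in the format from the question.
report = join(("$(name)($(n))" for (name, n) in zip(names(df), nmissing)), ", ")
println(report)
```

If a lookup by column name is more useful than a string, Dict(names(df) .=> nmissing) should give the same counts keyed by name.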

Just a note: this is a gotcha in DataFrames, and it behaves differently in DataFrames 1.0:

julia> df = DataFrame(a = [1, 1, 2, 2], b = [1, 1, missing, 4])
4×2 DataFrame
 Row │ a      b       
     │ Int64  Int64?  
─────┼────────────────
   1 │     1        1
   2 │     1        1
   3 │     2  missing 
   4 │     2        4

julia> describe(df, length => :length)
2×2 DataFrame
 Row │ variable  length 
     │ Symbol    Union… 
─────┼──────────────────
   1 │ a         4
   2 │ b                

This is because functions are called on skipmissing(col) rather than col, and length is not defined for SkipMissing objects, so we get nothing.
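
For anyone who wants to see this outside of describe, here is a rough sketch on the same df. It checks that length fails on a SkipMissing wrapper, counts the nonmissing values by iterating instead, and then adds the total row count as an extra column to the describe output. The names length_fails, nonmissing_b, stats and the column name total are made up for this example, and this is only one way to get "missing out of how many rows".

```julia
using DataFrames

df = DataFrame(a = [1, 1, 2, 2], b = [1, 1, missing, 4])

# skipmissing returns a lazy wrapper whose size is not known in advance,
# so calling length on it throws a MethodError.
length_fails = try
    length(skipmissing(df.b))
    false
catch err
    err isa MethodError
end

# Counting by iteration works, because count only needs to walk the values.
nonmissing_b = count(!ismissing, df.b)

# Start from the per-column missing counts and add the total number of rows,
# which is the same for every column of the data frame.
stats = describe(df, :nmissing)
stats.total = fill(nrow(df), nrow(stats))
println(stats)
```

Since every column has nrow(df) entries, the total does not need to be computed per column at all.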
