I am still trying to get used to some of the syntax, so I apologize in advance if this has a simple answer. I have a few columns in a DataFrame, some of which have missing values and some of which do not. I am looking for a way to detect which columns have any missing value, and then a way to see if any cells across the DataFrame have a missing value.
Here is the code I have. I tried the two lines at the end, and both come back all false even when there are missing values:
using Pkg
using DataFrames
df = DataFrame([5:6, [1, missing] ,[2, missing]])
colwise(ismissing, df)
colwise(ismissing, df[2,:])
julia> colwise(x -> any(ismissing.(x)), df)
3-element Array{Bool,1}:
false
true
true
ismissing applied to a whole column just asks whether the column itself is the missing value, which it is not. Instead, you want to broadcast ismissing over the column data and return true if any of the elements are missing.
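To make the difference concrete, here is a minimal sketch using only Base Julia, with a plain vector standing in for a DataFrame column. The variable names are just for illustration.

```julia
# A vector with one missing entry, standing in for a DataFrame column.
col = [1, missing]

# Applied to the whole vector: the vector itself is not the `missing` value.
whole = ismissing(col)

# Broadcast over the elements: one Bool per entry.
elementwise = ismissing.(col)

# Reduce the per-element results to a single answer for the column.
has_missing = any(elementwise)

# Same answer, passing the predicate to `any` so no intermediate array is built.
has_missing_direct = any(ismissing, col)

println((whole, collect(elementwise), has_missing, has_missing_direct))
```

Here whole is false, while the two reduced forms are both true.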
EDIT: for your second question, to get the number of missing values in a column you can use
julia> colwise(x -> sum(ismissing.(x)), df)
3-element Array{Int64,1}:
0
1
1
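As a possible alternative that does not depend on colwise, the same counts can be sketched with count from Base Julia. Plain vectors stand in for the three columns here, and the names cols and n_missing are made up for the example.

```julia
# Three vectors standing in for the columns of df.
cols = [[5, 6], [1, missing], [2, missing]]

# count(pred, itr) counts the elements for which pred returns true,
# so this gives the number of missing entries per column.
n_missing = [count(ismissing, c) for c in cols]

println(n_missing)
```

count(ismissing, x) gives the same number as sum(ismissing.(x)) without building the intermediate Bool array.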
Ah that makes perfect sense. I need to get used to thinking about whether I want to broadcast a function or not. I love the general flexibility of the language that way, just need to practice more.
I appreciate the help!
describe(df) will also show you whether or not it contains missing values, as well as the number of missings. Check out ?describe for a better description.
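If only the missing counts are of interest, describe can be restricted to that one statistic. This is a sketch assuming a recent DataFrames.jl release; the keyword-argument constructor and the name stats are my choices, and older versions may report columns that cannot hold missing values differently.

```julia
using DataFrames

# Same data as in the question, built with named columns.
df = DataFrame(x1 = 5:6, x2 = [1, missing], x3 = [2, missing])

# Ask describe for the :nmissing statistic only; the result is a DataFrame
# with one row per column of df.
stats = describe(df, :nmissing)

println(stats)
```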
The colwise function is deprecated; see "Future of colwise" (#1595).
The best option is describe(df), or use eachcol, like this:
collect(any(ismissing.(c)) for c in eachcol(df))
or
collect(any(ismissing, c) for c in eachcol(df))
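Putting the eachcol approach together with the original question, here is a sketch that also recovers the names of the affected columns and does the same check per row with eachrow. It assumes a recent DataFrames.jl release; the variable names are only illustrative.

```julia
using DataFrames

# Same data as in the question, built with named columns.
df = DataFrame(x1 = 5:6, x2 = [1, missing], x3 = [2, missing])

# One Bool per column: does the column contain any missing value?
col_has_missing = [any(ismissing, c) for c in eachcol(df)]

# Names of the columns that contain at least one missing value.
cols_with_missing = names(df)[col_has_missing]

# One Bool per row: does the row contain any missing value?
row_has_missing = [any(ismissing, r) for r in eachrow(df)]

println(col_has_missing)
println(cols_with_missing)
println(row_has_missing)
```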
Using mapcols is also quite nice, I think:
julia> mapcols(x -> any(ismissing, x), df)
1×3 DataFrame
 Row │ x1     x2    x3
     │ Bool   Bool  Bool
─────┼───────────────────
   1 │ false  true  true
or
julia> mapcols(Base.Fix1(any, ismissing), df)
1×3 DataFrame
 Row │ x1     x2    x3
     │ Bool   Bool  Bool
─────┼───────────────────
   1 │ false  true  true
Interesting: mapcols returns a DataFrame, like describe does, and that is a nice way to use any.