Let's say I have the following DataFrame:
df=DataFrame(A=[1,2,missing], B=[10,missing,missing])
I need to count how many missing values there are in each column and get a result like this:
A(1), B(2)
I tried:
collect(ismissing(x) for x in eachcol(df))
and it returns 0, 0. I don't know why.
Thanks
jzr
2
julia> describe(df, :nmissing)
2×2 DataFrame
 Row │ variable  nmissing
     │ Symbol    Int64
─────┼────────────────────
   1 │ A                1
   2 │ B                2
@izr Is there an easy way to append another column with the size of the df? For example, column A has 1 missing out of 3 values, so something like adding nrow.
jzr
5
The following counts only nonmissing values, which is not the same as counting all values.
julia> describe(df, :nmissing, length => :length)
2×3 DataFrame
 Row │ variable  nmissing  length
     │ Symbol    Int64     Int64
─────┼────────────────────────────
   1 │ A                1       2
   2 │ B                2       1
I'm not sure why
julia> describe(df, length)
isn't allowed; it seems like it could be.
yha
6
That's because the elements returned by eachcol(df) are columns, which are arrays, and not missing. Your line is equivalent to
[ismissing(df.A), ismissing(df.B)]
To count the missing values in each column you can do
[count(ismissing,col) for col in eachcol(df)]
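As a minimal sketch of the difference described above, the snippet below runs both approaches on the DataFrame from the question and then formats the counts the way the question asked for. The variable names (whole_column, counts, result) are only illustrative and are not from the thread.

```julia
using DataFrames

df = DataFrame(A=[1, 2, missing], B=[10, missing, missing])

# ismissing applied to a whole column asks whether the vector itself is
# the value `missing`, which is never the case, so this is all false.
whole_column = [ismissing(col) for col in eachcol(df)]

# count(ismissing, col) tests every element of the column and counts the hits.
counts = [count(ismissing, col) for col in eachcol(df)]

# Pair each count with its column name and format it as "A(1), B(2)".
result = join(["$(name)($(n))" for (name, n) in zip(names(df), counts)], ", ")
println(result)
```

If a one-row DataFrame is more convenient than a vector, mapcols(col -> count(ismissing, col), df) should give the same counts keyed by column name.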
Just a note: this is a gotcha in DataFrames, and it behaves differently in DataFrames 1.0:
julia> df = DataFrame(a = [1, 1, 2, 2], b = [1, 1, missing, 4])
4×2 DataFrame
 Row │ a      b
     │ Int64  Int64?
─────┼────────────────
   1 │     1        1
   2 │     1        1
   3 │     2  missing
   4 │     2        4
julia> describe(df, length => :length)
2×2 DataFrame
 Row │ variable  length
     │ Symbol    Union…
─────┼──────────────────
   1 │ a         4
   2 │ b
This is because functions are called on skipmissing(col) rather than col, and length is not defined for SkipMissing objects, so we get nothing.
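Given that behaviour, one way to get the total number of rows next to nmissing without depending on how describe treats missing values is to add the column afterwards. This is only a sketch: the column name nrow and the variable stats are chosen here for illustration and are not from the thread.

```julia
using DataFrames

df = DataFrame(A=[1, 2, missing], B=[10, missing, missing])

# Start from the built-in missing-value count per column.
stats = describe(df, :nmissing)

# Every column of df has the same number of rows, so broadcast nrow(df)
# into a new column instead of asking describe to compute a length.
stats[!, :nrow] .= nrow(df)

println(stats)
```

Because nrow(df) is computed on the DataFrame itself, it counts all rows, missing or not.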