DataFrames: How to remove rows containing NaNs when there are also missings

Assume I have the following DataFrame and want to remove rows containing NaN:

df = DataFrame(a=[NaN, 1.1, NaN, missing, missing], b=[1.1, 2, 3, missing, NaN], c='a':'e');

For just one column I could do something like:
filter(x->(ismissing(x.a) || !isnan(x.a)), df)

To extend this to all columns I tried to use the subset function in combination with the usual DataFrame transformation syntax, but couldn’t get it to work:
subset(df, :a => ByRow(x->(ismissing(x) || !isnan(x)))) (works)
subset(df, names(df, Union{Float64, Missing}) .=> ByRow(x->(ismissing(x) || !isnan(x)))) (doesn’t work)

1 Like

The simplest is probably:

filter(row -> all(x -> !(x isa Number && isnan(x)), row), df)

You can also write:

subset(df, (names(df) .=> ByRow(x -> !(x isa Number && isnan(x))))...)

Note that names(df, Union{Float64, Missing}) is not fully correct, as your column could have e.g. Any type and still contain NaN.

3 Likes

Sorry for revive this topic. But why there is no dropna function like pandas? We have dropmissing but no dropna

Because pandas in the past did not have a first class support for missing values, so it used NaN as a surrogate.

In DataFrames.jl by design missing values are properly supported, so we have dropmissing. In Julia NaN should not be used to indicate missingness.

1 Like