DataFrames: How to remove rows containing NaNs when there are also missings

florian · May 18, 2021, 9:29pm

Assume I have the following DataFrame and want to remove rows containing NaN:

df = DataFrame(a=[NaN, 1.1, NaN, missing, missing], b=[1.1, 2, 3, missing, NaN], c='a':'e');

For just one column I could do something like:
filter(x->(ismissing(x.a) || !isnan(x.a)), df)

To extend this to all columns I tried to use the subset function in combination with the usual DataFrame transformation syntax, but couldn’t get it to work:
subset(df, :a => ByRow(x->(ismissing(x) || !isnan(x)))) (works)
subset(df, names(df, Union{Float64, Missing}) .=> ByRow(x->(ismissing(x) || !isnan(x)))) (doesn’t work)

bkamins · May 18, 2021, 9:42pm

The simplest is probably:

filter(row -> all(x -> !(x isa Number && isnan(x)), row), df)

You can also write:

subset(df, (names(df) .=> ByRow(x -> !(x isa Number && isnan(x))))...)

Note that names(df, Union{Float64, Missing}) is not fully correct, as your column could have e.g. Any type and still contain NaN.
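
For reference, here is a minimal sketch that runs both suggestions against the DataFrame from the question. The helper name notnan and the result names kept_filter and kept_subset are made up for this illustration; everything else is taken from the posts above.

```julia
using DataFrames

# The example DataFrame from the original question
df = DataFrame(a=[NaN, 1.1, NaN, missing, missing],
               b=[1.1, 2, 3, missing, NaN],
               c='a':'e')

# Hypothetical helper: false only for a numeric NaN.
# missing and Char are not Numbers, so they pass the check.
notnan(x) = !(x isa Number && isnan(x))

# Row-wise: keep a row only if none of its values is NaN
kept_filter = filter(row -> all(notnan, row), df)

# Column-wise: one condition per column, splatted into subset
kept_subset = subset(df, (names(df) .=> ByRow(notnan))...)
```

Both calls should keep the rows where c is 'b' and 'd': the row with only regular values and the row with only missings. Because the predicate checks the value rather than the column's element type, it also covers a column typed Any that happens to hold a NaN.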

liuyxpp · December 15, 2023, 2:28am

Sorry for reviving this topic, but why is there no dropna function like in pandas? We have dropmissing but no dropna…

bkamins · December 15, 2023, 10:15am

Because pandas in the past did not have first-class support for missing values, so it used NaN as a surrogate.

In DataFrames.jl, missing values are properly supported by design, so we have dropmissing. In Julia, NaN should not be used to indicate missingness.
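
If data arrives with NaN already used as a missing marker, one possible way to follow this advice is to convert NaN to missing first and then rely on the missing-aware tools. The sketch below uses only Base; nan_to_missing, raw, and clean are hypothetical names chosen for this example.

```julia
# Hypothetical helper: turn a floating-point NaN into missing,
# and return every other value unchanged
nan_to_missing(x) = (x isa AbstractFloat && isnan(x)) ? missing : x

raw = [1.1, NaN, 3.0]

# map widens the element type so the result can hold missing
clean = map(nan_to_missing, raw)

# Missing-aware tools now apply, e.g. skipmissing
present = collect(skipmissing(clean))
```

Assuming a DataFrame column holds such values, the same helper could be applied to it (for example df.a = map(nan_to_missing, df.a)) before calling dropmissing.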

xgdgsc · November 28, 2024, 10:07am

But if you mmap array files on disk and merge them into a DataFrame without copying, you cannot use a column type that is a union with Missing. And imagine a new user coming from Python with a deep habit of using NaN as missing. There could still be value in adding a dropna function by default.

rocco_sprmnt21 · November 28, 2024, 11:10am

Is there a reason (or reasons) why isnan() is not defined for characters?

julia> any(isnan, df[4,:])
ERROR: MethodError: no method matching isnan(::Char)

Dan · November 28, 2024, 11:38am

By its definition, NaN is a value of floating point representations, so it is “non-sensical” to ask a non floating point if it is NaN. Similarly, there is no iszero for Char.

Seems like a legit choice to me.
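
A small Base-only sketch of what happens here, and of the type guard used earlier in the thread to avoid the error. The names threw, notnan, row_values, and has_nan are made up for this illustration.

```julia
# isnan has no method for Char, so calling it throws a MethodError
threw = try
    isnan('d')
    false
catch err
    err isa MethodError
end

# Checking the type first means isnan is only called on numbers
notnan(x) = !(x isa Number && isnan(x))

# Same values as row 4 of the example DataFrame
row_values = (missing, missing, 'd')
has_nan = any(!notnan, row_values)
```

With the guard in place, the Char is never passed to isnan, so the check runs over the whole row without error.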

rocco_sprmnt21 · November 28, 2024, 12:40pm

Hmm… maybe.
But do you admit that it sounds strange, at least from an “aesthetic” point of view, that isa(NaN, Number) == true?

filter(row -> all(x -> !(x === NaN), row), df)

Dan · November 28, 2024, 1:12pm

That is okay, since

isa(1im, Number) == true

and NaN is a special number. You can actually get it returned by math ops:

isnan(@fastmath sqrt(-2.0))  # test with isnan, because NaN == NaN is false

Of course, other choices could be made, but this is one of them.
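
Since the thread uses both isnan and x === NaN to detect NaN, here is a hedged sketch of how the different comparisons behave. It uses only Base; odd_nan and checks are hypothetical names, and the bit pattern is just one example of a NaN that differs from the default Float64 NaN.

```julia
# A NaN with a non-default payload, built from its bit pattern
odd_nan = reinterpret(Float64, 0x7ff8000000000001)

checks = (
    eq        = NaN == NaN,          # IEEE comparison
    egal      = NaN === NaN,         # bitwise identity
    is_equal  = isequal(NaN, NaN),   # equality as used by replace and Dict keys
    f32_egal  = NaN32 === NaN,       # Float32 NaN vs Float64 NaN
    odd_egal  = odd_nan === NaN,     # same type, different bits
    odd_isnan = isnan(odd_nan),
    f32_isnan = isnan(NaN32),
)
```

The === test only matches the exact Float64 bit pattern of the NaN constant, so it misses NaN32 and NaNs with other payloads, while isnan catches all of them. That is one reason to prefer the x isa Number && isnan(x) form for filtering.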

rocco_sprmnt21 · November 28, 2024, 10:01pm

I just noticed that it reads like:

“Is a Not_a_Number a Number?”
“yes, Not_a_Number is a Number”
