How to filter out rows with NaN in specific fields?

This seems like a pretty basic operation, but I’m having difficulty figuring it out. In dplyr I’d do something like filtering based on !is.na(my_field), but I’m not sure what the equivalent Julia/DataFramesMeta is. I have @linq statements, and I’ve tried a variety of solutions (like filtering out missing) which haven’t worked. How can I filter out rows with NaN in specific fields within my linq statements?

Thank you!

You’re looking for isnan:

using DataFrames
x = randn(10)
x[5] = NaN
df = DataFrame(x=x)

filter(row -> ! isnan(row.x), df)

Or, using DataFramesMeta,

using DataFramesMeta
@where(df, .! isnan.(:x))
@linq df |> where(.! isnan.(:x))
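
If you need to check several specific columns at once, plain DataFrames.jl can do that too. This is a minimal sketch, assuming a recent DataFrames.jl (1.x); the table `df2` and its columns `x` and `y` are made up for illustration:

```julia
using DataFrames

# Hypothetical example data: NaN appears in both columns.
df2 = DataFrame(x = [1.0, NaN, 3.0, 4.0], y = [10.0, 20.0, NaN, 40.0])

# Column-pair form of filter: the function receives the x and y values of each row.
both_ok = filter([:x, :y] => (x, y) -> !isnan(x) && !isnan(y), df2)

# Same rows via subset; ByRow applies the predicate element by element.
both_ok2 = subset(df2, :x => ByRow(!isnan), :y => ByRow(!isnan))
```

Both calls return a new data frame and leave `df2` unchanged.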

In Julia, missing is the equivalent of R’s NA and is used for any value that exists in theory but is not available or wasn’t measured. In contrast, NaN (not-a-number) exists only for floating-point types. For most data analysis, missing is more general and easier to work with in Julia. Depending on your workflow, it might make sense to convert NaNs to missings first.
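
One possible way to do that conversion, sketched with made-up data (`df3` and its columns are hypothetical), is to swap NaN for missing in the column of interest and then use `dropmissing` on that column:

```julia
using DataFrames

# Hypothetical example data.
df3 = DataFrame(x = [1.0, NaN, 3.0], y = [4.0, 5.0, NaN])

# replace compares with isequal, so NaN => missing matches NaN entries;
# the column's element type widens to Union{Missing, Float64}.
df3.x = replace(df3.x, NaN => missing)

# Drop rows where x is missing; y is not checked, so its NaN stays.
clean3 = dropmissing(df3, :x)
```

After this, functions such as `skipmissing` and `dropmissing` work on the converted column.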

(For more than you probably want to know on this topic, please see this blog post and this Discourse discussion.)


Here’s a way to remove rows with NaNs in any field, provided every column is numeric. Note that it modifies df in place and also drops rows containing Inf:

filter!(isfinite ∘ sum, df)
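
If the table also has non-numeric columns, or you want to keep rows containing Inf, a column-wise mask is one alternative. This is only a sketch under assumptions: `df4` and its column names are invented, and `names(df, AbstractFloat)` picks only columns whose element type is a plain float (so columns that allow missing are skipped):

```julia
using DataFrames

# Hypothetical example data with a string column and an Inf value.
df4 = DataFrame(a = [1.0, NaN, 3.0, Inf],
                b = [1.0, 2.0, NaN, 4.0],
                label = ["p", "q", "r", "s"])

# Start by keeping every row, then clear the flag wherever a float column holds NaN.
keep = trues(nrow(df4))
for col in names(df4, AbstractFloat)
    keep .&= .!isnan.(df4[!, col])
end

# Non-mutating: df4 itself is left untouched.
clean4 = df4[keep, :]
```

Here the row containing Inf is kept, because only NaN is tested.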