This seems like a pretty basic operation, but I’m having difficulty figuring it out. In dplyr I’d filter on something like !is.na(my_field), but I’m not sure what the equivalent is in Julia with DataFramesMeta. I have @linq statements, and I’ve tried a variety of approaches (like filtering out missing) that haven’t worked. How can I filter out rows with NaN in specific fields within my @linq statements?
using DataFrames
x = randn(10)
x[5] = NaN
df = DataFrame(x=x)
filter(row -> !isnan(row.x), df)
Or, using DataFramesMeta,
using DataFramesMeta
@where(df, .!isnan.(:x))
@linq df |> where(.!isnan.(:x))
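If you are on a more recent DataFramesMeta release, the same filter can probably be written with `@subset`, which replaced `@where`, and with `@chain` in place of `@linq`. The sketch below assumes such a release is installed; the sample values and the names `kept_cols`, `kept_rows`, and `kept_chain` are made up for illustration.

```julia
using DataFrames, DataFramesMeta

# Illustrative data (not from the question): one NaN among four values
df = DataFrame(x = [0.5, NaN, -1.2, 3.0])

# Column-wise form: :x is the whole column, so the test is broadcast
kept_cols = @subset(df, .!isnan.(:x))

# Row-wise form: :x is a single value, so no broadcasting is needed
kept_rows = @rsubset(df, !isnan(:x))

# The same row-wise filter as one step of a pipeline
kept_chain = @chain df begin
    @rsubset !isnan(:x)
end
```

All three calls should return the same three-row data frame, so the choice is mostly a matter of style.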
In Julia, missing is the equivalent of R’s NA and represents a value that exists in theory but is unavailable or was not measured. In contrast, NaN (not-a-number) exists only for floating-point types. For most data analysis, missing is more general and easier to work with in Julia. Depending on your workflow, it might make sense to convert NaNs to missing values first.
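One possible way to do that conversion is sketched below: replace each NaN with missing, then drop the rows with `dropmissing`. The sample data and the name `clean` are illustrative only, and this assumes the column holds plain floats with no missing values to begin with.

```julia
using DataFrames

# Illustrative data (not from the question): one NaN among four values
df = DataFrame(x = [1.0, 2.0, NaN, 4.0])

# Replace each NaN with missing; the column's element type widens
# from Float64 to Union{Missing, Float64}
df.x = ifelse.(isnan.(df.x), missing, df.x)

# Drop the rows where :x is missing
clean = dropmissing(df, :x)
```

After the conversion, functions that understand missing (such as `dropmissing` and `skipmissing`) can be applied to the column directly.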