I really like to use `missing`, but there are some instances where it becomes annoying: first, logical indexing; second, when a package hasn't implemented a method that handles `missing`.
For example, suppose I read a DataFrame in from a file, and that file contains missing values. Now I want to subset using `>` or `in` on one of the columns containing missing values. The understandably conservative logic of `missing` requires two steps rather than one: first, I must eliminate rows with `missing`, then I can do the filtering operation.
```julia
using DataFrames, DataFramesMeta  # @where comes from DataFramesMeta

df = DataFrame(a = [1, missing, 3], b = ["low", missing, "high"])

@where(df, :a .> 0)                        # error
@where(df, in.(:b, Ref(("low", "high"))))  # error

# necessary (?)
@where(df, .!ismissing.(:a), :a .> 0)
```
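For reference, a minimal sketch of a one-step alternative that uses plain DataFrames indexing instead of `@where`. Comparisons with `missing` return `missing` (three-valued logic), and a mask containing `missing` cannot select rows; wrapping the mask in `coalesce(..., false)` maps those entries to `false`, so the filter simply drops them.

```julia
using DataFrames

df = DataFrame(a = [1, missing, 3], b = ["low", missing, "high"])

# `df.a .> 0` is [true, missing, true]; coalesce turns the missing into false
mask_a = coalesce.(df.a .> 0, false)

# `in.` likewise yields missing for the missing row, which coalesce maps to false
mask_b = coalesce.(in.(df.b, Ref(("low", "high"))), false)

df_a = df[mask_a, :]  # rows where a > 0, missing rows dropped
df_b = df[mask_b, :]  # rows where b is "low" or "high", missing rows dropped

println(df_a)
println(df_b)
```

In DataFrames 1.x, `subset(df, :a => ByRow(>(0)); skipmissing = true)` should do the same in a single call; it is worth checking the `subset` docstring for the installed version.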
Is there a way to change the logic so that `missing > 0` returns `false`, or `missing ∈ (0,1)` returns `false`? I'm not proposing to change the default behavior; I'm just wondering if there is a way for me to make `missing` behave more like `NaN`, but not just for numeric columns.
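`NaN` already compares as `false` for floats. One way to get that behavior for `missing` of any element type, without touching Base, is a pair of small local helpers built on `coalesce`. The names `gt_or_false` and `in_or_false` are hypothetical, chosen for illustration; they are not functions from Base or any package.

```julia
# For floats, NaN already compares as false:
NaN > 0          # false
NaN in (0, 1)    # false

# Hypothetical helpers (illustrative names, not part of Base or any package)
# that make a `missing` result of the comparison count as false.
gt_or_false(x, y) = coalesce(x > y, false)
in_or_false(x, coll) = coalesce(x in coll, false)

gt_or_false(missing, 0)                # false
gt_or_false(1, 0)                      # true
in_or_false(missing, ("low", "high"))  # false
in_or_false("low", ("low", "high"))    # true

# Broadcasting yields a plain Bool vector usable as a row mask
mask = gt_or_false.([1, missing, 3], 0)  # [true, false, true]
```

Adding new methods of `>` or `in` for `Missing` itself would change the behavior for every package in the session (type piracy), so local helpers like these seem the safer route.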
Additionally, and not to pick on any package specifically, here is an example:
```julia
using Distributions

pdf(Normal(0, 1), missing)  # errors
```
What am I to do in this example? Well, this again becomes a two-step procedure rather than one: I need to define a wrapper function like `pdf2(x) = ismissing(x) ? missing : pdf(Normal(0,1), x)` or something similar.
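For the `pdf` case, Missings.jl provides `passmissing`, which wraps a function so that it returns `missing` whenever any positional argument is `missing` and calls the function otherwise. A minimal sketch, assuming Missings.jl and Distributions.jl are installed:

```julia
using Distributions
using Missings  # provides passmissing

d = Normal(0, 1)

# passmissing(f) returns a callable that yields `missing` if any positional
# argument is `missing`, and otherwise calls f with those arguments.
pdf_or_missing = passmissing(x -> pdf(d, x))

x = [0.0, missing, 1.0]
y = pdf_or_missing.(x)  # second element is `missing`, the others are densities
```

This is essentially the hand-written `pdf2` wrapper, generated for any function instead of being written out each time.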