Hi all,
Is there a way to filter for missing values in a DataFrame column that is also compatible with filtering for non-missing values (e.g. so that a single filter inside a loop can handle both non-missing and missing values)?
Take this code:
using DataFrames
df = DataFrame(status=["employee", "employee", "contractor", missing], name=["Bill", "Bob", "John", "Joe"], salary=[1000, 5000, 3000, 1200])
Filtering for employees like this fails because of the missing entry:
filter(:status => n -> n == "employee", df)
so it has to be written as:
filter(:status => n -> n == "employee", coalesce.(df, false))
I guess this is fine if I only want to filter for a single non-missing value. But if I want to loop through the values in the "status" column, what's my best approach? This code, as expected, does not work:
status_types = unique(df[:, :status])
for s in status_types
    filter(:status => n -> n == s, df)
    println("I just filtered $s")
    # do other stuff
end
And neither does this:
status_types = unique(df[:, :status])
for s in status_types
    filter(:status => n -> n == s, coalesce.(df, false))
    println("I just filtered $s")
    # do other stuff
end
There's probably more to understand here about why missing == missing does not return true, but for now I'm mostly concerned with how to filter in a way that lets me retrieve missing values as well as any other value in the column.
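For reference, here is a minimal sketch of what seems to be going on, using only Base Julia (no DataFrames). The vector `statuses` and the other variable names are made up for illustration. Comparing with == propagates missing instead of returning a Bool, which is presumably why the filter predicate fails on the missing row:

```julia
# Illustration only: plain Base Julia, names are hypothetical.
r = missing == missing        # three-valued logic: the result is `missing`, not `true`
println(ismissing(r))         # prints "true"

statuses = ["employee", "employee", "contractor", missing]

# `==` against a missing entry also yields `missing`, so not every result is a Bool
results = [n == "employee" for n in statuses]
println(results)

# Wrapping the comparison in coalesce turns a `missing` result into `false`
kept = filter(n -> coalesce(n == "employee", false), statuses)
println(kept)
```

Applying coalesce to the result of the comparison, rather than to the whole table, leaves the data itself untouched; whether that is preferable here is a judgment call.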
Thanks for your help.
Edit:
Of course, 2 minutes after I post I find the solution (after a Kagi search for "Why doesn't missing==missing = true in Julia").
I can get to my expected answer using three equal signs, e.g.:
for s in status_types
    new_df = filter(:status => n -> n === s, df)
    println("I just filtered $s")
    print(new_df)
    # do other stuff
end
But is this the optimal way? Is there any danger in using === here?
When looking at the documentation on ===, it's not really clear to me when I will get true and when I will get false. Looking at a and b there, they look the same visually, and I would expect the result to be true.
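To make the question concrete, here is a small sketch of how the comparisons behave, as far as I can tell, in plain Base Julia. The `isequal` variant is not from my code above; I include it as an assumption about a possible alternative, since it compares by value but still treats missing as equal to missing. The `statuses` vector is made up for illustration:

```julia
# Two separately created arrays with the same contents
a = [1, 2]
b = [1, 2]
println(a == b)     # true: same values
println(a === b)    # false: distinct mutable objects are never ===
println(a === a)    # true: the very same object

# missing is a singleton, so identity comparison works for it
println(missing === missing)          # true
println(isequal(missing, missing))    # true

# === also distinguishes types, while isequal compares by value
println(1 === 1.0)          # false
println(isequal(1, 1.0))    # true

# Looping over the unique values, missing included, using isequal
statuses = ["employee", "employee", "contractor", missing]
for s in unique(statuses)
    matches = filter(n -> isequal(n, s), statuses)
    println("status $s: $(length(matches)) row(s)")
end
```

So === looks safe for strings, numbers of the same type, and missing, but it could give surprising results if the column held mutable values or mixed numeric types.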