Query.jl - filtering on missing data


#1

What is the correct way to filter on missing values? The following does not work properly:

result = @from r in df begin
    @where !ismissing(r.A)
    @select r
    @collect DataFrame
end

When I run such a query, the @where clause does not seem to recognize any values as missing. I encounter the same issue when using !== or isequal() in place of ismissing(). I did some searching and learned that there are compatibility issues between Query.jl and the Missing type, and that prior to Julia v0.7 isnull() produced the expected behavior; however, isnull() is no longer part of Base.


#2

I think you need to use isna now for DataValues, which Query.jl uses underneath for missing data.


#3

Yes, @quinnj is right: isna is the way to check for missing values in Queryverse/DataValues land on Julia 0.7+.


#4

That resolved it for me. Thanks @quinnj @davidanthoff.

In case anyone wonders, isna lives in DataValues.jl.
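For reference, here is a minimal sketch of the query from the first post with isna swapped in for ismissing. The sample DataFrame and its column names A and B are made up for illustration, and it assumes Query.jl, DataFrames.jl, and DataValues.jl are all installed:

```julia
using DataFrames, Query, DataValues

# Hypothetical sample data: column A contains one missing value.
df = DataFrame(A = [1, missing, 3], B = ["a", "b", "c"])

# Inside the query, r.A is a DataValue rather than a Union{Int,Missing},
# so the check uses isna instead of ismissing.
result = @from r in df begin
    @where !isna(r.A)
    @select r
    @collect DataFrame
end

println(result)
```

The only change from the original query is the predicate in the @where clause.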


#5

Why isn’t it called ismissing as well, for consistency? What can na be other than missing?


#6

The semantics of DataValue and Missing are quite different, especially when it comes to predicates like ==. I felt it was safer in that case to not reuse the Missing function names, but instead make it very clear that these are two different missing stories that behave differently.
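A small sketch for comparing the two behaviors side by side in the REPL. It assumes DataValues.jl is installed; the variable names x and y are only illustrative, and the DataValue comparison is printed rather than asserted, since the exact result depends on the DataValues.jl version in use:

```julia
using DataValues

# Base's Missing: comparisons propagate, so the result is itself `missing`.
@show missing == 1
@show missing == missing
@show ismissing(missing)

# DataValue: a wrapper type; `isna` tests whether a value is present.
x = DataValue(1)       # holds the value 1
y = DataValue{Int}()   # an Int-typed DataValue with no value
@show isna(x)
@show isna(y)

# Compare this result with the `missing == 1` line above.
@show x == y
```

Running both halves makes it easy to see where a predicate like == differs between the two types.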


#7

Makes sense, thanks.


#8

Please clarify what those two different missing stories are or point to a resource that explains it. Thanks.