Replacement for get() in Query


#1

@davidanthoff The previous version of Query had get, which was used to unwrap the value inside a DataValue. I'm not sure what the new workflow is, but this query no longer works because occursin needs the actual string value.

@from b in df begin
    @where lowercase(b.City) == "san francisco" && b.State == "CA" &&
        occursin(r"\((.*), (.*)\)", b.Business_Location)
    @select { b.DBA_Name, b.Source_Zipcode, b.NAICS_Code, b.NAICS_Code_Description, b.Business_Location }
    @collect DataFrame
end

Using get(b.Business_Location) throws:

ERROR: DataValues.DataValueException()
Stacktrace:
 [1] get at /Users/adrian/.julia/packages/DataValues/SNSrX/src/scalar/core.jl:75 [inlined]

Thanks!


#2

Also, without get, things like ismissing(b.Street_Address) fail, because it looks like ismissing is applied to the DataValue wrapper rather than to the value inside it.


#3

All right, it turns out that I needed to explicitly add DataValues and load it with using DataValues?


#4

I came up with a working version. I now think the problem was that I was calling get on a missing value, so missing values should be filtered out first using DataValues.isna:

using DataValues
@from b in df begin
    @where lowercase(b.City) == "san francisco" && b.State == "CA" &&
        !isna(b.Business_Location) &&
        occursin(r"\((.*), (.*)\)", get(b.Business_Location))
    @select { b.DBA_Name, b.Source_Zipcode, b.NAICS_Code, b.NAICS_Code_Description, b.Business_Location }
    @collect DataFrame
end

#5

I believe get still works the same way it always did: it returns the value if there is one, and otherwise it throws an error. Alternatively, you can use the [] syntax to get the value: b.Business_Location[] should also work.

And yes, isna is the way to test for missing values.
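
To make the behavior described above concrete, here is a minimal sketch outside of any query. The variable names and the sample string are made up for illustration, and the exception type is taken from the stack trace earlier in the thread.

```julia
using DataValues

present = DataValue("(37.77, -122.42)")  # a DataValue{String} holding a value
absent  = DataValue{String}()            # a DataValue{String} with no value

# isna is the missingness test for DataValue
@assert !isna(present)
@assert isna(absent)

# get and [] both unwrap a value that is present
@assert get(present) == "(37.77, -122.42)"
@assert present[] == get(present)

# get with a default does not throw when the value is absent
@assert get(absent, "") == ""

# Unwrapping an absent value throws, which is the error from the first post
threw = try
    get(absent)
    false
catch err
    err isa DataValues.DataValueException
end
@assert threw
```

This is why the working query checks !isna(...) before calling get(...): && short-circuits, so get is never reached for rows without a value.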

We should probably just add a lifted version of occursin to DataValues.jl so that one can use it directly without all of this extra effort. I've put it on the todo list.
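
Until something like that exists, a user-side helper can hide the isna/get dance. The following is only a sketch: occursin_lifted is a hypothetical name, it is not part of DataValues.jl, and treating a missing haystack as "no match" is an assumption that suits a @where filter but may not be what a lifted occursin in the package would do.

```julia
using DataValues

# Hypothetical helper, NOT part of DataValues.jl.
# A missing haystack counts as "no match" (an assumption made for filtering).
occursin_lifted(needle, haystack::DataValue) =
    !isna(haystack) && occursin(needle, get(haystack))

# Plain strings pass straight through to Base.occursin
occursin_lifted(needle, haystack::AbstractString) = occursin(needle, haystack)

pattern = r"\((.*), (.*)\)"
hit  = occursin_lifted(pattern, DataValue("(37.77, -122.42)"))
miss = occursin_lifted(pattern, DataValue{String}())
```

With this in scope, the @where clause could call occursin_lifted(r"\((.*), (.*)\)", b.Business_Location) instead of the separate !isna(...) and get(...) conditions. A separate function name is used so that no methods are added to Base.occursin for types this code does not own.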


#6

IMHO it would be better to return missing rather than throw an exception when the value is missing.

occursin would then also have to return missing (occursin(a, b::Missing) = missing) if we want a better implementation of this: “Julia provides support for representing missing values in the statistical sense” (source: missing.md)
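
For illustration only, the propagation proposed here could look like the sketch below. occursin_or_missing is a hypothetical name; the thread suggests a method on occursin itself, and a separate function is used here just to avoid adding methods to Base.occursin from user code.

```julia
# Hypothetical helper illustrating the proposal: propagate missing instead of throwing.
occursin_or_missing(needle, haystack) = occursin(needle, haystack)
occursin_or_missing(needle, ::Missing) = missing

pattern = r"\((.*), (.*)\)"
matched    = occursin_or_missing(pattern, "(37.77, -122.42)")
propagated = occursin_or_missing(pattern, missing)

@assert matched == true
@assert ismissing(propagated)
```

Note that this operates on Base's Missing type, not on DataValue.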


#7

Queryverse.jl doesn’t use the Missing type from Base for missing-value support. Query.jl and friends have a whole bunch of requirements for a missing-value story that the current Missing implementation in Base does not meet, so we are using DataValues.jl instead.