Query.jl: Selection using Base functions and possibly missing values

query

#1

Suppose I have a DataFrame with two fields: idx and date. The date field has missing values (in the DataFrames sense) and is currently stored in the DataFrame as a string. Is there a query statement that I can write which parses the string into a date? I tried something like this:

df2 = @from i in df begin
    @select {i.idx, date = Date.(i.date, "mm/dd/yyyy")}
    @collect DataFrame
end

but got an error like this:

ERROR: type UnionAll has no field parameters
Stacktrace:
 [1] column_types at /Users/tcovert/.julia/v0.6/IterableTables/src/utilities.jl:20 [inlined]
 [2] _DataFrame(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},_} where _,Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##11#13}) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:105
 [3] DataFrames.DataFrame(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},_} where _,Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##11#13}) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:128
 [4] collect(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},_} where _,Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##11#13}, ::Type{DataFrames.DataFrame}) at /Users/tcovert/.julia/v0.6/Query/src/sinks/sink_type.jl:2

I also tried a version with no dot-broadcasting:

df2 = @from i in df begin
    @select {i.idx, date = Date(i.date, "mm/dd/yyyy")}
    @collect DataFrame
end

and got this error:

ERROR: MethodError: Cannot `convert` an object of type DataValues.DataValue{String} to an object of type Int64
This may have arisen from a call to the constructor Int64(...),
since type constructors fall back to convert methods.
Stacktrace:
 [1] next at /Users/tcovert/.julia/v0.6/Query/src/enumerable/enumerable_select.jl:41 [inlined]
 [2] macro expansion at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:91 [inlined]
 [3] _filldf(::Tuple{DataArrays.DataArray{Int64,1},Array{Date,1}}, ::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},Date},Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##15#16}) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:79
 [4] _DataFrame(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},Date},Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##15#16}) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:119
 [5] DataFrames.DataFrame(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},Date},Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##15#16}) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:128
 [6] collect(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},Date},Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##15#16}, ::Type{DataFrames.DataFrame}) at /Users/tcovert/.julia/v0.6/Query/src/sinks/sink_type.jl:2

Is what I am trying to do possible? If so, what am I doing wrong?

Thanks in advance for any suggestions you can offer.

Here is some example data to apply the code above to: https://www.dropbox.com/s/kgiicawhegmtavc/query_example.csv?dl=0

(Also posted here: https://github.com/davidanthoff/Query.jl/issues/134)


#2

Please do not double post on an issue tracker and Discourse. It seems to me that @davidanthoff already pointed you towards a solution in the linked issue.
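
The solution referenced in the linked issue is not reproduced in this thread. As a rough sketch of the general idea only: the stack traces above show that inside the query `i.date` is a `DataValues.DataValue{String}` rather than a plain `String`, so the string has to be unwrapped before `Date` can parse it, and the missing case has to be passed through separately. The example below illustrates that pattern outside of Query.jl, assuming a current Julia version with the `Dates` standard library and Base's `missing` instead of the v0.6 `DataValue` type. The helper name `parse_date` and the sample data are hypothetical, not taken from the original post.

```julia
using Dates

# Hypothetical helper (not part of Query.jl or DataFrames):
# parse a "mm/dd/yyyy" string into a Date ...
parse_date(s::AbstractString) = Date(s, dateformat"mm/dd/yyyy")
# ... and pass a missing value through unchanged instead of trying to parse it.
parse_date(::Missing) = missing

# Made-up sample column with one missing entry.
raw = ["01/15/2017", missing, "12/31/2016"]

# Broadcasting over the column yields a Vector{Union{Missing, Date}}.
dates = parse_date.(raw)
```

Inside a query the same two-case split would presumably be written against the `DataValue` wrapper rather than `missing`; how exactly to do that depends on the Query.jl and DataValues.jl versions in use.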


#3

I guess I am still confused about where to post questions about things that appear to be legitimate bugs. There is a clear message in the Query issue tracker that suggests questions about usage should go here, but it wasn’t obvious to me whether I would be more likely to get an answer here or there. In the future, where should questions like this be sent?


#4

I monitor both the GitHub repo and the forum here, so I’ll see posts on either site. I would prefer to have usage questions here, and if it turns out that a usage question actually uncovered a bug in Query.jl, I’ll open an issue on GitHub. If it is a clear bug, start with an issue. When in doubt, start here; we can always open an issue later.