Issues with multiple @let statements in Query.jl

question

#1

Whenever I use more than two @let statements in a query, I get an error (Julia version 0.6.0, Query.jl 0.6.0).

DataFrame sink:

type TypeofBottom has no field parameters

Array sink:

TypeError: getfield: expected Symbol, got Expr

Code that generates error:

using DataFrames, Query

df = DataFrame(name=["John", "Sally", "Kirk"], age=[23., 42., 59.], children=[3,2,2])
x = @from i in df begin
    @let count = length(i.name)
    @let kids_per_year = i.children / i.age
    @let isjohn = i.name == "John" ? 1 : 0
    @where count > 4
    @select {i.name, Count=count, KPY=kids_per_year}
    @collect DataFrame
end

Commenting out the third @let avoids the issue. Any ideas?

Thanks!


#2

This appears to be a bug in Query.jl. AFAICT, the find_names_to_put_in_scope expansion of transparentidentifier is fairly hairy, has never been tested with that many variables, and does not normalize them correctly to a.b expressions when it recurses. There aren’t enough comments or whitespace in that code base for me to dig further just now, though.

I recommend opening an issue on Query.jl with your repro. I think that should be enough for David to go on to fix it.


#3

Thanks for looking into this, I just posted the issue: #133


#4

Thanks, I’ll try to take a look soon, but it might be a couple of days. This week is a bit hectic with other stuff.


#5

I get the same error with:

data = @from u in userdata begin
    @join s in screens on u.Screen_Id equals s.screen_id
    @group u by u.Date into a
    @select {a.screen_name, Count=length(a)}
    @collect DataFrame
end

“userdata” and “screens” are both CSV.sources with weakrefstrings set to false.


#6

I think the main issue here is that a.screen_name is not valid in this query. The @group clause creates a stream of Groupings, i.e. each a will be an instance of Grouping. A Grouping won’t have a field screen_name, though. It has one field, key, that holds the value of u.Date, and otherwise it behaves like a vector in which each element is one of the u rows.
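
To make that concrete, here is a minimal sketch of a grouping query that only touches what a Grouping actually provides: its key and its length. The table contents are made up for illustration, since the real columns of userdata have not been posted. One more assumption: the sketch reads the key with key(a), which is how recent Query.jl releases expose it as far as I know; on the 0.6 version discussed here, the field access a.key described above would be the equivalent.

```julia
using DataFrames, Query

# Hypothetical stand-in for the real `userdata` table.
userdata = DataFrame(Date=["2017-07-01", "2017-07-01", "2017-07-02"],
                     Screen_Id=[1, 2, 1])

counts = @from u in userdata begin
    # Each `a` is a Grouping: one key (the Date) plus the rows sharing it.
    @group u by u.Date into a
    # Assumption: `key(a)` is the accessor in recent releases;
    # on Query.jl 0.6 this would be `a.key`.
    @select {Date=key(a), Count=length(a)}
    @collect DataFrame
end
```

Anything else that should appear in the output, such as a screen name, has to be part of the grouping key or be computed from the elements of a.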

Could you post the structure of the input tables, i.e. what columns they have? That would make it easier to understand what you are trying to do. I’ll be offline until Tuesday, though, so it will take me a while to respond.