Using a string variable in a Query.jl query

I am stuck trying to figure out how to use a string variable in a Query.jl query.

        x = "first"
        iname = string("i.",x,"_name")

        # Filter for unique names
        uniques = @from i in daily_data begin
                @group i by iname into g
                @select {first_name = g.key}
                @collect DataFrame
        end

I’ve tried escaping the string with $(iname) and $iname, but without success.

I need to perform the same query multiple times, but on different columns from the same DataFrame. The obvious step, for me anyway, is to create a function. x would ultimately be user input to the function.

I’m sure this is something straightforward, but I couldn’t find anything in the Query.jl docs or various search engine queries (probably not using correct terminology).

I am also fairly new to Julia but what about

x = "first"
iname = "i.$(x)_name"

I am not sure it will be accepted by Query.jl though. Maybe.

Thanks for the idea, but unfortunately didn’t work.

Try this:

x = :name_you_want_to_group_by

and then modify the group clause to

@group i by getfield(i, x) into g

Essentially x needs to be a Symbol and then things should work.
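To make the Symbol point concrete, here is a minimal sketch in plain Julia, without Query.jl. It assumes Julia 1.x, where named tuples are built in (the thread itself uses Julia 0.6 with NamedTuples.jl), and the `row` value and the name `col` are made up for illustration. It only shows how a string such as user input can be turned into a Symbol and used with getfield:

```julia
# Hypothetical stand-in for one row of the table; Query.jl hands rows
# to the query body as named tuples.
row = (first_name = "Sally", last_name = "Smith")

x = "first"                 # e.g. user input to a function
col = Symbol(x, "_name")    # concatenates the pieces into :first_name

# getfield with a Symbol looks the field up by name,
# so this is equivalent to row.first_name
value = getfield(row, col)
```

The string "i.first_name" from the original attempt is just text to Julia, which is why it cannot stand in for the field access; the Symbol names the field itself.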

Thanks David. Appreciate the solution.

However, I’m struggling with it a little. On some toy data, it works fine:

df_one = DataFrame(first_name=["Sally", "Kirk", "John", "Ralph", "John", "John", "Sally"])

iname = :first_name

uniques = @from i in df_one begin
        @group i by getfield(i, iname) into g
        @select {first_name = g.key}
        @collect DataFrame
end

However, on my actual data, it fails with: type UnionAll has no field parameters. If I use i.first_name directly instead, it works.

I’m really scratching my head here. My actual data is a 387x36 DataFrames.DataFrame. I’ve checked and double-checked the column name and it matches. I’ve written the data to CSV and manually checked for missing or strange values in that column; there are none. The column in question is of String type. I’ve tried different columns, and all fail with the same error.

I’ve tried with a larger dataset, which works fine:

using RDatasets
neuro = dataset("boot", "neuro")

name = :V5

uniques = @from i in neuro begin
        @group i by getfield(i, name) into g
        @select {v5 = g.key}
        @collect DataFrame
end

Using Julia 0.6.0, DataFrames 0.10.1, Query 0.8.0


I can make this fail on mtcars.

The following fails with: type Union has no field parameters

using RDatasets
using DataFrames
using Query

mtcars = dataset("datasets", "mtcars")

name = :Model

uniques = @from i in mtcars begin
        @group i by getfield(i, name) into g
        @select {Model = g.key}
        @collect DataFrame
end

This works:

uniques = @from i in mtcars begin
        @group i by i.Model into g
        @select {Model = g.key}
        @collect DataFrame
end

It turns out there’s a difference in the inferred column types:

julia> name = :Model;
       uniques = @from i in mtcars begin
           @group i by getfield(i, name) into g
           @select {Model = g.key}
           @collect
       end;
       typeof(uniques)
Array{Union{NamedTuples._NT_Model{Float64}, NamedTuples._NT_Model{Int64}, NamedTuples._NT_Model{String}},1}

julia> uniques = @from i in mtcars begin
           @group i by getfield(i, :Model) into g
           @select {Model = g.key}
           @collect
       end;
       typeof(uniques)
Array{NamedTuples._NT_Model{String},1}

In other words, when the symbol name is not hardcoded, inference concludes that the column type could also be Int64 or Float64, which are the types of the other columns in the table. The query then fails when it tries to convert the result to a DataFrame. The reason it works for neuro is that all columns have the same type, so it knows that the selected column can only hold Float64s. (I also found that your neuro example fails on DataFrames 0.11 because the result is inferred as a union of Float64 and DataValue{Float64}, thanks to the missing values.)
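The same effect can be seen outside Query.jl. The following is only a rough sketch, assuming Julia 1.x with built-in named tuples rather than the 0.6 setup above; the `row` value and both function names are hypothetical, and `Base.return_types` is a reflection helper whose exact answer can differ between compiler versions, so no output is shown:

```julia
# Hypothetical row with mixed column types, loosely modelled on mtcars.
row = (Model = "Mazda RX4", MPG = 21.0, Cyl = 6)

# Field name only known at run time: the compiler cannot tell which
# field will be read, so it has to allow for every field type.
by_runtime_name(r, s::Symbol) = getfield(r, s)

# Field name written as a literal: the compiler sees exactly which
# field is read and can infer its concrete type.
by_literal_name(r) = getfield(r, :Model)

# Ask the compiler what it infers for each call signature.
runtime_type = first(Base.return_types(by_runtime_name, (typeof(row), Symbol)))
literal_type = first(Base.return_types(by_literal_name, (typeof(row),)))

println("runtime symbol: ", runtime_type)
println("literal symbol: ", literal_type)
```

With the literal symbol the inferred type is the column's own type, while the runtime symbol gives something wider that covers the other columns too, which matches the Union seen in the query result above.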

I don’t have a solution for you - @davidanthoff is this expected behaviour?

Coming back to this topic and thanks @swt30 for your investigations and input.

Hoping @davidanthoff might have some further thoughts. Or perhaps I should raise this as an issue on GitHub?

Thanks for not forgetting! I think the only solution here is to wait until I’ve gotten rid of the reliance on type inference in Query.jl… That is a big project, very much on my radar, but no promise when that will happen.