I am stuck trying to figure out how to use a string variable in a Query.jl query.
x = "first"
iname = string("i.",x,"_name")
# Filter for unique names
uniques = @from i in daily_data begin
    @group i by iname into g
    @select {first_name = g.key}
    @collect DataFrame
end
I’ve tried escaping the string with $(iname) and $iname, but without success.
I need to perform the same query multiple times, but on different columns from the same DataFrame. The obvious step, for me anyway, is to create a function. x would ultimately be user input to the function.
I’m sure this is something straightforward, but I couldn’t find anything in the Query.jl docs or various search engine queries (probably not using correct terminology).
However, I’m struggling a little with the getfield(i, iname) approach. On some toy data, it works fine:
using DataFrames, Query
df_one = DataFrame(first_name=["Sally", "Kirk", "John", "Ralph", "John", "John", "Sally"])
iname = :first_name  # a Symbol, not a String
uniques = @from i in df_one begin
    @group i by getfield(i, iname) into g
    @select {first_name = g.key}
    @collect DataFrame
end
However, on my actual data, it fails with: type UnionAll has no field parameters. If I use i.first_name directly instead, it works.
I’m really scratching my head here. My actual data is a 387x36 DataFrames.DataFrame. I’ve checked and double-checked the column name, and it matches. I’ve written the data to CSV and manually checked that column for missing or strange values; there are none. The column in question is of String type. I’ve tried different columns, and all fail with the same error.
I’ve also tried a larger dataset, which works fine:
using RDatasets
neuro = dataset("boot", "neuro")
name = :V5
uniques = @from i in neuro begin
    @group i by getfield(i, name) into g
    @select {v5 = g.key}
    @collect DataFrame
end
The following fails with type Union has no field parameters:
using RDatasets
using DataFrames
using Query
mtcars = dataset("datasets", "mtcars")
name = :Model
uniques = @from i in mtcars begin
    @group i by getfield(i, name) into g
    @select {Model = g.key}
    @collect DataFrame
end
This works:
uniques = @from i in mtcars begin
    @group i by i.Model into g
    @select {Model = g.key}
    @collect DataFrame
end
It turns out there’s a difference in the inferred column types:
julia> name = :Model;
uniques = @from i in mtcars begin
    @group i by getfield(i, name) into g
    @select {Model = g.key}
    @collect
end;
typeof(uniques)
Array{Union{NamedTuples._NT_Model{Float64}, NamedTuples._NT_Model{Int64}, NamedTuples._NT_Model{String}},1}

julia> uniques = @from i in mtcars begin
    @group i by getfield(i, :Model) into g
    @select {Model = g.key}
    @collect
end;
typeof(uniques)
Array{NamedTuples._NT_Model{String},1}
In other words, when you don’t hardcode the symbol name, inference thinks the column type could also be Int64 or Float64, which are the types of the other columns in the table. It then fails when it tries to convert the result to a DataFrame. The reason it works for neuro is that all columns have the same type, so it knows the selected column can only hold Float64s. (I also found that your neuro example fails on DataFrames 0.11 because the column is inferred as a union of Float64 and DataValue{Float64}, thanks to the missing values.)
I don’t have a solution for you - @davidanthoff is this expected behaviour?
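The inference difference described above can be reproduced in isolation. This is only a rough sketch: it assumes a recent Julia version and uses a Base NamedTuple with made-up values in place of the NamedTuples.jl row type from the thread, and the names by_constant and by_variable are hypothetical. With a constant field name the compiler should infer String; with a runtime Symbol it can only infer something wider, typically a union of the field types.

```julia
# A row modelled as a Base NamedTuple with mixed field types (values are made up).
row = (Model = "Mazda RX4", MPG = 21.0, Cyl = 6)

# Hypothetical helpers, not part of Query.jl.
by_constant(r) = getfield(r, :Model)       # field name known at compile time
by_variable(r, name) = getfield(r, name)   # field name only known at run time

# Ask the compiler what return type it infers for each call signature.
T_const = Base.return_types(by_constant, (typeof(row),))[1]
T_var   = Base.return_types(by_variable, (typeof(row), Symbol))[1]

println("constant field name: ", T_const)
println("variable field name: ", T_var)
```

Both functions return the same value at run time; only the statically inferred type differs, which is what Query.jl relied on at the time.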
Thanks for not forgetting! I think the only solution here is to wait until I’ve gotten rid of the reliance on type inference in Query.jl… That is a big project, very much on my radar, but no promise when that will happen.
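In the meantime, if all that is needed is the distinct values of a column chosen at run time, one possible workaround is to skip Query.jl for this step and index the DataFrame directly. The sketch below is not the approach from the thread: it assumes a recent DataFrames.jl (for the df[!, col] syntax), and unique_names is a hypothetical helper name.

```julia
using DataFrames

# Hypothetical helper: distinct values of the column named "<prefix>_name".
function unique_names(df::AbstractDataFrame, prefix::AbstractString)
    col = Symbol(prefix, "_name")    # e.g. "first" becomes :first_name
    # unique keeps values in order of first appearance
    return DataFrame(col => unique(df[!, col]))
end

df_one = DataFrame(first_name=["Sally", "Kirk", "John", "Ralph", "John", "John", "Sally"])
uniques = unique_names(df_one, "first")
```

Because the column is looked up by Symbol on the DataFrame itself, nothing here depends on what the compiler can infer about the row type.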