Suppose I have a DataFrame with a large number of fields, and the number of columns I want to select using Query.jl is also large. Is there any way to automate this?
For example, I have a vector v naming the fields that are interesting to me:
v = [name, sex, age, ..., grade20]
What is the best way to select all those fields inside a query?
I tried some metaprogramming but it didn’t work out:
str = ""
for i in v
str = str * "i.$i, "
end
str = str[1:length(str)-2]
parse(str)
ERROR: ParseError("extra token \"0.1\" after end of expression")
I would like something like the following to work:
newdf = @from i in df begin
@select {i.v}
@collect DataFrame
end
Is there an easy way that I am not thinking of?
Thank you
There is no way to do this right now.
I have this scenario firmly on my radar, but for some pretty fundamental technical reasons I’ll only be able to implement a solution for it once we have named tuples in Base (https://github.com/JuliaLang/julia/pull/22194). Unfortunately, that means the earliest we might have a solution for this scenario will be in the Julia 0.7/1.0 timeframe…
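For reference, here is a minimal sketch of what named tuples in Base make possible, assuming Julia 0.7 or later. A row held as a NamedTuple can be narrowed to a list of fields that is only known at run time, using the `NamedTuple{names}(nt)` constructor. The row contents and the names `row`, `cols`, and `select_fields` are hypothetical and only for illustration; this is not a description of how Query.jl implements anything.

```julia
# Hypothetical row; in a real query this would come from the data source.
row = (name = "Ann", sex = "F", age = 31, grade20 = 7.5)

# Fields of interest, known only at run time.
cols = [:name, :age]

# NamedTuple{names}(nt) keeps only the listed fields, in the listed order.
# `select_fields` is an illustrative helper, not part of any package.
select_fields(nt::NamedTuple, names) = NamedTuple{Tuple(names)}(nt)

selected = select_fields(row, cols)
```

Because the field names are run-time values here, the result type cannot be inferred by the compiler, which is one reason this kind of selection is harder to support inside a macro-based query.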
I see =/
Well, thanks anyway for the reply! I found a workaround for my particular situation without using Query:
df; # huge DataFrame
v = ["sex", "age", ...]; # smaller vector of selected relevant features
relevant_df = df[:, [parse(i) for i in v]]; # 'constrained' DataFrame
Then I use Query if needed. It’s not optimal because I have to do it in two steps, but in my case at least this is functional.
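The same two-step selection can be sketched with `Symbol` instead of `parse`. This converts each string to a column name directly, and it also avoids relying on one-argument `parse`, which became `Meta.parse` in Julia 0.7. The example assumes DataFrames.jl is installed; the column names and values are made up for illustration.

```julia
using DataFrames

# Hypothetical stand-in for the huge DataFrame.
df = DataFrame(name = ["Ann", "Bob"], sex = ["F", "M"],
               age = [31, 45], grade20 = [7.5, 6.0])

# Names of the relevant columns, as strings.
v = ["sex", "age"]

# Convert the strings to Symbols and index the columns with them.
relevant_df = df[:, Symbol.(v)]
```

`relevant_df` can then be passed to a Query.jl query as the second step.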
This should be entirely possible with LazyQuery:
using LazyQuery
@new_environment; @use_in_environment LazyQuery
@chain @evaluate begin
columns = [:a, :b, :c]
DataFrame(a = 1, b = 2, c = 3, d = 4)
query(it)
make_from(it, columns...)
collect(it, DataFrame)
end
@bramtayl Just out of curiosity, how is a row represented in this? With a NamedTuple?
Yup. The query in the example above is just an exported version of Query.query.