Query.jl: Automate selecting many columns at once with @select

Suppose I have a DataFrame with a large number of fields, and the number of columns I want to select with Query.jl is also large. Is there any way to automate this?

For example, I have a vector v naming the fields that are interesting to me:

v = [name, sex, age, ..., grade20]

What is the best way to select all those fields inside a query?

I tried some metaprogramming but it didn’t work out:

str = ""
for i in v
     str = str * "i.$i, "
end
str = str[1:length(str)-2]
parse(str)

ERROR: ParseError("extra token \"0.1\" after end of expression")

I would like something like the following to work:

newdf = @from i in df begin
    @select {i.v}
    @collect DataFrame
end

Is there an easy way that I am not thinking of?
Thank you
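For the string-pasting attempt above, a less fragile route (a sketch assuming Julia 1.x) is to build the expression directly as an `Expr` instead of concatenating and parsing strings. Note that this only produces a function mapping one row to a named tuple of the requested fields; it does not splice `v` into Query.jl's `@select`, because a macro sees the literal syntax `i.v`, not the runtime contents of `v`. The helper name `row_selector` is hypothetical and not part of Query.jl.

```julia
# Hypothetical helper (not part of Query.jl): build `i -> (name = i.name, age = i.age, ...)`
# for the field names in `cols`, as an Expr rather than as a parsed string.
function row_selector(cols::Vector{Symbol})
    # Each entry is the syntax `c = i.c`; QuoteNode(c) marks the field name as a literal,
    # matching how Julia parses `i.name`.
    fields = [Expr(:(=), c, Expr(:., :i, QuoteNode(c))) for c in cols]
    body = Expr(:tuple, fields...)    # a named-tuple literal `(c1 = i.c1, c2 = i.c2, ...)`
    return eval(:(i -> $body))        # eval runs at module global scope and returns the function
end

v = [:name, :age]
select_row = row_selector(v)
row = (name = "Ann", sex = "F", age = 31)
println(select_row(row))   # (name = "Ann", age = 31)
```

Since `eval` defines a new function each time, the idea is to call `row_selector` once and reuse the result for every row, from a later top-level statement.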

There is no way to do this right now.

I have this scenario firmly on my radar, but for some fairly fundamental technical reasons I can only implement a solution once we have named tuples in Base (https://github.com/JuliaLang/julia/pull/22194). Unfortunately, that means the earliest a solution for this scenario could arrive is the Julia 0.7/1.0 timeframe.
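To illustrate why named tuples matter here, a minimal sketch assuming Julia 1.x (where `NamedTuple` is in Base): if each row is a `NamedTuple`, selecting a runtime list of fields amounts to building a new `NamedTuple` whose names come from that list. `select_fields` is a hypothetical helper for illustration, not a Query.jl function.

```julia
# Hypothetical helper (not part of Query.jl): keep only the fields of `row` named in `cols`.
function select_fields(row::NamedTuple, cols::Vector{Symbol})
    ks = Tuple(cols)                               # NamedTuple names must be a tuple of Symbols
    vals = Tuple(getfield(row, c) for c in cols)   # look up each requested field by name
    return NamedTuple{ks}(vals)                    # new row containing only those fields
end

rows = [(name = "Ann", sex = "F", age = 31, grade20 = 7),
        (name = "Bob", sex = "M", age = 29, grade20 = 9)]
v = [:name, :age]
selected = [select_fields(r, v) for r in rows]
```

Because the names are only known at runtime, the compiler cannot infer the result type here; that is a trade-off of this simple sketch.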

I see =/
Well, thanks anyway for the reply! I found a workaround for my particular situation that doesn't use Query:

df; # huge DataFrame
v = ["sex", "age", ...]; # smaller vector of selected relevant features
# Symbol(i) converts each column-name string into the Symbol used to index the DataFrame
relevant_df = df[:, [Symbol(i) for i in v]]; # 'constrained' DataFrame

Then I use Query if needed. It's not optimal because it takes two steps, but at least in my case it works.


This should be entirely possible with LazyQuery:

using LazyQuery
@new_environment; @use_in_environment LazyQuery

@chain @evaluate begin
    columns = [:a, :b, :c]
    DataFrame(a = 1, b = 2, c = 3, d = 4)
    query(it)
    make_from(it, columns...)
    collect(it, DataFrame)
end

@bramtayl Just out of curiosity, how is a row represented in this? With a NamedTuple?

Yup. The query call in the code above is just an exported version of Query.query.