Query.jl: Automatize multiple @select at once

kaslusimoes · July 19, 2017, 3:16pm

Suppose I have a DataFrame with a large number of columns, and the number of columns I want to select using Query.jl is also large. Is there any way to automate this?

For example, I have a vector v naming the fields that are interesting to me:

v = [name, sex, age, ..., grade20]

What is the best way to select all those fields inside a query?

I tried some metaprogramming but it didn’t work out:

str = ""
for i in v
     str = str * "i.$i, "
end
str = str[1:length(str)-2]
parse(str)

ERROR: ParseError("extra token \"0.1\" after end of expression")
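The sketch below assumes Julia 1.x and uses a short stand-in list of names, since the real list is elided. Instead of concatenating a string and re-parsing it, it builds the tuple expression directly as an `Expr`. It produces syntax only. A macro such as `@select` receives its arguments unevaluated at expansion time, so it sees the literal symbol `v` rather than the vector's contents. Splicing a runtime list into the query would therefore still need `eval` or a wrapping macro.

```julia
# Stand-in field names as Symbols (the original list is elided).
v = [:name, :sex, :age]

# Build `(i.name, i.sex, i.age)` from the vector without any string parsing.
# The field access `i.f` is represented as Expr(:., :i, QuoteNode(:f)).
ex = Expr(:tuple, [Expr(:., :i, QuoteNode(f)) for f in v]...)

println(ex)
```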

I would like something like the following to work:

newdf = @from i in df begin
    @select {i.v}
    @collect DataFrame
end

Is there an easy way that I am not thinking of?
Thank you

davidanthoff · July 19, 2017, 9:44pm

There is no way to do this right now.

I have this scenario firmly on my radar, but for some pretty fundamental technical reasons I'll only be able to implement a solution once we have named tuples in Base (https://github.com/JuliaLang/julia/pull/22194). Unfortunately, that means the earliest we might have a solution for this scenario is the Julia 0.7/1.0 timeframe…
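Named tuples did arrive in Base with Julia 0.7/1.0. The sketch below shows the core operation this thread asks about: projecting a row onto a list of field names that is only known at runtime. It assumes each row is a `NamedTuple`. The helper `select_fields` and the toy rows are hypothetical and are not Query.jl API.

```julia
# Hypothetical helper (not part of Query.jl): keep only the fields in `names`,
# in that order, from a NamedTuple row. `names` may be a runtime vector.
select_fields(row::NamedTuple, names) =
    NamedTuple{Tuple(names)}(map(n -> getproperty(row, n), Tuple(names)))

# Toy rows standing in for a wide table.
rows = [(name = "Ann", sex = "F", age = 31, grade = 7),
        (name = "Bob", sex = "M", age = 28, grade = 9)]

# Column list chosen at runtime.
v = [:name, :age]

selected = [select_fields(r, v) for r in rows]
```

Because the field names come from a runtime value, the compiler likely cannot infer the resulting NamedTuple type in advance. This approach trades some performance for flexibility.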

kaslusimoes · July 20, 2017, 3:07pm

I see =/
Well, thanks anyway for the reply! I found a workaround for my particular situation that doesn't use Query:

df; # huge DataFrame
v = ["sex", "age", ...]; # smaller vector of selected relevant features
relevant_df = df[:, Symbol.(v)]; # 'constrained' DataFrame; Symbol.(v) converts each name string to a column Symbol

Then I use Query if needed. It’s not optimal because I have to do it in two steps, but in my case at least this is functional.

bramtayl · July 20, 2017, 5:19pm

Should be totally possible with LazyQuery

using LazyQuery
@new_environment; @use_in_environment LazyQuery

@chain @evaluate begin
    columns = [:a, :b, :c]
    DataFrame(a = 1, b = 2, c = 3, d = 4)
    query(it)
    make_from(it, columns...)
    collect(it, DataFrame)
end

davidanthoff · July 20, 2017, 5:35pm

@bramtayl Just out of curiosity, how is a row represented in this? With a NamedTuple?

bramtayl · July 20, 2017, 5:37pm

Yup. The query call in the code above is just an exported version of Query.query.
