Dynamic selection of columns with Query.jl

Dear Community,

I am trying to dynamically select columns using Query.jl like in the following script, but I am unable to do so.
Any ideas?

Thank you

list_cols = ["col1", "col2"…]

select = df |> @select(∈(list_cols)) |> DataFrame

Not an answer to the Query.jl question, but for reference:

  • If df is already a data frame you can just use select(df, list_cols).
  • If df is not a data frame then do select!(DataFrame(df), list_cols). (The in-place select! is safe here because DataFrame(df) creates a new data frame, so the original source is not modified.)
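A minimal sketch of the two cases above, assuming a small made-up table (the data and the names `tbl`, `picked`, and `picked2` are illustrative, not from the original post):

```julia
using DataFrames

# Hypothetical toy data; column names are placeholders.
df = DataFrame(col1 = 1:3, col2 = ["a", "b", "c"], col3 = [0.1, 0.2, 0.3])
list_cols = ["col1", "col2"]

# Case 1: df is already a data frame, so select returns a new data frame
# with only the requested columns and leaves df untouched.
picked = select(df, list_cols)

# Case 2: the source is another Tables.jl-compatible table
# (here a NamedTuple of vectors). DataFrame(tbl) builds a fresh data frame,
# so trimming it in place with select! does not affect tbl.
tbl = (col1 = [1, 2, 3], col2 = ["a", "b", "c"], col3 = [0.1, 0.2, 0.3])
picked2 = select!(DataFrame(tbl), list_cols)
```

In both cases the column list is an ordinary vector, so it can be built at run time.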

Thank you @bkamins, but I wanted to use the Query.jl framework, if possible.

I spent some time in this rabbit hole as well, and am closer to an answer, in the negative:

As of 2017, there was a technical constraint preventing the implementation of this. See the discussion here.

I am not aware of any updates but would love to see them if there are any.

The cleanest way I currently know of is to apply bkamins’ answer up front and use Query.jl after that:

using DataFrames
using Statistics
using Query
using RDatasets

df = dataset("datasets", "mtcars");

targets = [:MPG, :Cyl, :HP]

# list selection up front
df[:, targets] |>
  @filter(_.MPG > 15) |>
  @groupby(_.Cyl) |>
  @map({Cyl = key(_), AvgHP = mean(_.HP)})

As mentioned in the linked discussion, this might sometimes require doing things in two stages (i.e., if the column list is the result of a computation halfway down the pipe), but it’s totally workable.
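A rough sketch of what the two-stage approach could look like, assuming a made-up table and a made-up rule for choosing columns (keep the numeric ones); the names `stage1`, `targets`, and `result` are hypothetical:

```julia
using DataFrames
using Query

# Hypothetical toy data.
df = DataFrame(a = [1, 2, 3, 4], b = [10.0, 20.0, 30.0, 40.0], c = ["w", "x", "y", "z"])

# Stage 1: run the first part of the Query.jl pipeline and materialize it.
stage1 = df |> @filter(_.a > 1) |> DataFrame

# The column list is computed from the intermediate result:
# here, keep only columns whose element type is numeric.
targets = [n for n in names(stage1) if eltype(stage1[!, n]) <: Number]

# Stage 2: select those columns with plain DataFrames indexing,
# then continue with Query.jl.
result = stage1[:, targets] |>
  @map({a = _.a, double_b = 2 * _.b}) |>
  DataFrame
```

The cost is materializing the intermediate data frame between the two stages.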

Not to plug a package I maintain too much, but DataFramesMeta.jl is very similar to Query.jl and includes dynamic column selection (with no performance drop):

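using DataFramesMeta  # also makes @chain available

# column names held in variables; $ interpolates them into the macros
mpg = :MPG
cyl = :Cyl
hp = :HP
@chain df begin
    @rsubset $mpg > 15
    groupby(cyl)
    @transform :avghp = mean($hp)  # new column name is written as a Symbol
end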