Dear Community,
I am trying to dynamically select columns using Query.jl like in the following script, but I am unable to do so.
Any ideas?
Thank you
list_cols = ["col1", "col2"…]
select = df |> @select(∈(list_cols)) |> DataFrame
Not an answer, but I give it for reference: if df is already a data frame, you can just use select(df, list_cols). If df is not a data frame, then do select!(DataFrame(df), list_cols). (The ! variant is used because we create a new data frame, so it is safe to update it in place.)
Thank you @bkamins, but I wanted to use the Query.jl framework, if possible.
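For reference, the two select calls suggested above can be sketched as follows. The column names and the toy data are hypothetical, made up only for illustration, and the vector of NamedTuples is just one example of a table that is not a data frame.

```julia
using DataFrames

# Hypothetical column list and data; the names are illustrative only.
list_cols = ["col1", "col2"]
df = DataFrame(col1 = 1:3, col2 = ["a", "b", "c"], col3 = [0.1, 0.2, 0.3])

# Case 1: df is already a data frame.
# select returns a new data frame and leaves df untouched.
picked = select(df, list_cols)

# Case 2: the source is some other table (here a vector of NamedTuples).
# DataFrame(tbl) makes a fresh data frame, so mutating it with select! is safe.
tbl = [(col1 = 1, col2 = "a", col3 = 0.1),
       (col1 = 2, col2 = "b", col3 = 0.2)]
picked2 = select!(DataFrame(tbl), list_cols)
```

In both cases the column list is an ordinary runtime value, which is the part the Query.jl macro cannot express.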
I spent some time in this rabbit hole as well, and am closer to an answer, in the negative:
As of 2017, there was a technical constraint preventing the implementation of this. See the discussion here.
I am not aware of any updates but would love to see them if there are any.
The cleanest way I currently know of is to apply bkamins’ answer up front and use Query.jl after that:
using DataFrames
using Statistics
using Query
using RDatasets
df = dataset("datasets", "mtcars");
targets = [:MPG, :Cyl, :HP]
# list selection up front
df[:, targets] |>
@filter(_.MPG > 15) |>
@groupby(_.Cyl) |>
@map({Cyl = key(_), AvgHP = mean(_.HP)})
As mentioned in the linked discussion, this sometimes requires working in two stages (i.e., when the column list is itself the result of a computation half-way down the pipe), but it’s totally workable.
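A rough sketch of that two-stage pattern, under some assumptions: the toy data below is hypothetical (it only borrows mtcars-style column names), and "keep every numeric column" is an invented rule standing in for whatever computation produces the column list.

```julia
using DataFrames
using Query
using Statistics

# Hypothetical toy data; Name is a non-numeric column to be dropped later.
df = DataFrame(MPG  = [21.0, 14.3, 22.8, 18.7],
               Cyl  = [6, 8, 4, 8],
               HP   = [110, 245, 93, 175],
               Name = ["a", "b", "c", "d"])

# Stage 1: run the first part of the pipe and materialize it.
stage1 = df |> @filter(_.MPG > 15) |> DataFrame

# The column list is computed from the intermediate result
# (assumed rule: keep the columns whose element type is numeric).
targets = [n for n in names(stage1) if eltype(stage1[!, n]) <: Number]

# Stage 2: select by list with plain indexing, then continue in Query.jl.
result = stage1[:, targets] |>
    @groupby(_.Cyl) |>
    @map({Cyl = key(_), AvgHP = mean(_.HP)}) |>
    DataFrame
```

The cost is one extra materialization between the stages, which is usually acceptable for interactive work.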
Not to plug a package I maintain too much, but DataFramesMeta.jl is very similar to Query.jl and also supports dynamic column selection (with no performance drop):
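using DataFramesMeta
using Statistics
# column names held in variables
mpg = :MPG
cyl = :Cyl
hp = :HP
@chain df begin
    # $ escapes a variable that holds a column name
    @rsubset $mpg > 15
    groupby(cyl)
    # the new column is written as a Symbol; the group mean is repeated on every row of the group
    @transform :avghp = mean($hp)
end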
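To tie this back to the original question about a list of columns, here is a minimal sketch under a few assumptions: the toy data is hypothetical, the list is passed to the plain DataFrames select function inside the chain, and @combine is swapped in for @transform so that the result has one row per group, like the Query.jl example earlier in the thread.

```julia
using DataFrames
using DataFramesMeta
using Statistics

# Hypothetical toy data with mtcars-style column names.
df = DataFrame(MPG = [21.0, 14.3, 22.8, 18.7, 19.2],
               Cyl = [6, 8, 4, 8, 6],
               HP  = [110, 245, 93, 175, 123],
               WT  = [2.6, 3.6, 2.3, 3.4, 3.4])

targets = [:MPG, :Cyl, :HP]        # column list held in a variable
mpg, cyl, hp = :MPG, :Cyl, :HP     # single column names held in variables

result = @chain df begin
    select(targets)                # plain function call, list passed as a value
    @rsubset $mpg > 15             # row-wise filter on the escaped column
    groupby(cyl)
    @combine :AvgHP = mean($hp)    # one row per group
end
```

Because select here is an ordinary function rather than a macro, the list can come from anywhere, including a computation earlier in the script.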