DataFrames: obtaining the subset of rows by a set of values

iris |> 
Filter(Species -> Species == "versicolor") |>
Map((SepalLength, SepalWidth) -> SepalLength / SepalWidth)

This scares me a bit because it means that fast filtering would have to happen with variables themselves, named in the code, instead of with functions, due to the non-standard evaluation. It’s an R-ism that I want to avoid if possible.
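For comparison, here is a minimal sketch of function-based filtering with what DataFrames.jl already offers, where the column is named by a `Symbol` and the predicate is an ordinary function. The small `iris` table is a made-up stand-in rather than the real dataset, and this is only one possible way to express the pipeline above.

```julia
using DataFrames

# Hypothetical stand-in for the iris data; the values are made up for illustration.
iris = DataFrame(
    Species = ["setosa", "versicolor", "versicolor", "virginica"],
    SepalLength = [5.1, 7.0, 6.4, 6.3],
    SepalWidth = [3.5, 3.2, 3.2, 3.3],
)

# The predicate ==("versicolor") is a plain function applied to each value of :Species.
versicolor = filter(:Species => ==("versicolor"), iris)

# Element-wise ratio of two columns, with no non-standard evaluation involved.
ratio = versicolor.SepalLength ./ versicolor.SepalWidth
```

Because the predicate is just a function value, it can be built elsewhere and passed in, which is the property the non-standard-evaluation style makes harder.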

I’ve opened a WIP PR in DataFrames about using the select approach for combining the results of grouping operations: https://github.com/JuliaData/DataFrames.jl/pull/1601

Could you please clarify how to type in the Greek symbol on the REPL?

I tried \epsilon but this is not the correct symbol.

I mean the symbol that is written here: filter(row -> row.col ∈ [1,2,3], df)

Thanks!

\in

When in doubt ask the REPL:

help?> ζ
"ζ" can be typed by \zeta<tab>

help?> ∈
"∈" can be typed by \in<tab>

That ∈([1,2,3]).(df.col) is some niiiiice syntax!
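To spell out what that one-liner does, here is a small sketch on a made-up data frame (the column names `col` and `val` are only for illustration). Calling `∈` with a single argument returns a predicate function, and broadcasting that predicate over a column gives a Boolean mask.

```julia
using DataFrames

# Made-up example data.
df = DataFrame(col = [1, 5, 2, 7, 3], val = ["a", "b", "c", "d", "e"])

# ∈(collection) returns a one-argument function equivalent to x -> x ∈ collection.
is_wanted = ∈([1, 2, 3])

# Broadcasting the predicate over the column yields one Bool per row.
mask = is_wanted.(df.col)

# The mask can index rows directly ...
picked = df[mask, :]

# ... and selects the same rows as the row-wise filter form.
same = filter(row -> row.col ∈ [1, 2, 3], df)

# findall turns the mask into row numbers.
rows = findall(mask)
```

The mask (or the row numbers) is what makes this form handy: it can be reused for indexing, views, or assignment, whereas `filter` only hands back the filtered table.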

Specifically, it's nice for getting the row numbers or indices needed to create a @view of a dataframe. Views can't be made from the dataframe output of a filter or subset operation. This had me stumped for a while.

For example, I need to change the cells in column :X2 if the rows in column label match a string:

df_vw = @view df[∈(["label_p", "label_t", "label_w"]).(df.label), [:X1, :X2]]

# X1 is a fraction; convert it to basis points (1 bp = 0.0001) and write the result into :X2
transform!(df_vw, :X1 => ByRow(x -> string(round(Int, x * 10000), " bps")) => :X2)

# alternately with direct assignment:
df_vw[:, :X2] = string.(round.(Int, df_vw[:, :X1] * 10000), " bps")
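
For anyone who wants to try this end to end, here is a self-contained sketch of the direct-assignment variant on a tiny made-up table (the data values and the "other" label are assumptions for illustration). It shows that writing through the view changes only the matching rows of the parent data frame.

```julia
using DataFrames

# Made-up data: X1 holds fractions, X2 is a String column to be filled in.
df = DataFrame(
    label = ["label_p", "other", "label_t"],
    X1 = [0.0025, 0.5, 0.01],
    X2 = ["", "", ""],
)

# Boolean mask of rows whose label is in the wanted set.
rows = ∈(["label_p", "label_t", "label_w"]).(df.label)

# View of the matching rows and the two columns of interest.
df_vw = @view df[rows, [:X1, :X2]]

# Assigning into the view writes through to the parent df.
df_vw[:, :X2] = string.(round.(Int, df_vw[:, :X1] * 10000), " bps")
```

After this runs, the rows of `df` that were not selected by the mask keep their original `X2` value.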

Thanks again, @ExpandingMan !!!