DataFramesMeta conditionals

tk3369 · March 29, 2018, 7:59pm

I often forget to put the dot when writing conditions. I wonder why we couldn’t just write them without the dot and have DataFramesMeta add it automatically, the way the @. macro does. Any thoughts?

So, rather than writing this:

@where(df, :x .> 1)

I could write just this:

@where(df, :x > 1)

ExpandingMan · March 29, 2018, 8:02pm

I rather like the idea that the symbols are a simple drop in for the actual columns. It might save you a little bit of typing to auto-broadcast or whatever, but at the cost of making the overall semantics more confusing. Considering that people are free to put whatever function they want into the @where statements, including functions they define themselves, I’d think that complicating the semantics would be a bad idea.

tshort · March 29, 2018, 8:34pm

See also the following issue: https://github.com/JuliaStats/DataFramesMeta.jl/issues/39

piever · March 29, 2018, 8:44pm

JuliaDBMeta distinguishes row-wise macros (@map, @where, @transform), where you are iterating through rows and symbols correspond to a given field, from column-wise macros (@with, @where_vec, @transform_vec), where symbols correspond to columns. You will often need dot broadcasting with the latter.

Note that both versions are required, as for example one may want:

@where_vec(df, :a .> mean(:a))

which can’t be achieved row by row.

I wonder whether DataFramesMeta could implement a similar strategy. I’m not sure how easy it is to implement row-wise macros efficiently due to type stability issues with DataFrames, but maybe there are ways around that.
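A minimal sketch of the row-wise versus column-wise distinction, using plain Julia vectors rather than JuliaDBMeta itself. The table `cols` and its values are made up for illustration; only `mean` from the Statistics standard library is assumed.

```julia
using Statistics  # standard library; provides mean

# Hypothetical column-oriented table: a NamedTuple of vectors.
cols = (a = [1, 5, 3, 8], b = ["w", "x", "y", "z"])

# Row-wise: the predicate sees one scalar at a time, so no dot is needed,
# but a whole-column quantity such as mean(a) is not available here.
rowwise_keep = [a > 2 for a in cols.a]

# Column-wise: the predicate sees the whole column, so the comparison
# needs a dot, and reductions over the column become possible.
colwise_keep = cols.a .> mean(cols.a)

# Apply the column-wise mask to another column.
kept_b = cols.b[colwise_keep]
```

The row-wise form mirrors what a macro iterating over rows would evaluate per row; the column-wise form mirrors an expression evaluated once on full columns.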

Yifan_Liu · March 29, 2018, 8:49pm

In R dplyr, filter(a > mean(a)) works. It is much less verbose.

pasha · March 29, 2018, 8:52pm

It seems to me that unless we have a @byrow! in play, we’re always in column space. I don’t see where the promotion of > to .> would cause confusion; I don’t know what someone would mean in this context with a non-broadcast >. It may be technically hard to achieve, but I support making it all @. if possible.

Addendum: This reminds me of the difference between mean and pmean, likewise max and pmax, min and pmin. That is a tricky place in R/dplyr, it’s basically a local max/min/mean calculation. So this nuance could be confused by the above syntax… maybe.

piever · March 29, 2018, 9:01pm

The only differences I see between filter(a > mean(a)) and @where_vec(:a .> mean(:a)) (JuliaDBMeta, just like dplyr, has a curried version) are the use of symbols to refer to columns and of dot broadcasting for element-wise comparison. I’m really not sure how one can avoid using symbols and just put variable names. Dot broadcasting is necessary because, if we are taking mean(a), then a is a vector and thus we need to compare element-wise.

The example is interesting, in my view, because it shows a case where automatic dot broadcasting would not work with @where in DataFrames: one would get v .> mean.(v), which is not the correct thing.
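To make the v .> mean.(v) problem concrete, here is a small sketch on a plain vector (the vector `v` is made up; this does not use DataFramesMeta). It also shows that Base's @. macro lets a call opt out of dotting with a $ prefix, which is one possible way around the issue, not something the packages discussed here are claimed to do.

```julia
using Statistics  # standard library; provides mean

v = [1, 2, 3, 4]

# Explicit dot: mean(v) is computed once and compared element-wise.
explicit = v .> mean(v)

# @. dots every call, so this expands to v .> mean.(v): each element is
# compared with its own mean, i.e. with itself, and nothing passes.
auto = @. v > mean(v)

# Prefixing a call with $ inside @. leaves that call un-dotted.
escaped = @. v > $mean(v)
```

With these values, `explicit` and `escaped` select the last two elements, while `auto` selects none.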

Yifan_Liu · March 29, 2018, 9:49pm

a is an array and mean(a) is a scalar; when comparing a with mean(a), I guess that the scalar is automatically converted to an array of the same length. I do not think there should be any confusion. I have used dplyr for my daily work for several years, and it all works well.
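For comparison, a short sketch of how plain Julia treats the same array-versus-scalar comparison (the values in `a` are made up). The scalar is expanded to match the array only when the comparison is dotted; the undotted form is not defined for an array and a number.

```julia
using Statistics  # standard library; provides mean

a = [10, 20, 30]
m = mean(a)  # a scalar

# Dotted comparison: the scalar m is broadcast against every element of a.
mask = a .> m

# Undotted comparison: Julia has no ordering between a vector and a number,
# so this throws a MethodError instead of recycling the scalar as R does.
undotted_fails = try
    a > m
    false
catch err
    err isa MethodError
end
```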

Pandas and data.table are a little more verbose than dplyr. It is a pain to do data wrangling in Matlab, and Julia seems to use Matlab-style syntax for data manipulation. I think that style is great for writing numerical code, but for data manipulation it might be easier to follow the R and Python style.

pdeffebach · March 29, 2018, 11:05pm

What about this use case?

# keep only observations above the mean of income

@where(df, :income .> mean(:income))

As opposed to

m = mean(df[:income])
@where(df, :income > m)
