DataFramesMeta question

I am trying to learn DataFramesMeta's LINQ-like syntax.

I have not been able to come up with the DataFramesMeta counterpart of the following very simple DataFrames call.

Get number of rows.

combine(df, nrow)

You can drop the combine. Use nrow(df) to get the total number of rows in the DataFrame. If you are after the number of rows belonging to each group, then you need to group the DataFrame first.

using DataFrames
# Generate a DataFrame with lots of repeated values
df = DataFrame(:a => rand(1:10, 10000), :b => rand(1:10, 10000)) 
combine(groupby(df, :a), nrow)

@pdeffebach is probably the best person to ask here. I think he has not implemented a special syntax for this yet, so the simplest option is @combine(df, :nrow = length(:some_column_in_the_data_frame)) (but I might be wrong and there may be a special case for this).
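
As a minimal sketch of the length-based approach suggested above: the toy data frame and the column names a and b are assumptions made up for illustration, not taken from the original question.

```julia
using DataFrames, DataFramesMeta

# Hypothetical toy data: 100 rows, four equally sized groups in :a
df = DataFrame(a = repeat(1:4, 25), b = 1:100)

# Total row count, taken as the length of any column
total = @combine(df, :nrow = length(:a))

# Per-group row counts: group first, then take the length within each group
per_group = @combine(groupby(df, :a), :nrow = length(:b))
```

Here total should be a one-row data frame and per_group should have one row per distinct value of :a.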

There is no special case for this. There is no equivalent to Stata’s _N or dplyr’s n().

You can do

@combine df $nrow

But that kind of thing can’t be used inside transformations, so you can’t do

@combine df :y = :x / $nrow

I would only use

@combine df $nrow

if you really know what you are doing and exactly how DataFramesMeta.jl translates expressions to actual DataFrames.jl calls.

So stick to length for now.
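
To illustrate the "stick to length" advice inside a transformation, here is a small sketch. The data frame and the column names g, x and y are assumptions for illustration only. It uses @transform with broadcasting so that every row is divided by the row count.

```julia
using DataFrames, DataFramesMeta

# Hypothetical data: group 1 has three rows, group 2 has one
df = DataFrame(g = [1, 1, 1, 2], x = [3.0, 6.0, 9.0, 4.0])

# Divide each value by the total number of rows (4 here)
whole = @transform(df, :y = :x ./ length(:x))

# Same idea per group: the divisor is each group's own row count
grouped = @transform(groupby(df, :g), :y = :x ./ length(:x))
```

Because length(:x) is an ordinary expression on a column, it goes through the usual parsing and can be combined freely with other operations.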

I think @combine df $nrow is fine. Is this style @combine df $fun translated to combine(df, fun) in DataFrames.jl?

Yes. It gets passed straight through.
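
A quick way to check the pass-through behaviour described here is to compare the two calls directly. This is only a sketch on a made-up data frame (the columns a and b are assumptions), and it relies on the $ escaping behaving as described in this thread.

```julia
using DataFrames, DataFramesMeta

# Hypothetical toy data
df = DataFrame(a = repeat(1:4, 25), b = 1:100)

# The escaped function is handed to combine unchanged
via_macro = @combine df $nrow

# The plain DataFrames.jl call it should correspond to
via_function = combine(df, nrow)
```

If the translation is a straight pass-through, via_macro and via_function should compare equal.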

Thank you for the response. nrow may be needed as a column in some output. I know that nrow can be extracted in various ways. I am trying to translate the examples in the DataFrames documentation to DataFramesMeta, so nrow as a column often comes up.

@pdeffebach - for my reference, could you please comment on why @combine(df, nrow) does not work?

@combine(df, nrow) does not work because I don’t want to allow arbitrary expressions that don’t get passed through the parsing framework. Think about @subset and @orderby.

@subset df myfun(:x)

gets transformed into

subset(df, :x => (t -> myfun(t)))

And then consequently

@subset df myfun

should get transformed into

subset(df, [] => (t -> myfun))

You would be asking for something like

@subset df myfun

to get transformed into

subset(df, myfun)

which is at odds with the rest of the parsing.

Basically, there is one rule for parsing without $ and it would add a lot of complexity to make exceptions.

OK - great!