I am trying to learn DataFramesMeta's LINQ-like syntax.
I have not been able to come up with the DataFramesMeta counterpart of the following very simple DataFrames call.
Get number of rows.
combine(df, nrow)
You can drop the combine. Use nrow(df) to get the total number of rows in the DataFrame. If you are after the number of rows belonging to each group, then you need to group the DataFrame first.
using DataFrames
# Generate a DataFrame with lots of repeated values
df = DataFrame(:a => rand(1:10, 10000), :b => rand(1:10, 10000))
combine(groupby(df, :a), nrow)
@pdeffebach is probably the best person to ask here. I think he has not implemented a special syntax for this yet, so the simplest option is @combine(df, :nrow = length(:some_column_in_the_data_frame)) (but I might be wrong and there may be a special case for this).
There is no special case for this. There is no equivalent to Stata’s _N or dplyr’s n().
You can do
@combine df $nrow
But that kind of thing can’t be used inside transformations, so you can’t do
@combine df :y = :x / $nrow
I would only use
@combine df $nrow
if you really know what you are doing and exactly how DataFramesMeta.jl translates expressions into actual DataFrames.jl calls.
So stick to length for now.
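As a rough sketch of the options discussed so far, the following compares the plain DataFrames.jl call with the two DataFramesMeta.jl spellings. The data frame and its columns :a and :b are made up for illustration, and it assumes a DataFramesMeta.jl version that supports the `:col = expr` syntax and `$` escaping.

```julia
using DataFrames, DataFramesMeta

# Toy data; the column names are placeholders for this example
df = DataFrame(a = repeat(1:4, 5), b = 1:20)

# Plain DataFrames.jl: one row, one column named :nrow
plain = combine(df, nrow)

# DataFramesMeta.jl: count rows via the length of any column
meta = @combine(df, :nrow = length(:a))

# Escaping with $ passes nrow straight through to combine
escaped = @combine(df, $nrow)

# Per-group counts follow the same pattern
grouped_plain = combine(groupby(df, :a), nrow)
grouped_meta = @combine(groupby(df, :a), :nrow = length(:b))

# length(:b) is an ordinary expression, so it can sit inside a transformation
shares = @transform(df, :share = :b / length(:b))
```

The length form is the one that composes with other expressions, which is why it is the safer default here.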
I think @combine df $nrow is fine. Is this style, @combine df $fun, translated to combine(df, fun) in DataFrames.jl?
Yes. It gets passed straight through.
Thank you for the response. nrow may be needed as a column in some output. I know that nrow can be extracted in various ways. I am trying to translate the examples in the DataFrames documentation to DataFramesMeta, so nrow as a column often comes up.
@pdeffebach - for my reference, can you please comment on why @combine(df, nrow) does not work?
@combine(df, nrow) does not work because I don’t want to allow arbitrary expressions that don’t get passed through the parsing framework. Think about @subset and @orderby.
@subset df myfun(:x)
gets transformed into
subset(df, :x => (t -> myfun(t)))
Consequently,
@subset df myfun
should get transformed into
subset(df, [] => (t -> myfun))
You would be asking for something like
@subset df myfun
to get transformed into
subset(df, myfun)
which is at odds with the rest of the parsing.
Basically, there is one rule for parsing without $, and it would add a lot of complexity to make exceptions.
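To make the parsing rule concrete, here is a small sketch of the first translation above. The function myfun and the column :x are hypothetical, and the hand-written subset call is the form quoted in this thread, not necessarily the exact code the macro generates.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(x = 1:5)

# Hypothetical predicate: takes the whole column, returns a Bool vector
myfun(x) = x .> 2

# Macro form: :x is replaced by the column and the call is wrapped in a function
via_macro = @subset(df, myfun(:x))

# Roughly equivalent hand-written DataFrames.jl call
by_hand = subset(df, :x => (t -> myfun(t)))
```

Both calls keep the rows where x is greater than 2. A bare myfun with no column reference would instead be wrapped as a function of no columns, which is why it cannot double as a pass-through.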
OK - great!