Could groupby have a macro version (@groupby) to avoid the parentheses, with the ability to define a new column inside the groupby statement?
Also, for a multi-line macro in a @chain block that is followed by another macro, would it be possible to drop the begin and end and treat the next macro as the end of the block?
So this:
using DataFramesMeta

@chain DataFrame(A = 1:10) begin
    @rtransform :Mod3 = :A % 3
    groupby(:Mod3)
    @combine begin
        :Asum = :A |> sum
        :Amax = :A |> maximum
    end
    @rsubset :Mod3 < 2
end
could be written like this:
@chain DataFrame(A = 1:10) begin
    @groupby :Mod3 = :A % 3
    @combine
        :Asum = :A |> sum
        :Amax = :A |> maximum
    @rsubset :Mod3 < 2
end
Also, within an @rtransform block, is there any way to reference the previous row?
Something like
The answer to number 2 is no. Without a begin block, the parser treats each new line as a new statement. That’s just the way macros work.
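For comparison, here is a minimal sketch (assuming DataFramesMeta.jl, which re-exports @chain) of the one other way a macro call can span several lines: write it with parentheses. The parser then keeps reading until the closing parenthesis, so no begin/end is needed, but the call still has to be closed explicitly rather than ending at the next macro.

```julia
using DataFrames, DataFramesMeta

result = @chain DataFrame(A = 1:10) begin
    @rtransform :Mod3 = :A % 3
    groupby(:Mod3)
    # Parenthesized macro call: the open parenthesis tells the parser the
    # expression continues, so the arguments may sit on separate lines.
    @combine(
        :Asum = sum(:A),
        :Amax = maximum(:A)
    )
    @rsubset :Mod3 < 2
end
```

The result has one row per remaining group (Mod3 equal to 0 and 1); the row order depends on how groupby orders the groups.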
As for number 1, it’s complicated.
This is something I would like to do, but getting the right intuitive behavior is hard.
Should
@groupby :Mod3 = :A % 3
allocate a new data frame? If so, that would be different from groupby in DataFrames.jl, which is a lightweight operation that does not allocate. What should @groupby :g do? Would it also allocate a new data frame?
I don’t think it would be intuitive for @groupby to allocate when groupby does not.
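To make the allocation difference concrete, a small sketch using DataFrames.jl and DataFramesMeta.jl: groupby returns a GroupedDataFrame that keeps a reference to the original table, whereas @rtransform builds a new data frame. The column names here are only illustrative.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(A = 1:10, B = repeat(["x", "y"], 5))

# groupby only computes group indices; the GroupedDataFrame points back
# at the original table instead of copying it.
gd = groupby(df, :B)
println(parent(gd) === df)        # true

# @rtransform returns a new data frame that holds the extra column,
# leaving the input untouched.
df2 = @rtransform df :Mod3 = :A % 3
println(df2 === df)               # false
println("Mod3" in names(df))      # false
```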
Perhaps it should not allocate a fresh data frame and should instead be an alias for
@rtransform! :Mod3 = :A % 3
groupby(:Mod3)
but then should it be called @groupby!? Maybe. These are API questions that deserve more discussion.
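A sketch of that alias spelled out with existing macros (the @groupby meaning itself is hypothetical here): @rtransform! adds the column to the caller's data frame in place, and plain groupby then groups it without copying.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(A = 1:10)

# What a non-allocating `@groupby :Mod3 = :A % 3` might expand to:
# mutate df in place, then group it.
gd = @chain df begin
    @rtransform! :Mod3 = :A % 3
    groupby(:Mod3)
end

println("Mod3" in names(df))   # true: the caller's df was modified
println(length(gd))            # 3 groups (remainders 0, 1, 2)
```

The visible side effect on df is exactly why a "!" in the name seems worth considering.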
Here is some discussion on Discourse that mentions this feature as well as some other ideas for what @groupby could do.
It’s worth noting that @groupby makes a copy of the data frame whether a new column is created or not. See here. I would be open to adding this, but it does seem like it would make things slow for new users.
Yes, a chain can exist within a function within a chain, and inside it _ refers to the inner chain’s value.
You can’t use a pipe within a chain in the same way, though.
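A minimal sketch of that nesting, assuming Chain.jl's documented rule that the outer @chain does not replace underscores inside an inner @chain call: the do-block function runs its own chain on each element, so _ inside it refers to that element, while the _ in map(_) is the outer chain's value.

```julia
using Chain

result = @chain 1:3 begin
    # Outer _ is 1:3; each element x goes through its own inner chain.
    map(_) do x
        @chain x begin
            _ * 2      # inner _ is x
            _ + 1
        end
    end
    sum                # sum([3, 5, 7])
end
```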
How about, instead of having the @chain macro, changing base Julia so that the previous line’s output is always available, either through _ or by inserting it as the first argument of a function? Maybe two blank lines could mark a break in the chain.
I can get similar behaviour by enclosing the whole program within a @chain macro.
However, it can create problems when a line’s output isn’t meant to be passed to the next line. A double blank line marking a chain break would fix this, I think.
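Within today's @chain, the existing tool for a line whose output should not be passed along is Chain.jl's @aside: it sees the current value as _ but the chain continues with that value unchanged. A short sketch (not the proposed double-blank-line syntax):

```julia
using Chain

result = @chain 1:10 begin
    filter(iseven, _)
    # Side effect only: println returns nothing, but @aside discards that
    # and passes the filtered vector on to the next line.
    @aside println("evens: ", _)
    sum
end
```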