DataFramesMeta questions

Could groupby have a macro version, @groupby, to avoid the parentheses, along with the ability to define a column name within the groupby statement?
Also, for a multi-line macro in a @chain block that is followed by another macro, would it be possible to drop the begin and end, and treat the next macro as the end of the block?
So this

@chain DataFrame( A = 1:10 ) begin

    @rtransform :Mod3 = :A % 3
    groupby( :Mod3 )
    @combine  begin
            :Asum = :A |> sum
            :Amax = :A |> maximum
    end
    @rsubset :Mod3 < 2
end

could be written like this

@chain DataFrame( A = 1:10 ) begin

    @groupby :Mod3 = :A % 3

    @combine  
        :Asum = :A |> sum
        :Amax = :A |> maximum

    @rsubset :Mod3 < 2
end

Also, within an @rtransform block, is there any way to reference the previous row?
Something like

@rtransform    :Adiff   =   :A  -  prev( :A )

The answer to number 2 is no. Without the begin, the parser treats each new line as a new statement. It’s just the way macros work.

As for number 1, it’s complicated.

This is something I would like to do, but getting the right intuitive behavior is hard.

Should

@groupby :Mod3 = :A % 3

allocate a new data frame? If so, that would be different from groupby in DataFrames.jl, which is a light-weight operation which does not allocate. What should @groupby :g do? Would it also allocate a new data frame?

I don’t think it would be intuitive behavior for @groupby to allocate but not groupby.

Perhaps we should not allocate a fresh data frame, and have it be an alias for

@rtransform! :Mod3 = :A % 3
groupby(:Mod3)

but then should it be called @groupby!? Maybe. These are API questions that deserve more discussion.
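
To make the allocation question concrete, here is a minimal sketch of the aliasing described above, assuming a recent DataFramesMeta.jl (which re-exports DataFrames). It is not an implementation of @groupby; it only illustrates that @rtransform! adds the column in place and that groupby returns a view onto its parent rather than a copy.

```julia
using DataFramesMeta  # re-exports DataFrames

df = DataFrame(A = 1:10)

# @rtransform! adds the column in place and returns the same data frame
same = @rtransform! df :Mod3 = :A % 3

# groupby does not copy the data; the grouped frame points back to df
gd = groupby(df, :Mod3)

same === df        # true
parent(gd) === df  # true
length(gd)         # 3 groups: Mod3 == 1, 2, 0
```

The trade-off is that the caller's df now carries the extra :Mod3 column, which is the reason the @groupby! name comes up.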

Here is some discussion on Discourse that mentions this feature as well as some other ideas for what @groupby could do.

Some other packages, like DataFrameMacros.jl, do have this. I think it would be good for DataFramesMeta.jl to have it too.

It’s worth noting that @groupby in DataFrameMacros.jl makes a copy of the data frame whether a new column is created or not. See here. I would be open to adding this, but it does seem like it would make things slow for new users.

Without reading that note, I suspect you can make @groupby compile to a transform + groupby, unless there is a name clash, etc.

I added DataFrameMacros. Its @groupby seems to work fine alongside DataFramesMeta commands. Even this works: @groupby :Mod3 = :A % 3. Many thanks :slight_smile:

Last bit of the question: within @rtransform, is it possible to reference the previous row? E.g.
@rtransform :Adiff = :A - prev( :A )

Also, I upgraded to yesterday’s release v0.10.0 but this still gives an error (Column :B not found)

@chain DataFrame(A=1) begin
    @rtransform begin
        :B = :A
        :C = :B
    end
end

I suspect that with row semantics you might need to create the column using lag first.

no worries thanks

transform makes a copy, though, as well, so that doesn’t get around the problem fully.

Another name might help: @transformgroup or something less verbose.

You need @astable there.

@chain DataFrame(A=1) begin
    @rtransform @astable begin
        :B = :A
        :C = :B
    end
end

Making interdependent columns is not the default.
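
As a minimal sketch of the two ways to get a column that depends on another new column, assuming a recent DataFramesMeta.jl: either split the work into two @rtransform steps, or keep one step and opt in with @astable. The data and the arithmetic here are made up for illustration.

```julia
using DataFramesMeta  # re-exports DataFrames and @chain

df = DataFrame(A = 1:3)

# Option 1: two separate steps, so :B already exists when :C is computed
two_step = @chain df begin
    @rtransform :B = :A * 2
    @rtransform :C = :B + 1
end

# Option 2: one step, with @astable so :C can refer to the new :B
one_step = @rtransform df @astable begin
    :B = :A * 2
    :C = :B + 1
end

two_step == one_step  # true
```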

No, to do this I would just use @transform, which is column-wise

using DataFramesMeta
using ShiftedArrays: lag  # explicit import, in case lag is not exported
@transform df :Adiff = :A .- lag(:A)
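
If you would rather not add a dependency, the same idea can be sketched with a small helper. The name prev below is hypothetical (borrowed from the question, not part of DataFramesMeta or ShiftedArrays); it shifts a column down by one row and pads the first entry with missing.

```julia
using DataFramesMeta  # re-exports DataFrames

# Hypothetical helper: previous-row values, with missing for the first row
prev(x) = [missing; x[1:end-1]]

df = DataFrame(A = [1, 3, 6, 10])

# Column-wise @transform sees the whole :A vector, so shifting is possible
out = @transform df :Adiff = :A .- prev(:A)

out.Adiff  # [missing, 2, 3, 4]
```

This works only in the column-wise macros, because @rtransform hands each expression a single row value rather than the full column.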

Thanks so much for all this :slight_smile:

DataFramesMeta rocks

Is it possible to use @pipe within @chain ?

Someone gave me code for nested pipes, so that _ references the innermost pipe.

macro pipe2(x) esc(Pipe.funnel(macroexpand(__module__, x))) end

Is there an equivalent that allows a pipe within a chain and has _ reference the previous pipe output?

I don’t think so… if an outer macro and an inner macro both give _ a special meaning, it’s hard to distinguish different meanings.

Note that Chain.jl already lets you use _ as the previous output.

foo(x, y) = x + y
z = 1
@chain z begin 
    foo(5, _)
end
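
For contrast, here is a small sketch of the two placement rules in Chain.jl: an explicit _ marks where the previous result goes, and when no _ appears the previous result is inserted as the first argument. The function bar is hypothetical and deliberately non-commutative so the two cases give different answers.

```julia
using Chain

bar(x, y) = x - y  # hypothetical helper, order of arguments matters

explicit = @chain 1 begin
    bar(5, _)   # expands to bar(5, 1)
end

implicit = @chain 1 begin
    bar(5)      # no underscore, so this expands to bar(1, 5)
end

(explicit, implicit)  # (4, -4)
```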

Yes, a chain can exist within a function within a chain, and _ references the inner chain.
Though you can’t use a pipe within a chain in the same way.

No, unfortunately it’s hard to make macros play nice together like that.

Chain inside chain is a special behavior I programmed in, it doesn’t follow from macro behavior. Macros don’t compose that way sadly.

Well, it’s much appreciated. Thank you :slight_smile:

How about, instead of having the @chain macro, changing base Julia so that the previous line’s output is always available, either through “_” or by omitting the first argument of a function? Maybe two blank lines could be a break in the chain.

I can get similar behaviour by enclosing the whole program within a @chain macro.
Though it can create problems when the previous line isn’t used. A double blank line indicating a chain break would fix this I think.

You are aware of the @aside macro-flag, right? It does what you want

@chain df begin
    @rtransform :B = :A * 2
    @aside X = 1
    @rtransform :X = :B + X
end

See docs here.