DataFramesMeta questions

Lincoln_Hannah · October 17, 2021, 11:58pm

Could groupby have an @ macro version, to avoid parentheses, with the ability to define a column name within the groupby statement?
Also, for a multi-line macro in a @chain block that is followed by another macro, would it be possible to drop the begin and end, and treat the next macro as the end of the block?
So this

@chain DataFrame( A = 1:10 ) begin

    @rtransform :Mod3 = :A % 3
    groupby( :Mod3 )
    @combine  begin
            :Asum = :A |> sum
            :Amax = :A |> maximum
    end
    @rsubset :Mod3 < 2
end

Could be written like this

@chain DataFrame( A = 1:10 ) begin

    @groupby :Mod3 = :A % 3

    @combine  
        :Asum = :A |> sum
        :Amax = :A |> maximum

    @rsubset :Mod3 < 2
end

Also, within an @rtransform block, is there any way to reference the previous row?
Something like

@rtransform    :Adiff   =   :A  -  prev( :A )

pdeffebach · October 18, 2021, 12:22am

The answer to number 2 is no. According to the parser, a new line is a new statement without the begin. It’s just the way macros work.
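To see the point about the parser, here is a minimal sketch that only parses the two forms and never expands them, so no package is needed. The helper name macro_args is hypothetical, made up for this illustration.

```julia
# Parse just the first statement of each snippet. Nothing is expanded or
# evaluated, so @combine does not need to be defined.
without_block = first(Meta.parse("@combine\n    :Asum = sum(:A)", 1))
with_block    = first(Meta.parse("@combine begin\n    :Asum = sum(:A)\nend", 1))

# Hypothetical helper: the arguments the macro would receive, ignoring the
# line-number node that the parser always attaches to a macro call.
macro_args(ex) = filter(a -> !(a isa LineNumberNode), ex.args[2:end])

@show macro_args(without_block)   # empty: the newline ended the macro call
@show macro_args(with_block)      # one argument: the whole begin ... end block
```

Without begin, the macro call is complete at the end of its own line, so the following lines are separate statements that @combine never sees.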

As for number 1, it’s complicated.

This is something I would like to do, but getting the right intuitive behavior is hard.

Should

@groupby :Mod3 = :A % 3

allocate a new data frame? If so, that would be different from groupby in DataFrames.jl, which is a light-weight operation which does not allocate. What should @groupby :g do? Would it also allocate a new data frame?

I don’t think it would be intuitive behavior for @groupby to allocate but not groupby.

Perhaps we should not allocate a fresh data frame, and have it be an alias for

@rtransform! :Mod3 = :A % 3
groupby(:Mod3)

but then should it be called @groupby!? Maybe. These are API questions that deserve more discussion.
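The allocation difference behind this naming question can be checked directly. This is a small sketch, not a proposal for how @groupby should be implemented; the variable names are made up for illustration.

```julia
using DataFramesMeta   # re-exports DataFrames

df = DataFrame(A = 1:10)

# @rtransform returns a new data frame; df itself is left unchanged.
copied = @rtransform df :Mod3 = :A % 3
@show copied !== df
@show ncol(df)

# @rtransform! adds the column to df in place and returns the same object.
same = @rtransform! df :Mod3 = :A % 3
@show same === df

# groupby does not copy: the grouped data frame wraps the original.
gdf = groupby(df, :Mod3)
@show parent(gdf) === df
```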

Here is some discussion on Discourse that mentions this feature as well as some other ideas for what @groupby could do.

xiaodai · October 18, 2021, 12:26am

Some other packages, like DataFrameMacros.jl, do. I think it would be good for DataFramesMeta.jl to have it too.

pdeffebach · October 18, 2021, 12:28am

It’s worth noting that @groupby makes a copy of the data frame whether a new column is created or not. See here. I would be open to adding this but it does seem like it would make things slow for new users.

xiaodai · October 18, 2021, 12:34am

Without reading that note, I suspect you can make @groupby compile to a transform + groupby unless there is a name clash etc

Lincoln_Hannah · October 18, 2021, 1:03am

I added DataFrameMacros. Its @groupby seems to work fine alongside DataFramesMeta commands. This even works: @groupby :Mod3 = :A % 3. Many thanks.

Last bit of the question: within @rtransform, is it possible to reference the previous row? E.g.
@rtransform :Adiff = :A - prev( :A )

Also, I upgraded to yesterday’s release v0.10.0 but this still gives an error (Column :B not found)

@chain DataFrame(A=1) begin
       @rtransform begin
                   :B = :A
                   :C = :B
        end
end

xiaodai · October 18, 2021, 1:11am

I suspect that with row semantics you might need to create the column using lag first.

Lincoln_Hannah · October 18, 2021, 1:14am

no worries thanks

pdeffebach · October 18, 2021, 1:39am

transform makes a copy, though, as well, so that doesn’t get around the problem fully.

Another name might help: @transformgroup or something less verbose.

pdeffebach · October 18, 2021, 1:40am

You need @astable there.

@chain DataFrame(A=1) begin
       @rtransform @astable begin
                   :B = :A
                   :C = :B
        end
end

Making interdependent columns is not the default.
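If you would rather not use @astable, one alternative sketch is to split the dependent columns into separate steps, so that :B already exists as a column by the time :C is computed.

```julia
using DataFramesMeta

out = @chain DataFrame(A = 1) begin
    @rtransform :B = :A   # first step creates :B
    @rtransform :C = :B   # second step sees :B as an existing column
end
```

The trade-off is one transform call per step instead of a single block.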

pdeffebach · October 18, 2021, 1:41am

No, to do this I would just use @transform, which is column-wise

using ShiftedArrays
@transform df :Adiff = :A .- lag(:A)
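For reference, here is a sketch of the same column-wise idea without the extra package, using Base's diff and padding the first row with missing. The example data is made up.

```julia
using DataFramesMeta

df = DataFrame(A = [1, 4, 9, 16])

# diff(:A) has one element fewer than :A, so the first row, which has no
# previous row, is filled with missing.
out = @transform df :Adiff = [missing; diff(:A)]
```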

Lincoln_Hannah · October 18, 2021, 2:03am

Thanks so much for all this

DataFramesMeta rocks

Lincoln_Hannah · October 18, 2021, 11:56pm

Is it possible to use @pipe within @chain?

Someone gave me code for nested pipes, so that _ references the innermost pipe.

macro pipe2(x) esc(Pipe.funnel(macroexpand(__module__, x))) end

Is there an equivalent that allows a pipe within a chain and has _ reference the previous pipe output?

pdeffebach · October 19, 2021, 1:52am

I don’t think so… if an outer macro and an inner macro both give _ a special meaning, it’s hard to distinguish different meanings.

Note that Chain.jl already lets you use _ as the previous output.

foo(x, y) = x + y
z = 1
@chain z begin 
    foo(5, _)
end

Lincoln_Hannah · October 19, 2021, 2:02am

Yes, a chain can exist within a function within a chain, and _ references the inner chain.
Though you can’t use a pipe within a chain in the same way.

pdeffebach · October 19, 2021, 3:05am

No, unfortunately it’s hard to make macros play nice together like that.

jules · October 19, 2021, 5:37am

Chain inside chain is a special behavior I programmed in, it doesn’t follow from macro behavior. Macros don’t compose that way sadly.
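A small sketch of that nested-chain behavior, assuming the rule described in the Chain.jl README: the outer @chain stops replacing underscores once it reaches an inner @chain, apart from the inner chain's first argument. The numbers are arbitrary.

```julia
using Chain

total = @chain [1, 2, 3] begin
    # The _ in map(_) belongs to the outer chain and is the vector.
    map(_) do x
        # Inside the inner chain, _ is the inner chain's previous result.
        @chain x begin
            _ + 1
            _ * 2
        end
    end
    sum
end
```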

Lincoln_Hannah · October 19, 2021, 5:45am

Well, it’s much appreciated. Thank you.

Lincoln_Hannah · October 25, 2021, 10:52am

How about, instead of having the @chain macro, changing base Julia so that the previous line’s output is always available, either through “_” or by omitting the first argument of a function? Maybe two blank lines could be a break in the chain.

I can get similar behaviour by enclosing the whole program within a @chain macro.
Though it can create problems when the previous line isn’t used. A double blank line indicating a chain break would fix this I think.

pdeffebach · October 25, 2021, 10:55am

You are aware of the @aside macro-flag, right? It does what you want

@chain df begin
    @rtransform :B = :A * 2
    @aside X = 1
    @rtransform :X = :B + X
end

See docs here.
