Yet another piping idea

With the upcoming DataFrames 1.0 release and the new select and transform functions, piping will become much more idiomatic in basic DataFrames usage, e.g.

using DataFrames, Lazy
julia> @> df begin
       transform(:b => (x -> x .+ 1) => :c)
       end

One thing that I often want to do when doing this kind of piping in R is to stop in the middle of my chain, work in global scope for a bit, and do something with a column I just created, all without having to assign an intermediate dataframe. Let’s say I have the following workflow:

using DataFrames, Lazy
df = @> df begin
    transform(:b => fun => :c)
end
v = do_something(df.c)
df = @> df begin
    transform(:c => (x -> x .+ v) => :d)
end

Ugh, what a pain! With some macro magic we could instead write

df = @> df begin
    transform(:b => fun => :c)
    @stop _ # _ is the value returned from the pipe so far
    v = do_something(_.c) # v now lives in whatever scope is outside this begin block
    @continue # resume the pipe with _
    transform(:c => (x -> x .+ v) => :d) # You can now access `v` here.
end

This may seem pretty excessive, but I can promise you it’s something I’ve wanted while using dplyr.

Just an idea I had, and thought I’d write it out to see if others have also wanted this.

I agree that piping in dplyr is nice, but it makes debugging or accessing intermediate results inconvenient.
Could a macro introduce a second level/type of ans? e.g.:

@> df transform(:b => fun => :c) # two exprs: feed first into second as first arg
@> select(:c)                    # one expr: feed last return of @> as first arg
@> df                            # one symbol: assign last return of @> to symbol

would be rewritten into:

_ANS = transform(df,:b => fun => :c)
_ANS = select(_ANS,:c)
df = _ANS
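
To make the rewriting above concrete, here is a minimal sketch of such a macro. It is called `@p` (a made-up name, chosen so it does not clash with Lazy’s `@>`), it hard-codes the global `_ANS` exactly as in the rewritten lines, and it only handles plain function calls without a `;` in the argument list. The helper `insert_first` is also hypothetical. Base functions stand in for the DataFrames calls so that the example runs on its own.

```julia
# Hypothetical helper: insert `x` as the first argument of the call `call`.
function insert_first(x, call)
    Meta.isexpr(call, :call) || error("expected a function call, got $call")
    return Expr(:call, call.args[1], x, call.args[2:end]...)
end

# Hypothetical macro implementing the three forms from the post.
macro p(args...)
    if length(args) == 2
        # two expressions: feed the first into the second as first argument
        return esc(:(global _ANS = $(insert_first(args[1], args[2]))))
    elseif length(args) == 1 && args[1] isa Symbol
        # one symbol: assign the last result to that name
        return esc(:($(args[1]) = _ANS))
    elseif length(args) == 1
        # one call: feed the last result in as first argument
        return esc(:(global _ANS = $(insert_first(:_ANS, args[1]))))
    else
        error("@p takes one or two arguments")
    end
end

xs = [3, 1, 2]
@p xs sort()    # _ANS = sort(xs)
@p first(2)     # _ANS = first(_ANS, 2)
@p ys           # ys = _ANS
```

Because `_ANS` is an ordinary global, an intermediate result can be read at any point between two `@p` lines without a dedicated `@>>` form.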

Grabbing intermediate results might be as easy as having @>> return the last value without assigning it:

@> df transform(:b => fun => :c)
v = do_something(@>>) # v = do_something(_ANS); would (@>) be possible?
@> select(:c)         # continue with _ANS
@> df
  • It was always okay for me to repeat %>% at the end of each line when working with dplyr; the repetition didn’t bother me visually, which is something I can’t say about the begin-end blocks.
  • I have no experience writing macros, so there might be some obvious problems with my proposal.

There are better ways to make your idea work than hardcoding _ANS. One of them is gensym(), which generates a name that will not conflict with any existing one.
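
A rough sketch of how that could look, with made-up macro names `@keep` and `@recall`. One detail matters here: `gensym` returns a fresh name on every call, so calling it inside the macro body would give each expansion its own variable. For state that has to survive from one macro call to the next, the name is generated once and stored in a constant.

```julia
# Generated once when this code is loaded, so every expansion of the
# hypothetical macros below refers to the same hidden variable.
const HIDDEN = gensym("ans")

# Store the value of `ex` in the hidden global.
macro keep(ex)
    return esc(:(global $HIDDEN = $ex))
end

# Read the hidden global back.
macro recall()
    return esc(HIDDEN)
end

_ANS = "untouched"            # a user variable that happens to be called _ANS
@keep [1, 2, 3]
v = sum(@recall())            # grab the intermediate result
@keep reverse(@recall())      # continue with the stored value
```

The user’s own `_ANS` is left alone, because the stored value lives under a generated name that cannot be typed by accident.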

I think a solution would probably look somewhat similar to that, perhaps with an array of expressions that is split wherever the @stop and @continue calls are. But this is beyond my abilities and time at the moment.
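
For what it’s worth, here is one way such a macro might be sketched. Everything in it is an assumption rather than an existing API: the macro is called `@pipeline`, the markers are `@stop` and `@resume` (the latter instead of `@continue`, to stay clear of the `continue` keyword), and piped stages must be plain function calls. The outer macro sees the markers as unexpanded expressions and drops them, so they never have to be defined as macros themselves. Base functions are used in place of DataFrames so the example runs on its own.

```julia
# Replace every `_` inside `ex` with the symbol `sym`.
replace_underscore(ex, sym) = ex
replace_underscore(ex::Symbol, sym) = ex === :_ ? sym : ex
replace_underscore(ex::Expr, sym) =
    Expr(ex.head, map(a -> replace_underscore(a, sym), ex.args)...)

# True if `ex` is a call to the marker macro `@name`.
is_marker(ex, name) =
    Meta.isexpr(ex, :macrocall) && ex.args[1] === Symbol("@", name)

macro pipeline(init, block)
    Meta.isexpr(block, :block) || error("expected a begin ... end block")
    tmp = gensym("pipe")          # holds the value flowing through the pipe
    out = Any[:($tmp = $init)]
    piping = true
    for stmt in block.args
        if stmt isa LineNumberNode
            push!(out, stmt)
        elseif is_marker(stmt, "stop")
            piping = false
        elseif is_marker(stmt, "resume")
            piping = true
        elseif piping
            # piped stage: pass the current value as the first argument
            Meta.isexpr(stmt, :call) || error("expected a function call, got $stmt")
            call = Expr(:call, stmt.args[1], tmp, stmt.args[2:end]...)
            push!(out, :($tmp = $call))
        else
            # between the markers: ordinary code, with `_` bound to the value
            push!(out, replace_underscore(stmt, tmp))
        end
    end
    push!(out, tmp)
    return esc(Expr(:block, out...))
end

xs = [1, 2, 3]
result = @pipeline xs begin
    reverse()
    @stop
    v = sum(_)      # plain Julia; `v` is assigned in the enclosing scope
    @resume
    vcat(v)         # later stages can use `v`
end
```

Since `begin` blocks do not introduce a new scope, `v` ends up in whatever scope the macro was called from, which is the behaviour asked for in the original post.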

You actually can do this in R with the T-pipe from magrittr:

> df <- data.frame(a = 1:2)
> df %>%
+     transform(b = 2 * a) %T>%
+     {v <<- 2 * .$b} %>%
+     transform(c = 3 * b)
  a b  c
1 1 2  6
2 2 4 12
> v
[1] 4 8
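
A similar “tee” step can be approximated in plain Julia with the built-in `|>` operator. This is only a sketch: `tee` is a hypothetical helper, not part of Base or Lazy, and a `Ref` is used to carry the side result out of the anonymous function. The numbers mirror the R example above.

```julia
# Hypothetical helper: run `f` for its side effect, then pass `x` through unchanged.
tee(f) = x -> (f(x); x)

captured = Ref(Int[])             # will receive the intermediate value

result = [1, 2] |>
    (a -> 2 .* a) |>                        # b = 2 * a
    tee(b -> (captured[] = 2 .* b)) |>      # v = 2 * b, pipe continues with b
    (b -> 3 .* b)                           # c = 3 * b
```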