With the upcoming DataFrames 1.0 release and the new select and transform functions, piping will become much more idiomatic in basic DataFrames usage, e.g.:
using DataFrames, Lazy
df = DataFrame(b = [1, 2, 3])
@> df begin
    transform(:b => (x -> x .+ 1) => :c)
    select(:c)
end
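To make the mechanics concrete: Lazy.jl's @> threads the value into each step as its first argument, so the pipe above amounts to select(transform(df, :b => (x -> x .+ 1) => :c), :c). The sketch below is a hypothetical, Base-only macro (@thread is not part of Lazy.jl or DataFrames.jl) that performs the same first-argument rewriting, assuming a begin ... end block whose steps are plain function calls or bare function names; it ignores other cases Lazy.jl handles, such as macro calls.

```julia
# Hypothetical minimal version of first-argument threading, in the spirit of
# Lazy.jl's `@>` (this is NOT Lazy.jl's implementation).

# Insert `x` as the first argument of `step`; a bare name `f` becomes `f(x)`.
function thread_first(x, step)
    if step isa Expr && step.head == :call
        return Expr(:call, step.args[1], x, step.args[2:end]...)
    else
        return Expr(:call, step, x)
    end
end

# `@thread x begin f(a); g(b) end` rewrites to `g(f(x, a), b)`.
macro thread(x, block)
    block isa Expr && block.head == :block || error("@thread expects a begin ... end block")
    steps = filter(s -> !(s isa LineNumberNode), block.args)   # drop line-number markers
    return esc(foldl(thread_first, steps; init = x))
end

add(x, k) = x + k
mul(x, k) = x * k

result = @thread 2 begin
    add(3)    # add(2, 3) == 5
    mul(4)    # mul(5, 4) == 20
end
println(result)   # prints 20
```

Because the rewriting happens at macro-expansion time, the pipe costs nothing at runtime compared with writing the nested calls by hand.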
One thing I often want to do when piping like this in R is to stop in the middle of my chain, work in global scope for a bit, and do something with a column I just created, without having to assign an intermediate data frame. Let's say I have the following workflow:
using DataFrames, Lazy
df = @> df begin
    transform(:b => fun => :c)
end
v = do_something(df.c)
df = @> df begin
    transform(:c => (x -> x .+ v) => :d)
end
Ugh, what a pain, when with some macro magic we could do:
df = @> df begin
    transform(:b => fun => :c)
    @stop _ # _ is the value returned from the pipe
    v = do_something(_.c) # v now lives in whatever scope is outside this begin block
    @continue # resume piping; _ is the value passed on to the next step
    transform(:c => (x -> x .+ v) => :d) # You can now access `v` here.
end
This may seem pretty excessive, but I can promise you it’s something I’ve wanted while using dplyr.
Just an idea I had, and thought I’d write it out to see if others have also wanted this.
I agree that piping in dplyr is nice, but it makes debugging or accessing intermediate results inconvenient.
Could a macro introduce a second level or type of ans? For example:
@> df transform(:b => fun => :c) # two exprs: feed first into second as first arg
@> select(:c) # one expr: feed last return of @> as first arg
@> df # one symbol: assign last return of @> to symbol
Grabbing intermediate results might be as easy as having @>> return the last value without assigning it:
@> df transform(:b => fun => :c)
v = do_something(@>>) # v = do_something(_ANS); would (@>) be possible?
@> select(:c) # continue with _ANS
@> df
It was always fine for me to repeat %>% at the end of each line when working with dplyr; the repetition didn’t bother me visually, which I can’t say about the begin-end blocks.
I’ve no experience writing macros, so there might be some obvious problems with my proposal.
There are better ways than hardcoding _ANS to make your idea work; one of them is gensym(), which generates a non-conflicting name.
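A rough sketch of the ans idea along those lines, under several assumptions: the macro names @pipe and @pipeans are hypothetical (chosen so they do not clash with Lazy.jl's @> and @>>), the running value lives in a single variable whose name is generated once with gensym(), and, as in the proposal, a lone symbol means "assign the running value to this name" rather than "call this function". Steps are assumed to be plain function calls.

```julia
# Hypothetical sketch: the running pipe value is stored in a variable whose
# name is produced once by gensym(), so it cannot clash with user names.
const PIPE_ANS = gensym(:pipe_ans)

# Insert `x` as the first argument of a call; a bare name `f` becomes `f(x)`.
function thread_first(x, step)
    if step isa Expr && step.head == :call
        return Expr(:call, step.args[1], x, step.args[2:end]...)
    else
        return Expr(:call, step, x)
    end
end

# Two arguments: start a new pipe, `@pipe df f(a)` stores `f(df, a)`.
macro pipe(x, step)
    return esc(:($PIPE_ANS = $(thread_first(x, step))))
end

# One argument: a symbol receives the running value; anything else is
# threaded onto it, so `@pipe g(b)` stores `g(<running value>, b)`.
macro pipe(step)
    if step isa Symbol
        return esc(:($step = $PIPE_ANS))
    else
        return esc(:($PIPE_ANS = $(thread_first(PIPE_ANS, step))))
    end
end

# Read the running value without assigning it (the role proposed for `@>>`).
macro pipeans()
    return esc(PIPE_ANS)
end

add(x, k) = x + k

@pipe 2 add(3)          # running value: 5
v = 10 * @pipeans()     # v == 50; the pipe itself is unchanged
@pipe add(v)            # running value: 55
@pipe result            # result == 55
println(result)         # prints 55
```

Since each call is a separate top-level statement, the hidden variable behaves like a second ans: anything can run between two @pipe lines, and v is an ordinary variable in the surrounding scope.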
I think a solution would probably look somewhat like that: perhaps an array of expressions, split wherever the @stop and @continue calls are. But this is beyond my abilities and time at the moment.
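Splitting the block at the markers is enough for a minimal sketch. The macro below is hypothetical (@stoppable is not part of Lazy.jl) and assumes a begin ... end block: it threads the value into each ordinary step as the first argument, and runs everything between @stop and @continue as plain statements in the enclosing scope, with _ rewritten to the current piped value. The rewrite is needed because Julia treats all-underscore names as write-only, so _.c could not be read as a normal variable. @stop and @continue never need to be defined, because @stoppable removes them before they would be expanded.

```julia
# Hypothetical sketch of the @stop / @continue idea (not part of Lazy.jl).

# Insert `x` as the first argument of a call; a bare name `f` becomes `f(x)`.
function thread_first(x, step)
    if step isa Expr && step.head == :call
        return Expr(:call, step.args[1], x, step.args[2:end]...)
    else
        return Expr(:call, step, x)
    end
end

# Recursively replace every `_` in `ex` with the symbol `sym`.
function replace_underscore(ex, sym)
    ex === :_ && return sym
    ex isa Expr || return ex
    return Expr(ex.head, (replace_underscore(a, sym) for a in ex.args)...)
end

# True if `ex` is a call to the macro `name`, e.g. is_marker(ex, "@stop").
is_marker(ex, name) = ex isa Expr && ex.head == :macrocall && ex.args[1] === Symbol(name)

macro stoppable(x, block)
    block isa Expr && block.head == :block || error("@stoppable expects a begin ... end block")
    cur = gensym(:cur)                 # holds the current piped value
    out = Any[:($cur = $x)]
    stopped = false
    for step in block.args
        step isa LineNumberNode && continue
        if is_marker(step, "@stop")
            stopped = true             # following lines run as plain statements
        elseif is_marker(step, "@continue")
            stopped = false            # resume threading
        elseif stopped
            push!(out, replace_underscore(step, cur))
        else
            push!(out, :($cur = $(thread_first(cur, step))))
        end
    end
    push!(out, cur)                    # the block's value is the final piped value
    return esc(Expr(:block, out...))
end

addcol(xs, k) = xs .+ k

r = @stoppable [1, 2, 3] begin
    addcol(1)        # [2, 3, 4]
    @stop
    s = sum(_)       # s == 9, assigned in the enclosing scope
    @continue
    addcol(s)        # [11, 12, 13]
end
println(r)           # prints [11, 12, 13]
```

The same shape should carry over to the DataFrames example: transform(:b => fun => :c) becomes transform(df, :b => fun => :c), and v = do_something(_.c) reads the column from the intermediate data frame without naming it.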