[RFC] PipelessPipes.jl (now Chain.jl)

Definitely it should be @K, because it is a kestrel bird!

One thing I would also like in my dream chain function is a way to alter global state without breaking the chain:

@chain df begin
    fun1(1)
    @eval_in_local_scope begin
        x = 5
    end
    fun2(x)
end

I'm sure this will make some anti-chain purists' heads explode, but it's something I've always wanted in R.

That already works :) The whole expression is just one big let block.

Can you give an example? Is it just that expressions of the form Expr(:call, ...) are replaced and other ones are not?

Check the README, it lists the different ways expressions are replaced. You can also try it with macroexpand. Every expression just gets prepended with a newvar = , which is why error highlighting etc. works. So you can do anything you can do in normal code.
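
To make the "one big let block" idea concrete, here is a hand-written approximation in plain Julia of what such a rewritten chain amounts to. This is only a sketch: the names v1 to v4 are made up for illustration, and the real generated names and details will differ, so check the actual output with @macroexpand.

```julia
# Hand-written approximation of a rewritten chain (NOT actual macro output).
# Each step is stored in a fresh variable; the names v1..v4 are hypothetical.
result = let
    v1 = [3, 1, 2]      # the starting value of the chain
    v2 = sort(v1)       # a step receiving the previous result
    x = 5               # an ordinary statement in the middle of the block
    v3 = v2[1:2]        # the chain continues from v2
    v4 = sum(v3) + x    # later steps can use x like in normal code
    v4                  # the last value is the value of the whole block
end
```

Because each line is just an assignment, an ordinary statement such as x = 5 can sit between two steps without interrupting the flow of values.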

Ah okay, this is what people were discussing above with @! and @aside. @aside FTW, I like it.

I love it. In my work the data transform stages often stack pretty deep, so the pipe symbol at the end of each line (though cool generally) becomes burdensome.

  • I agree with eliminating _; good idea to make that the default, with an explicit _ still allowed.
  • Error handling: this is a huge improvement.
  • @! as the debugging flag: not a good choice, it semantically overlaps with the mutating function syntax (and means kind of the opposite).
    • Other options (really anything but !):
    • @tee, @x (for exclude), @bypass, @(), @0, @<
  • The begin seems superfluous, except that an end is needed. Is it sensible to swallow that as well?

+1 for @bypass, I like that too.

The begin-end block is needed so that multiple statements are parsed as a single expression. If you have something like this

@chain df
    transform
end

it won't parse correctly. The parser will assume that the macro call ends after df, because of the newline that follows.
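
You can check this without Chain.jl installed, since parsing does not need the macro to exist. The sketch below uses only Meta.parse from Base; the variable names are just for illustration.

```julia
# Parse the first expression of each snippet, starting at index 1.
# Meta.parse(str, start) returns the expression and the index where parsing stopped.
src_without = "@chain df\n    transform\nend"
ex1, next1 = Meta.parse(src_without, 1)

src_with = "@chain df begin\n    transform\nend"
ex2, next2 = Meta.parse(src_with, 1)

# Without `begin`, the macro call only contains `df`;
# `transform` and `end` are left over as unparsed text.
println(ex1.args[end])          # the last macro argument
println(next1 <= ncodeunits(src_without))

# With `begin`, the last macro argument is the whole block.
println(ex2.args[end].head)
```

In the first case the macro call is already complete after df, so the remaining transform and end are parsed as separate (and here invalid) code.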

I agree with renaming @!; it is not easy to read and could be confusing. For me, @aside and @bypass are both nice and not confusing at all.

+1 for @bypass

I'm suggesting that the @chain macro absorb the 'begin' keyword; the parsed result would still contain it.

There are pluses and minuses to this; basically, users would have to treat the @chain macro as the beginning of a block.

Parsing happens before macro expansion, so that's not actually possible. In other words, you can't use arbitrary syntax in a macro. You can only use syntax that the Julia parser knows how to parse.
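
A small sketch of what "parsing happens first" means in practice: a macro only ever receives expressions the parser has already built. The macro @argkinds below is a hypothetical helper written for this illustration and has nothing to do with Chain.jl's internals.

```julia
# Hypothetical helper macro: reports what kind of object each argument is.
# For an Expr it returns the head, otherwise the type of the argument.
macro argkinds(args...)
    kinds = map(a -> a isa Expr ? a.head : typeof(a), args)
    return QuoteNode(kinds)   # return the tuple itself as the result
end

# `df` and `transform` are never evaluated, so they need not be defined.
kinds = @argkinds df begin
    transform
end
println(kinds)
```

The macro sees a Symbol and an already-parsed block, so by the time it runs there is no way left for it to influence where the call ends.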

I've decided to use @aside. I think it both describes its purpose best and is easiest to understand without additional knowledge.

As I like both Chain.jl and Underscores.jl, I did a POC to combine the two.
By borrowing functions (and the name @_) from Underscores.jl, we can define anonymous functions in the pipe block as expressions of _ or _1, _2, ... (or _₁, _₂, ...).

Examples

The POC macro @_ is based on @chain and uses __ instead of _ as the placeholder.

@_ [1:5, 4:10] begin
  map(_[end]^2, __)
  filter(isodd, __)
end

using DataFrames
df = DataFrame(x = [1, 3, 2, 1], y = 1:4)
@_ df begin
    filter(_.x > 1 && isodd(_.y) , __)
    transform([:x, :y] => ByRow(_1 *100 + _2) => :z)
end

The original @chain would look like:

@chain [1:5, 4:10] begin
  map(x -> x[end]^2, _)
  filter(isodd, _)
end

@chain df begin
    filter(row -> row.x > 1 && isodd(row.y) , _)
    transform([:x, :y] => ByRow((a, b) -> a *100 + b) => :z)
end

I like the documentation example:

@chain df begin
  dropmissing
  filter(:id => >(6), _)
  groupby(:group)
  combine(:age => sum)
end

The only remaining underscore is in filter.
It could be removed as well using

@where( :id .> 6 )

Unfortunately you then have to vectorise the condition.
Is there a way to avoid this and just have where( :id > 6 ) ?

Is there a Chain equivalent of vectorised piping, e.g.:

@pipe [1 2 3] .|>
log .|>
_^2

No, that's a trade-off that comes from rewriting to temporary variables and keeping the expressions on each line otherwise intact. There is also no symbol between lines that could signal this. That means broadcast fusion does not happen across lines, but that's not so common in the DataFrames scenario.

You can prefix function symbols like @. log though, and of course use normal broadcasting like _ .^ 2; just remember there's no fusion across lines.
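
To illustrate the fusion point in plain Julia (no Chain.jl needed): the sketch below compares a single fused broadcast with the same computation split over two lines through a temporary. The variable names are made up.

```julia
x = [1.0, 2.0, 3.0]

# Fused: both dotted operations run in a single loop with one output array.
fused = @. log(x)^2

# Line by line, as with temporary variables: each line broadcasts on its own,
# and the intermediate array `tmp` is materialised between the two steps.
tmp = log.(x)
stepwise = tmp .^ 2

println(fused == stepwise)   # the values are the same, only the allocation differs
```

So splitting across lines costs an extra intermediate array per step, but does not change the result.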

In the example below, the insertcols! function gives an error if the first argument _ is removed.

@chain DataFrame(A=["a/b","c/d","x/y"]) begin
    insertcols!( _, ([:C1,:C2] .=>  split.( _.A, '/' ) |> invert )...)
end

Does the first-argument removal feature not work for ! functions?

Many thanks for your answer, by the way.
I'm asking all these dumb questions because I love the package.

This breaks because whenever you have a _ in the expression, the "first argument" rule ceases to apply. As soon as an expression contains any _, you need a _ in every place where the previous result should go.
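
The rule can be sketched as a toy rewrite function in plain Julia. This is not Chain.jl's actual implementation, only an illustration of the behaviour described above; has_underscore, replace_underscore and rewrite_step are hypothetical names, and cols stands in for any expression that builds the new columns.

```julia
# Toy illustration of the "first argument" rule; NOT Chain.jl's real code.

# True if the symbol `_` appears anywhere inside the expression.
has_underscore(ex) = ex === :_ || (ex isa Expr && any(has_underscore, ex.args))

# Replace every `_` in the expression with `prev`.
function replace_underscore(ex, prev)
    if ex === :_
        return prev
    elseif ex isa Expr
        return Expr(ex.head, map(a -> replace_underscore(a, prev), ex.args)...)
    else
        return ex
    end
end

function rewrite_step(ex, prev)
    if has_underscore(ex)
        # Any `_` present: substitute only, never insert a first argument.
        return replace_underscore(ex, prev)
    elseif ex isa Expr && ex.head === :call
        # No `_`: splice the previous result in as the first argument.
        return Expr(:call, ex.args[1], prev, ex.args[2:end]...)
    elseif ex isa Symbol
        # A bare function name becomes a one-argument call.
        return Expr(:call, ex, prev)
    else
        return ex
    end
end

println(rewrite_step(:(groupby(g)), :prev))            # previous result inserted first
println(rewrite_step(:(filter(isodd, _)), :prev))      # only the `_` is replaced
println(rewrite_step(:(insertcols!(cols(_))), :prev))  # no first argument is added
```

In the last case the inner _ switches off the insertion, so insertcols! never receives the data frame unless you also write _ as its first argument.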