Piping in Julia

#1

I use dplyr heavily in R for SQL-style operations, and I find myself very confused about how to use pipes in Julia. In R I would try very hard to keep the data as the first argument of every function so I could pipe each return value straight into the next function and chain them together.

I have explored the |> operator in Julia, and I often find it hard to use because so many functions take the data as the last or second argument. Even with DataFramesMeta functions, which seem to work better with each other and make things chainable, I still run into some odd issues.

Maybe the answer is simply that R and Hadley’s world were built around this pattern and it just isn’t very Julia-like, which would be okay. But then how do people apply a series of transformations without nesting functions in more and more parentheses, or assigning each result to a variable before applying the next function? Rinse and repeat; it just seems repetitive.

I know this is a pretty open-ended post, but if anyone has good workflow tips or code that shows their workflow, I would much appreciate it. I want to start my Julia journey doing things the Julia way rather than just reproducing in Julia what I did in R, if that makes sense.

#2

Defining anonymous functions in-line can make this a bit easier. For example:

julia> f(x, y) = x + sum(y)
f (generic function with 1 method)

julia> [1,2,3] |> (y -> f(3, y))
9

One very nice feature of Julia is the fact that anonymous functions are just as fast as any other kind of function, so there’s no performance issue from doing this.
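As a rough sketch of how this scales to several steps (the pipeline itself is just an illustration, not taken from any package), each step can be a parenthesized anonymous function that places the piped value wherever the function expects it:

```julia
# Each step is a parenthesized anonymous function, so the piped value
# can go into any argument slot.
result = 1:10 |>
    (v -> filter(iseven, v)) |>     # data is the second argument of filter
    (v -> map(x -> x^2, v)) |>      # data is the second argument of map
    sum                             # single-argument functions need no wrapper
```

The parentheses around each anonymous function keep every step visually separate and make it obvious where one step ends and the next begins.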

There are also the built-in helpers Base.Fix1 and Base.Fix2 (not exported, hence the Base. prefix), which take a function and a value and fix that value as the function’s first or second argument, respectively. So the above example could also be:

julia> [1,2,3] |> Base.Fix1(f, 3)
9
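To make the difference between the two concrete, here is a small sketch using subtraction, where argument order matters. The variable names are only for illustration:

```julia
# Base.Fix1(f, x) behaves like y -> f(x, y);
# Base.Fix2(f, x) behaves like y -> f(y, x).
minus_from_3 = Base.Fix1(-, 3)   # y -> 3 - y
minus_3      = Base.Fix2(-, 3)   # y -> y - 3

p = 10 |> minus_from_3           # computes 3 - 10
q = 10 |> minus_3                # computes 10 - 3

# With Fix1, the piped data lands in the second slot of map.
roots = [1, 4, 9] |> Base.Fix1(map, sqrt)
```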

Even better, there is a new syntax being developed: https://github.com/JuliaLang/julia/pull/24990 which should allow you to do something like:

julia> [1,2,3] |> f(3, _)

which I would say is pretty much perfect for this kind of application.

Finally, there’s a tried-and-true technique in Julia, which is: when the language doesn’t have a feature you need, just use a macro to create that feature. Lazy.jl has some useful macros for this kind of situation: https://github.com/MikeInnes/Lazy.jl#macros
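To give a flavor of the macro approach, here is a minimal sketch of a hand-rolled threading macro. The names `subst` and `@thread_` are hypothetical and this is not how Lazy.jl implements its macros; it only shows that substituting a placeholder into each step takes a few lines:

```julia
# Hypothetical helper: replace every `_` in an expression with `val`.
subst(ex, val) = ex                                   # literals pass through
subst(ex::Symbol, val) = ex === :_ ? val : ex
subst(ex::Expr, val) = Expr(ex.head, map(a -> subst(a, val), ex.args)...)

# Hypothetical macro: thread `init` through each step, filling in `_`.
# Caveat: a step that uses `_` twice evaluates the previous step twice.
macro thread_(init, steps...)
    result = init
    for step in steps
        result = subst(step, result)
    end
    return esc(result)
end

f(x, y) = x + sum(y)

# Expands to sqrt(f(3, [1, 2, 3]))
out = @thread_([1, 2, 3], f(3, _), sqrt(_))
```

A real package handles more cases (blocks, nested anonymous functions, hygiene), so this is only meant to show the idea.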

#3

Thank you for pointing out that PR; it’s good to know it could make it into Base someday. _ would make life easier: I really just need the ability to pipe when the primary data isn’t in the first argument position. In the meantime I will use the anonymous function method you posted. It’s extra code for sure, but hopefully I won’t need it all the time if I use functions meant for piping in DataFramesMeta or Query.

#4

The @as macro in Lazy.jl will let you do exactly that. It gives you both chaining and the ability to specify the location of your argument.

julia> function ff(x)
       x + 1
       end
ff (generic function with 1 method)

julia> function gg(x)
       x + 5
       end
gg (generic function with 1 method)

julia> using Lazy

julia> @as x 1 begin
       ff(x)
       gg(x)
       end
7

This works extremely well with DataFramesMeta, particularly when @> and @as are combined.

#5

The Lazy.jl package does look pretty amazing; it provides a lot of the functional tools I am used to using. I will explore it thoroughly!

#6

Has a decision been made on introducing a substitution symbol when piping to functions with multiple arguments? Example:

julia> a = [[[1,2],[3,4]],[[5,6],[7,8]]]
2-element Array{Array{Array{Int64,1},1},1}:
 [[1, 2], [3, 4]]
 [[5, 6], [7, 8]]

julia> hcat(hcat(a...)...)
2×4 Array{Int64,2}:
 1  3  5  7
 2  4  6  8

can be piped as

julia> a |> x->hcat(x...) |> x->hcat(x...)
2×4 Array{Int64,2}:
 1  3  5  7
 2  4  6  8

or, a faster alternative:

julia> a |> x->reduce(hcat,x) |> x->reduce(hcat,x)
2×4 Array{Int64,2}:
 1  3  5  7
 2  4  6  8
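One detail worth noting about the two pipelines above: `->` has lower precedence than `|>`, so without parentheses the body of the first anonymous function extends to the end of the line and the second step ends up nested inside it. The result is the same here, but parenthesizing each step, or naming the step once, makes the chain read the way it runs. A sketch, with `flatten_once` as a made-up helper name:

```julia
a = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

# Parenthesized steps: each anonymous function is a separate stage.
r1 = a |> (x -> reduce(hcat, x)) |> (x -> reduce(hcat, x))

# Naming the step once avoids repeating the anonymous function.
flatten_once = x -> reduce(hcat, x)
r2 = a |> flatten_once |> flatten_once

# Function composition builds the whole pipeline as a single function.
r3 = (flatten_once ∘ flatten_once)(a)
```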

However, it would look cleaner if one could write:

julia> a |> hcat(_...) |> hcat(_...)

or

julia> a |> reduce(hcat,_) |> reduce(hcat,_)

It seems from the discussion that there is some hesitance about using the underscore as the substitution symbol. What about other symbols? The dollar sign ($) is already used for interpolation in strings, i.e. substituting in the values of variables or expressions, so it gives a good mnemonic for “substitution”. Could the dollar sign take on a second role outside of strings? E.g.,

julia> a |> hcat($...) |> hcat($...)

or

julia> a |> reduce(hcat,$) |> reduce(hcat,$)

?
Or maybe this looks ugly?
OK – perhaps people don’t use piping so much…

Anyway, I don’t really propose any symbol, I’m just curious about the state of this discussion.

#7

Here is a GitHub issue on defining anonymous functions with underscores:

You can subscribe to get notified of any developments.

It sounds like there are a few issues, from implementation details (parsing and AST; will this potentially break macros?), to questions about nesting behavior.

#8

Thanks, Elrod. I’m just an amateur, so I understand that there may be problems with the underscore. I’m not that concerned with the particular symbol, more with the possibility. So could $ or some other symbol be used instead of the underscore?

#9

The symbol per se is not the issue; it’s about the semantics.

#10

OK – thanks for clarification.

#11

To expand a bit more: the questions are about what, exactly, the underscore would do. For example, f(_) is obviously x -> f(x), but what is g(f(_))? Is it x -> g(f(x)) or g(x -> f(x))? There are a lot of interesting edge cases to get right, and it’s important to figure out as many of them as possible beforehand, since many of these decisions can’t be changed later without breaking people’s code.
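To see that the two readings really differ, here is a small sketch with the anonymous functions written out by hand. The definitions of `f` and `g` are made up for the illustration:

```julia
f(x) = x + 1
g(y) = y isa Function ? "function" : "value"

# Reading 1: the whole expression becomes the anonymous function.
outer = x -> g(f(x))

# Reading 2: only the innermost call becomes the anonymous function,
# which is then passed to g right away.
inner = g(x -> f(x))

r_outer = outer(1)   # g receives the number f(1)
r_inner = inner      # g already ran, and it received a function
```

Under the first reading the result is still a function waiting for its argument; under the second, g has already been called.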
