I can understand that feeling; for me it was kind of the other way around. I knew how I wanted to write my dataframe transformation chains, but no package offered that syntax, so I just made my own in Chain.jl, even though there were so many other broadly similar packages out there. So I think if you don’t have a strong feeling towards a certain style, or you just don’t encounter the problems other people tried to solve with their macros, it really doesn’t matter what you use.
Of course, from a high-level perspective, it does matter for the perception of the ecosystem what the defaults and go-to solutions are. I was pleasantly surprised that others seemed to like Chain.jl and wanted to use it, but I wonder what outsiders think when a lot of DataFrames examples on the web use @chain, because it’s so non-standard.
```julia
@_ people |> filter(_.age > 40, __) |> map(_.name, __)
```
What do the two consecutive `_` mean?
It’s not intuitive; you have to read the documentation.
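For comparison, here is a sketch of the plain-Julia equivalent of that pipeline; the `people` data is made up purely for illustration:

```julia
# Hypothetical sample data, just for illustration
people = [(name = "Alice", age = 54), (name = "Bob", age = 31)]

# The Underscores.jl pipeline above is roughly equivalent to:
over_40 = filter(p -> p.age > 40, people)
names   = map(p -> p.name, over_40)  # ["Alice"]
```

In Underscores.jl, `_` is the argument of a small anonymous function, and `__` stands for the value being piped through, so each stage desugars to an ordinary `filter`/`map` call like the above.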
In Chain.jl you have to write `begin…end` every time.
I don’t like the fact that I have to learn a library and expect those who read my code to learn it in turn.
In my opinion a solution in the language would be better.
Sorry for my English (automatic translator).
It’s exactly the high-level perspective I’m thinking of. It’s possible to use any of these, maybe even pick the best, if you spend some time on it.
Many people, like me with regard to these packages, don’t want to spend extra time on something that is peripheral to what they actually want to accomplish. For the package author, the package is at the center, but for users it’s not.
If there’s “the most common way” to do things, it’s much easier to find examples online that you can use to help you. And it’s much easier to switch from another language like Python.
I don’t know, my R friends don’t seem to mind %>% that much, even though I think that’s a pretty bad default. It’s effectively a standard now, though. And it will be harder to achieve that here.
Which are like two of the most relevant issues syntax can have: visually noisy and annoying to type. It’s just that people have gotten used to it, and R doesn’t provide other infix operator overloads, so the discussion there is quite moot. Macros open up Pandora’s box in a way, because there are so many possibilities to explore. I personally like that better than having forced conformity, but of course it can be an issue when coming up with defaults for widely used packages that cater to everyone’s needs.
I use this syntax a lot, especially when developing a pipeline to modify a DataFrame in some Jupyter notebook, e.g.:
```julia
data |> filter1 |> modifier |> filter2
```
sometimes with, sometimes without easing the notation with DataFramesMeta or Chain functionality.
But what I somewhat miss in all this is an assignment operator acting to the right, e.g. `|>=` (or something similar), which would allow me to write:
```julia
data |> filter1 |> modifier |> filter2 |>= modified_data
```
instead of

```julia
modified_data = data |> filter1 |> modifier |> filter2
```
which, using `|>` notation, looks less intuitive. The same also holds true for functions like map and filter when using do syntax: in such constructs data flows from top to bottom or left to right and should be collected there, e.g.
```julia
map(vector) do element
    ...
end |>= modified_vector
```
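As an aside, a minimal sketch of how such right-assignment could be emulated today with a macro; the name `@rassign` is made up for illustration and is not an existing package API:

```julia
# Hypothetical macro: evaluates an expression and binds the result to
# a variable named on the right, mimicking a |>= style operator.
macro rassign(ex, var)
    :($(esc(var)) = $(esc(ex)))
end

@rassign(1:10 |> collect |> sum, total)
# total is now 55
```

A real infix `|>=` would need parser support, since Julia reserves `op=` forms for updating operators, so a macro like this is about as close as user code can get.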
You probably can, I’m not sure. However, R has that, and I came to the conclusion that I don’t want it. My rationale is that it’s much harder to read code when the assigned-to variable can be in obscure places like the end of a pipe. It’s much easier when it’s on the left.
Also, if the `|>` pipe syntax expresses (or hints to the compiler?) independent functional steps in the workflow, I think keeping the `|>` pipe syntax in Julia may be important for automatic vectorization for parallel computing (HPC via SIMD or SPMD hardware). In other words, insofar as `h |> g |> f` indicates you could asynchronously assign three independent parallel threads to the functions, the `|>` pipe syntax would be superior to `f(g(h(x)))`, which, I believe, may implicitly (and correctly) require `h` to return its results and block-and-wait before starting `g`, then block-and-wait before starting `f` (possibly making a program super slow, a junior-woodchuck mistake).
Automatic vectorization for parallel computing (HPC via SIMD or SPMD hardware) is a special case of automatic parallelization, where a computer program is converted from a scalar implementation, which processes a single pair of operands at a time, to a vector implementation, which processes one operation on multiple pairs of operands at once.
For example, modern conventional computers, including specialized supercomputers, typically have vector operations that simultaneously perform operations such as the following four additions:
```
c1 = a1 + b1
c2 = a2 + b2
c3 = a3 + b3
c4 = a4 + b4
```
However, in most programming languages one typically writes loops that sequentially perform additions of many numbers.
Here is an example of such a loop, written in C:
```c
for (i = 0; i < n; i++)
    c[i] = a[i] + b[i];
```
A vectorizing compiler transforms such loops into sequences of vector operations.
These vector operations perform additions on blocks of elements from the arrays a, b and c.
Automatic vectorization is a major research topic in computer science.
And I hope we can elevate (or maybe continue to elevate?) the language syntax to higher abstraction layers using mathematical notation, while simultaneously keeping the super fast and efficient automatic-vectorization gears hidden/encapsulated, as described at Automatic vectorization - Wikipedia (HPC via SIMD or SPMD hardware).
So I also support syntax for Function Composition (computer science) described here >>
Mostly because I want to be able to easily mutate the order of function operations, but also:
“The ability to easily compose functions encourages factoring (breaking apart) functions for maintainability and code reuse. More generally, big systems might be built by composing whole programs.”
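A quick illustration of that point with Julia’s built-in composition operator `∘`, which composes small functions right-to-left:

```julia
# Compose two small single-purpose functions into one reusable unit.
clean = uppercase ∘ strip

clean("  hello  ")  # "HELLO"
```

Reordering the composed functions is then just a matter of swapping them around `∘`, which is the easy mutability of function order mentioned above.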
Also, for ML I believe it’s still important to facilitate the habit of writing function-composition example code to support calculus derivative notation, e.g. the chain rule `(f(g(x)))' = f'(g(x)) * g'(x)` (say, using ApproxFun.jl), so we can most easily write custom XGBoost loss functions for machine learning, as other boosting methods do, where they generalize by allowing optimization of an arbitrary differentiable loss function, as per Gradient boosting - Wikipedia.
I also believe we’ll have to get the math notation as compact and expressive as possible to hide/encapsulate complexity. One example of a compact (aka terse) mathematical description facilitating a vectorized fitness function (done, see below): “To gain speed, vectorize your fitness function.” here >>
and here >>
HTH
PS: Julia moves so fast there is a good chance all this is done already, but it never hurts to give kudos if it is.
Indeed, kudos are in order. Vectorization is done, per: “In Julia, vectorized functions are not required for performance, and indeed it is often beneficial to write your own loops (see Performance Tips), but they can still be convenient. Therefore, any Julia function f can be applied elementwise to any array (or other collection) with the syntax f.(A).” (Functions · The Julia Language)
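For instance, the dot syntax from that quote, together with broadcasting over operators, expresses the same elementwise additions as the C loop earlier in the thread:

```julia
a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]

c = a .+ b    # elementwise addition; the compiler can emit SIMD code
s = sqrt.(c)  # any function f applied elementwise via f.(A)
```

Chained dot calls like `sqrt.(a .+ b)` also fuse into a single loop, so no intermediate array is allocated.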
Being used to Unix pipes, I like the pipeline’s clean syntax, and it encourages the KISS philosophy of writing small functions, each doing one thing and doing it well, then using a one-liner pipe statement to express complex ideas with these small functions as building blocks. It also encourages reusability of these small functions the more you use pipeline expressions. If you find yourself writing more vertically than horizontally for your top-level workflow, maybe it is time to decompose your big functions into smaller ones and write them horizontally into a pipeline expression that expresses the complex workflow or algorithm.
Debugging is not a big deal: if you have a long pipe expression, it’s trivial to divide and conquer in a one-liner pipe. You can comment out half of the pipe to isolate the issue, or break a long pipe into shorter pipes. Besides, we are more used to reading from left to right than top-down, so it is just a natural way of expressing an algorithm or workflow. I’m pretty satisfied with the ML pipeline, which is the core functionality in my AutoMLPipeline package.
Piping is not complicated at all, though it has long been discussed in Julia (post #39).
Piping might be just a style, but it was considered one of the most important innovations for R in 2014.
My answer to the post’s question: I use piping (Chain.jl or Underscores.jl) more than do blocks, in terms of frequency.
I had been using R with temporary variables in the old days before piping was introduced, so I can wait for a better/unified Julia implementation or package for piping.
I think I only use it in the REPL when I am checking something quickly. Outside of that, I rarely use it, maybe because I like to write code as close as possible to how I would do the math. By hand, I never write |> but do write out function compositions a lot, and it helps me to write code in a similar fashion.