In the spirit of this question and this related question, I was wondering why the single-argument version Base.filter(f) = Fix1(filter, f) exists, but the equivalent Base.filter!(f) and Iterators.filter(f) are missing.
Is Base.filter!(f) missing because the resulting filter can only be applied to Arrays, but at the point of calling Base.filter!(f) we don’t know what the result will be called on?
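For illustration, the missing partially applied versions can be sketched locally with Base.Fix1. The names lazyfilter and filter_inplace are hypothetical and chosen so that no methods are added to functions owned by Base; this is a minimal sketch, not how Base would necessarily define them.

```julia
# Hypothetical helpers (not part of Base): partially applied versions of
# Iterators.filter and filter!, built the same way as Base.filter(f).
lazyfilter(f) = Base.Fix1(Iterators.filter, f)
filter_inplace(f) = Base.Fix1(filter!, f)

# Lazy filtering inside a pipeline; collect materializes the result.
evens = collect((1:10) |> lazyfilter(iseven))

# In-place filtering; this mutates v, so v must be a resizable collection.
v = [1, 2, 3, 4, 5, 6]
v |> filter_inplace(isodd)
```

Whether filter_inplace is usable depends on what it is eventually called on, which is the concern raised in the question.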
I am puzzled by the converse: why people feel compelled to include
f(g) = Base.Fix1(f, g)
for various f.
Base is now full of these “convenience” puns, but the general strategy is unclear. Do all f get these, or only some? What are the criteria? Or is it simply that someone asked for it?
I also thought about this before! My impression is that there are no precise criteria; if a predicate is expected to be used frequently, it gets a Fix1 version so that it can be conveniently used as the first argument of filter, count, etc.
I think it is a bad strategy for the following reason: if you are writing generic code where you cannot count on input functions having partially applied versions, you will have to use Fix1 anyway. But of course it is easy to slip into the habit of not doing so, resulting in subtle bugs.
Some languages have this built in (e.g. Haskell). It is not clear to me that the convenience is worth it, especially now (now = 1.12) that Base.Fix{N} is part of the API.
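A minimal sketch of the generic-code situation described above, assuming a hypothetical two-argument predicate within that has no one-argument method. The helper count_where is also hypothetical; it fixes the second argument explicitly with Base.Fix2 instead of relying on pred(bound) being defined.

```julia
# Hypothetical predicate with only a two-argument method.
within(x, limit) = abs(x) < limit

# Generic helper: works for any two-argument pred, because it does not
# assume that pred(bound) returns a partially applied function.
function count_where(pred, bound, xs)
    return count(Base.Fix2(pred, bound), xs)
end

n_within = count_where(within, 2, [-3, -1, 0, 1, 4])
n_gt = count_where(>, 5, [1, 6, 7, 3, 9])
```

Writing count(pred(bound), xs) in the helper would work for > but throw a MethodError for within, which is the kind of slip the post warns about.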
My use case is in fact the one mentioned by @barucden : Providing first arguments for filter, map, count etc as well as putting things into |> pipelines. The concrete question about Iterators.filter() arose in fact from a |> pipeline in which I wanted to filter.
IMHO a good solution for all of this would be to get something like https://github.com/JuliaLang/julia/pull/24990 into the language, but from the discussion on that PR it doesn’t sound like that will happen anytime soon.
Agreed. However, from time to time, I need to write not-so-clever scripts where I appreciate the convenience of
n = count(>(5), numbers)
images = filter(endswith(".png"), filenames)
In such cases, I am clearly not aiming for generic code, and I also like that I don’t have to define gt5 = Base.Fix2(>, 5) or endswithpng = Base.Fix2(endswith, ".png").
I get your general point — someone decides which functions deserve an extra single-argument method, and this decision is “arbitrary”. But I have to say that I am generally satisfied with the selection made for Julia’s Base.
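For reference, the two one-liners above are shorthand for the explicit Base.Fix2 objects mentioned in the same post. A small sketch with made-up sample data:

```julia
numbers = [1, 6, 7, 3, 9]                 # sample data for illustration
filenames = ["a.png", "b.jpg", "c.png"]   # sample data for illustration

# Convenience spellings from the post.
n = count(>(5), numbers)
images = filter(endswith(".png"), filenames)

# Explicit equivalents: each fixes the second argument of a two-argument function.
gt5 = Base.Fix2(>, 5)
endswithpng = Base.Fix2(endswith, ".png")
n_explicit = count(gt5, numbers)
images_explicit = filter(endswithpng, filenames)
```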
A simpler approach is to continue special-casing the first argument. It’s currently used in the do block that “creates an anonymous function … and passes the anonymous function as the first argument to the “outer” function”.
A natural extension is a |> f(b,c) being equivalent to f(a,b,c). This is similar to Python’s “.-operator”: "hello" . upper(). I know the dot isn’t supposed to be surrounded by spaces, but this is valid syntax, and the Julia code basically uses |> in place of the dot. This immediately leads to “flowing pipes” like df |> select(:Date, :Price) |> filter(col(:Date) > Date(2000,1,1)) |> sort(:Date) and nice OOP-like syntax.
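To make the comparison concrete, here is a sketch of what current Julia does. The do block already special-cases the first argument, while a |> f(b,c) is only a proposal, so today the first argument has to be threaded through an explicit closure. The function scale_shift is hypothetical.

```julia
# do block: the anonymous function becomes the FIRST argument of map.
squares_do = map([1, 2, 3]) do x
    x^2
end
squares_lambda = map(x -> x^2, [1, 2, 3])   # equivalent spelling

# Hypothetical three-argument function for illustration.
scale_shift(x, a, b) = a * x + b

# The proposal would let `3 |> scale_shift(2, 1)` mean scale_shift(3, 2, 1).
# In current Julia this needs an explicit closure:
piped = 3 |> (x -> scale_shift(x, 2, 1))
direct = scale_shift(3, 2, 1)
```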
Searching the forums reveals a good number of discussions pertaining to syntax extensions for piping/chaining as well as partial application.
The recurring theme in all of them seems to be that people have wildly different opinions on these, and thus these discussions never reached anything even remotely close to consensus…
In that vein - I don’t really see how a |> f(b,c) is a “natural extension” to creating anonymous functions via the do keyword…
Piping multiple data manipulation functions is such a common scenario that there’s DataPipes.jl for the exact purpose of making them boilerplate-free.
Examples from this thread rewritten using DataPipes:
The “just use closures” argument doesn’t mesh for me. It is way noisier than the curried alternative, so now, I’m reading code and need to parse a function definition on top of shifting back and forth to understand context.
count(x -> x > 5, numbers) reads as “count up elements [jump to end] of an array of numbers [jump to middle] where we get true returned from a function x>5 where x is the current element.”
numbers |> count(>(5)) reads as “take an array of numbers and count up the elements that are greater than 5.”
It’s no easier to write one or the other, but since code is read many more times than it is written, reducing the noisiness of code isn’t so much a convenience as it is a courtesy to future readers (author included!).
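For readers who want to try the two spellings: as far as I know, Base has no partially applied count(f), so numbers |> count(>(5)) is the wished-for form rather than something that runs today. A sketch of the nearest equivalent in current Julia, using Base.Fix1 and made-up data:

```julia
numbers = [1, 6, 7, 3, 9]   # sample data for illustration

# Closure version.
n_lambda = count(x -> x > 5, numbers)

# Curried predicate, still in call-first order.
n_curried = count(>(5), numbers)

# Pipe order: fix the predicate as count's first argument explicitly.
count_gt5 = Base.Fix1(count, >(5))
n_piped = numbers |> count_gt5
```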
Edit: this is also why I don’t think piping packages is a reasonable solution. I don’t want a reader to have to learn the particular rules of whatever piping package I happened to choose. I’d much rather use a solution from Base that everyone knows (even if they don’t choose to use it themselves).
I agree that having a great solution in Base is better than in a package. It’s just wise IMO to try out any major syntax in a package before including it in Base.
For Julia data manipulation functions, DataPipes.jl @p macro serves as the only (to my knowledge) solution that removes all boilerplate present there. I’m personally fine with it being a package, but the inclusion of such a macro into Julia (with the same or similar semantics) could be a nice thing.
Fundamentally, there’s only so much one can do with those Base.Fix overloads. count(>(5), numbers) is fine when it’s applicable, but even count(x -> x.age > 5, numbers) cannot be written this way.
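To illustrate the limitation: the field-access case has no ready-made one-argument shorthand. One workaround, sketched below with made-up data, composes a curried comparison with a Base.Fix2 around getproperty, though it is arguably noisier than the lambda it replaces.

```julia
# Sample data for illustration: a vector of NamedTuples.
people = [(name = "Ann", age = 3), (name = "Bob", age = 8), (name = "Cy", age = 12)]

# Plain closure.
n_lambda = count(p -> p.age > 5, people)

# Closure-free alternative: compose "get the age field" with "is greater than 5".
age = Base.Fix2(getproperty, :age)
gt5 = >(5)
n_composed = count(gt5 ∘ age, people)
```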
Sorry! Should have said “final solution”. I use piping packages all the time and really appreciate the thought you and other authors have put into the API!
I am not sure this position is very consistent — closures are a solution that is part of the language, and everyone knows the concept. Yet you don’t like them for some reason so they are not OK.
Also, whenever you use a package outside Base, you are forcing your reader to learn a bit about that package, regardless of whether you are using functions like Statistics.mean or macros like DataPipes.@p. Saying that one is acceptable while the other is not is rather arbitrary.
I see both points as flowing from the same thesis. After correctness and performance, readability ought to be the guiding principle of code. Anonymous functions disrupt the connection between reading order and logic order. I think piping + currying syntax is superior in this regard. AND I don’t think that the syntax should ultimately be provided by a package because, with multiple options around, package implementations add more mental load trying to track what is happening.
Language feature packages are different from functionality libraries. When I load Statistics.mean it still obeys all the semantics of the language, so if I know what the name of the function means (pun intended), there’s no ambiguity.
With the different piping packages, you have to remember: Does this insert in the first or last position? What does the underscore mean? What are these new keywords? etc.
“Reading order” does not seem like a well defined term, or rather, the interpretation comes down to personal taste/habit/native language. Sort of like big vs little-endian.
Personally I’d read that as just “count all elements greater than 5 in numbers”
IMO using an anonymous function is a bad practice anyway, especially an inline one, for package code. So I don’t see this as a relevant example. Introducing additional syntax sugar like you propose would just make things worse.
> has a curried version, so count(>(5), numbers) would work in this particular case. In other cases, I guess make your own curried version?
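“Make your own curried version” could look like the following sketch for a user-defined predicate. The function older_than and the sample data are hypothetical; adding the one-argument method is safe here because the function is one's own rather than Base's.

```julia
# Hypothetical two-argument predicate.
older_than(person, age) = person.age > age

# Curried one-argument method, mirroring the pattern f(x) = Base.Fix2(f, x).
older_than(age) = Base.Fix2(older_than, age)

# Sample data for illustration.
people = [(name = "Ann", age = 3), (name = "Bob", age = 8), (name = "Cy", age = 12)]

n_older = count(older_than(5), people)
names_older = [p.name for p in filter(older_than(5), people)]
```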
These sound like arguments to converge on a single semantics for piping, arguably best in a package first – even if there’s desire to include it into Base eventually.
In this context, I wonder if you see something done suboptimally in DataPipes.jl – feedback is always welcome. The main goal of its design is to have boilerplate-free generic data manipulation in Julia with simple syntax transformation rules. IMO, this naturally led to _ indicating the lambda function argument, and inserting the previous result in the last position by default.
???
Can you elaborate on that a bit?
Anonymous functions are heavily used in Julia code, what alternative do you suggest? I don’t know of an easy way to avoid them in code like map(x -> abs(x.val)) or filter(x -> first(x.dates) == mydate).