Fixing the Piping/Chaining Issue (Rev 3)

Syntax appears to be free for FrankenTuple definition :wink: I think the first step would be getting the type into Base.

julia> :( (a,; b=1) ).args
2-element Vector{Any}:
 :($(Expr(:parameters, :($(Expr(:kw, :b, 1))))))
 :a

Is that a major problem for #24990? If that’s the case, why not _1, _2, … for multiple arguments?

No, that example is not a problem for #24990; it’s a problem for Chain.jl.

I have learned two things from this long saga of threads on chaining:

  1. Generic partial application is really hard to make consistent and correct in all cases, and it will almost certainly not be coming to Julia anytime soon.
  2. Chain.jl is pretty darn good and does indeed cover the majority of use cases.

So, with that said, I’d love to throw one more proposal into the wind: “Chain But Not A Macro”. It introduces the block chain ... end, which I view as a sister to do ... end, and which differs from Chain.jl in the following ways. (I know this is partly redundant with my comment above; I’m repeating it with more thoughts added.)

  1. chain ... end always returns a function, and the input is accessed as _ like any other line. That is, @chain df begin transform(...) end becomes df |> chain transform(_, ...) end
  2. explicit underscores are needed at every line to avoid arguments over whether first or last position is more important
  3. semicolons ; can be used instead of newline to continue the chain

This gives some limited power for partial application inside chain blocks without opening up the whole can of worms outlined in #24990. I think of it more as a convenient new syntax for some function definitions than as wholly new machinery.

For example, here is one thing these changes make easier than current Chain.jl, since it takes only a single line and the underscore can appear in the first line:

l2norm = chain _.^2; sum; sqrt end

Now l2norm is a function.

And if you really want, it can indeed be used for partial application, like
filter(chain _%3==0 end, arr), although I’m not sure this is better than the -> syntax. I suppose it avoids having to choose a variable name.
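For comparison, the two examples above can already be written in current Julia with an anonymous function or with ∘ composition. This is only a sketch of equivalent behavior; chain ... end is proposed syntax and will not parse today, and the names below are illustrative.

```julia
# l2norm as a plain anonymous function: square elementwise, sum, take the root.
l2norm = x -> sqrt(sum(x .^ 2))

# The same pipeline written right-to-left with function composition.
l2norm_composed = sqrt ∘ sum ∘ (x -> x .^ 2)

# The partial-application example: keep the multiples of 3.
arr = 1:10
multiples_of_3 = filter(x -> x % 3 == 0, arr)
```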

I must agree with @Sukera that we are possibly starting to go in circles, although I do not share the pessimism towards all the proposals, so this will probably be my last thought on the issue :slight_smile:


Btw, that’s already l2norm = @f __.^2 |> sum |> sqrt with DataPipes (:


This is categorically incorrect. Partial application is very simple and straightforward.

What becomes difficult is cascading multiple function calls in sequence. Most of the controversy in #24990 has revolved around how to use _ to satisfy this desire for building “quick lambdas” (i.e., not partial application), and how to do it at the parser level. Unsurprisingly, that’s difficult.

However, as has been explored here (and expanded upon here and here), it works quite handsomely when paired with function composition. The relevant discussion is here.
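To make the distinction concrete: partial application fixes some arguments of a single call, which Base already expresses with Base.Fix1 and Base.Fix2, while a multi-call “quick lambda” can be assembled by composing such pieces with ∘. A small illustration in current Julia (the names gt2, double and decrement_then_double are just for illustration):

```julia
# Partial application: fix one argument of a single call.
gt2 = Base.Fix2(>, 2)        # behaves like x -> x > 2; also what >(2) returns
double = Base.Fix1(*, 2)     # behaves like x -> 2 * x

# A "quick lambda" spanning several calls, built by composing partial applications.
decrement_then_double = double ∘ Base.Fix2(-, 1)   # behaves like x -> 2 * (x - 1)

big_ones = filter(gt2, [1, 2, 3, 4])
```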

That depends on how rational the Julia community can be. It seems that circling the wagons to make underscores behave as they do in Chain.jl has a tendency to whip people up into a fervor, making things difficult :sweat_smile:

Under my proposal, l2norm = {it.^2, sum, sqrt}, and if PR#24990 is accepted, l2norm = {_.^2, sum, sqrt}, so I don’t see where your proposal adds any utility (other than defending your persistent desire to use _ as a stand-in for it).

Then I shall draw your attention back to this:


Personally, I kind of like the {} syntax too. But chain ... end feels closer to existing patterns in Julia. If { ... } and chain ... end were synonyms, I would not really have a preference between them.

defending your persistent desire to use _ as a stand-in for it

:sweat_smile: I’m sorry, possibly it’s just pure aesthetic subjectivity, but I really think that it looks rather inelegant. I don’t like how the code reads; it makes me feel like I’m playing one of these

Using an underscore just makes a lot more sense to me as a placeholder variable. It doesn’t help that it is probably used frequently as a variable name for, e.g., iterands.

We are making progress! I'm not averse to subjectivity, as long as I understand where it's coming from.

In the debate of Bayesianism versus Frequentism, I fall into the Bayesian camp. Try as we might to find objectivity, we never quite get there. The pursuit of objectivity has, of course, proven generally worthwhile, but ultimately we fool ourselves if we do not acknowledge that every view we hold and every decision we make is subjective, a result of conjugating our observations with our priors along with a mishmash of heuristics and animal instincts.

While I have no sympathy for the masses who misuse it for iterators (they should be using itr), I do empathize with this sentiment. it requires you to dot your i’s and cross your t’s :sweat_smile:

It’s difficult to convince me that _ should be spent on chains, considering that its deprecation as an rvalue makes it a perfect fit for partial application, an idiom which is quite common and useful, and for which it’s also already used in Scala (so there is precedent). Underscores are so perfect a fit for partial application syntax that the fit is far better than OJ’s glove. Since we are at liberty to select our own keywords within the context we are creating, it feels wasteful indeed to lay claim to the underscore.

You’ve previously raised the possibility of ⬚ instead of it, and my pushback has been that it a) perfectly carries the desired meaning and b) is more readily accessible in ASCII characters. However, on further thought, I think it’s feasible (and Julian) to make them synonyms, in the same way that ∈ and in are synonyms, as are ≤ and <=. (In fact, I use the Unicode characters so much that I momentarily forgot what <= means :sweat_smile:)

So I raise this possibility: keep it and them as local keywords, as proposed, and make ⬚ a synonym for it and ⬚s (plural of ⬚) a synonym for them. I’d love to hear your thoughts.
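The existing precedent is easy to check in Base, where the Unicode operators are bound to the very same functions as their ASCII spellings. Any ⬚ synonym would work by analogy only, because the chain pronouns are proposed local keywords rather than functions:

```julia
# Base binds the Unicode spellings to the same function objects as the ASCII ones,
# so the two spellings are interchangeable everywhere.
same_in = (∈) === in
same_le = (≤) === (<=)

both_in = (2 ∈ [1, 2, 3], 2 in [1, 2, 3])
```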


True, but I cannot put this inside an argument:

map(@f __.^2, [1,2,3]) #error
map(chain _.^2 end, [1,2,3]) #[1, 4, 9]

Of course in simple examples like these the -> seems obviously better. But I could imagine making this multi-line

map(myarray) do chain
    ...
end

As a regular macro, I think you can write that like

map(@f(__.^2), [1,2,3])

It’s a little known feature that you can “call” macros like that, to explicitly pass their argument expressions instead of inferring them from the parsing rules.
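To illustrate the parenthesized call form with a self-contained stand-in, here is a hypothetical @fn macro (not DataPipes’ @f) that turns an expression using __ into a one-argument anonymous function. Because it is called as @fn(...), its argument ends at the closing parenthesis and map receives it as an ordinary first argument:

```julia
# Recursively replace the placeholder symbol `from` with `to` inside an expression.
replace_placeholder(ex, from, to) = ex
replace_placeholder(s::Symbol, from, to) = s === from ? to : s
replace_placeholder(ex::Expr, from, to) =
    Expr(ex.head, (replace_placeholder(a, from, to) for a in ex.args)...)

# Hypothetical macro: @fn(__ .^ 2) becomes (arg -> arg .^ 2) with a fresh argument name.
macro fn(ex)
    arg = gensym(:arg)
    body = replace_placeholder(ex, :__, arg)
    return esc(:($arg -> $body))
end

# Parenthesized call: the macro's argument stops at `)`, so map gets two arguments.
squares = map(@fn(__ .^ 2), [1, 2, 3])
```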


This is a common use case in data processing, and DataPipes aims to make it convenient!
Using the __ placeholder as the inner function's argument starts the inner pipe. E.g.:

@p map([[1,2], [3]]) do __
	map(_ + 1)
	__.^2	
	sum
end

We’re getting off-topic again, so I’ll leave this here.

@adienes again this question for you:

Doing Advent of Code today, I remembered another issue with curly braces: they are currently used in the Base.Cartesian module for some expansions.

julia> @macroexpand Base.Cartesian.@nexprs 4 i -> r_{i+10} = i^2
quote
    r_11 = 1
    r_12 = 4
    r_13 = 9
    r_14 = 16
end

So not breaking that should be kept in mind :slight_smile:


Text like this parses into a :curly expression, just like regular type parameterizations, whereas I scan for :braces and :bracescat expressions, so it’s not a problem. And text like T{X,Y} where {X,Y} gets turned into a :where expression: T{X,Y} remains a :curly argument, and the X and Y in {X,Y} simply become arguments :X and :Y to :where. (Confusing indeed :sweat_smile:)
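The parse-level distinction described above can be checked with Meta.parse, which only parses and never evaluates (so the bare braces, which are not valid Julia values, are fine here):

```julia
# Expression heads the parser assigns to the different brace forms.
curly_head = Meta.parse("r_{i+10}").head         # :curly, as in type parameters
braces_head = Meta.parse("{a, b}").head          # :braces
bracescat_head = Meta.parse("{a b}").head        # :bracescat

# For `where`, the braces after the keyword are flattened into plain arguments.
where_ex = Meta.parse("T{X,Y} where {X,Y}")      # head :where; args[1] is the :curly T{X,Y}
```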

All this is to say that my code doesn’t break that, nor does it break type parameterization. I think my code does, however, break PGFPlotsX.jl, but the eventual hope is that chaining functionality would be incorporated into the language, where it would execute after user macro expansion.

So I’ve been giving this further thought, and I’ve come to disagree (with you and with my previous self). Yes, broadcasting is super awkward with this syntax, but that’s not the issue.

Broadcasting isn’t the correct idiom here.

Broadcasting is a superpower when you have a function being applied to several arguments of which zero or more could be AbstractArrays of various dimensions, and it automagically finds the common dimensions, fills the under-sized objects out to the common size, and applies the function to the individual elements. Expressions such as y̲ = 𝐀*x̲ .+ b are soooo much better.

Broadly, the type of object I anticipate {} method chains to be most useful for is not the same type of object that broadcast is best suited for. Whereas broadcasting is best for rectangularly-arranged collections of numbers, chaining is best for … just about any odd thing else: Strings, DataFrames, FrankenTuples, FlexiGroups, StructArrays, etc. When managing collections of these sorts, your mind should naturally go toward filter, map, and reduce.

I remember reading somewhere that broadcast is the single most complicated function in Julia. The magic comes at a cost: extra compile time and memory use.

So although there’s some overlap in the uses of broadcast and map, and although the . dot function call syntax sugar often nudges us toward broadcasting, I think we want to avoid the concept of broadcasting these method chains and instead embrace mapping them onto collections.
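A small contrast in current Julia: broadcasting matches and expands shapes, whereas map simply applies a function (here a short ∘ chain) to each element of a collection of non-numeric objects. The example data is made up for illustration:

```julia
# Broadcasting: a 3-element column against a 1×2 row expands to a 3×2 matrix.
grid = [1, 2, 3] .+ [10 20]

# Mapping a small function chain over a collection of strings.
cleaned = map(uppercase ∘ strip, [" ab ", "cd "])
```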


I’ve found a parsing edge case.

julia> [<(2)]
1-element Vector{Base.Fix2{typeof(<), Int64}}:
 (::Base.Fix2{typeof(<), Int64}) (generic function with 1 method)

julia> [1<(2)]
1-element Vector{Bool}:
 1

julia> [1 <(2)]
1-element Vector{Bool}:
 1

julia> [1 (<(2))]
1×2 Matrix{Any}:
 1  Fix2{typeof(<), Int64}(<, 2)

This is relevant to my proposal for multi-chains: if you use one of these curried binary operators within multi-chains (in a non-leftmost chain), you will need to wrap it in parentheses to eliminate ambiguity.


So I found a less obscure macro that my code breaks: @NamedTuple, since its @NamedTuple{a::T, b::S} form hands the macro a :braces expression. (It’s useful when you want to declare fields with abstract types instead of concrete types, which causes type instability, so it isn’t popular anyway, but still.)

As stated previously, though, the hope is that my proposal would run as a language feature after macro expansion, so this wouldn’t be an issue.
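The reason @NamedTuple collides with a brace-scanning rewriter is visible from its parsed form: the macro call's argument is a :braces expression, which the macro then expands into an ordinary NamedTuple type.

```julia
# The last argument of the macro call is the brace expression itself.
ex = :(@NamedTuple{a::Int, b::String})
arg_head = ex.args[end].head

# What the macro produces: a plain NamedTuple type.
NT = @NamedTuple{a::Int, b::String}
```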


Update: The code now detects any closures within the chain, and if they capture it or them, it wraps those closures in let it=it...end or let them=them...end blocks. Because the values and types of it and them change line by line anyway, there isn’t any point in capturing more than their value at that snapshot in time, so this new behavior avoids type instability (a captured variable that is later reassigned would otherwise be boxed).

(This produces the same behavior we would have gotten by replacing the it and them identifiers with gensym symbols, but keeping the same identifiers and wrapping closures in let blocks is easier and keeps the emitted code cleaner.)
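A minimal sketch of why the let wrapping helps, outside of any chain syntax: a closure that captures a variable which is reassigned later gets that variable boxed (Core.Box), while capturing a fresh let binding keeps a concretely typed snapshot. The function names are hypothetical, and `it` here is just an ordinary local variable:

```julia
function boxed_version()
    it = 1
    f = () -> it + 1     # captures the variable `it` itself
    it = 2.0             # reassignment after capture forces `it` into a Core.Box
    return f
end

function let_version()
    it = 1
    f = let it = it      # fresh binding holding a snapshot of the current value
        () -> it + 1
    end
    it = 2.0             # does not affect the snapshot captured by f
    return f
end
```

Calling boxed_version()() sees the reassigned value and returns 3.0, while let_version()() returns 2 from the snapshot.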


Update: Implemented experimental Unicode pronoun synonyms: ⬚ for it, and ⬚s for them. Within chains, I run a recursive search-and-replace to convert ⬚ to it and ⬚s to them. This is experimental, and I could easily change my mind about which synonyms are best.
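A recursive search-and-replace of this kind can be sketched as a few methods over Expr. This is a hypothetical stand-in (substitute and PRONOUN_SYNONYMS are illustrative names), not the prototype’s code, and it builds its input with Symbol(...) so it does not depend on typing ⬚ as source:

```julia
# Hypothetical synonym table: Unicode pronoun => keyword it stands for.
PRONOUN_SYNONYMS = Dict(Symbol("⬚") => :it, Symbol("⬚s") => :them)

substitute(x, table) = x                             # literals, LineNumberNodes, QuoteNodes: unchanged
substitute(s::Symbol, table) = get(table, s, s)      # swap a symbol if it has a synonym
substitute(ex::Expr, table) = Expr(ex.head, (substitute(a, table) for a in ex.args)...)

# Equivalent to parsing `⬚ + length(⬚s)`.
ex = Expr(:call, :+, Symbol("⬚"), Expr(:call, :length, Symbol("⬚s")))
rewritten = substitute(ex, PRONOUN_SYNONYMS)
```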


Update: Improved behavior for local variable declaration. Previously, only variables assigned in direct descendant expressions of the chain were made local; now a recursive search is made, and any assignment makes its variable local to the chain scope.
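A naive sketch of the recursive search, assuming only plain `name = value` assignments need to be found. The helper names are hypothetical, and forms such as named-tuple fields `(a = 1,)` (which this would wrongly collect) or destructuring assignments (which it skips) would need extra handling:

```julia
# Recursively collect every symbol that appears as the target of a plain assignment.
function assigned_names!(acc::Set{Symbol}, ex)
    if ex isa Expr
        if ex.head === :(=) && ex.args[1] isa Symbol
            push!(acc, ex.args[1])
        end
        for a in ex.args
            assigned_names!(acc, a)
        end
    end
    return acc
end

# Prepend one `local` declaration per collected name to the body.
function localize(body::Expr)
    names = assigned_names!(Set{Symbol}(), body)
    return Expr(:block, (Expr(:local, n) for n in names)..., body)
end

body = :(begin x = 1; if true; y = 2; end; x + y end)
found = assigned_names!(Set{Symbol}(), body)    # includes the nested `y`
```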


Update: multi-chains now return them if the final line is not a single chain. I had previously required them to be explicit in order to avoid “DWIM” behavior. However, the new behavior is generally more useful and is unambiguous anyway. For example, we can now write:

julia> xrange = range(1, 3, 10^4)
       a, b, dx = xrange.{first last step}
(1.0, 3.0, 0.00020002000200020003)

This allows us to avoid things like

a, b, dx = first(xrange), last(xrange), step(xrange)

To illustrate the behaviors now:

julia> (1).{tuple, tuple, tuple}
(((1,),),)

julia> (1).{tuple; tuple; tuple}
(((1,),),)

julia> (1).{tuple tuple tuple}
((1,), (1,), (1,))
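For readers without the prototype, the results above and the xrange example can be reproduced in current Julia, using ∘ for the nested form and map over a tuple of functions for the fanned-out form. This is an equivalence sketch only, not how the prototype is implemented:

```julia
x = 1

# Like (1).{tuple; tuple; tuple}: one chain with three steps, i.e. nested application.
nested = (tuple ∘ tuple ∘ tuple)(x)

# Like (1).{tuple tuple tuple}: three one-step chains, each applied to x.
fanned = map(f -> f(x), (tuple, tuple, tuple))

# Like xrange.{first last step}.
xrange = range(1, 3, 10^4)
a, b, dx = map(f -> f(xrange), (first, last, step))
```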

I’m debating whether to keep the comma-delimited variant behaving like the semicolon-delimited variant (as a single chain with multiple steps), or to change it to behave like the space-delimited variant (as a single-depth multi-chain). The more I think about it, the more I’m leaning towards changing it.


Update: changed chain self-reference from loop to recurse. The name loop is reasonably likely to be used, so it’s somewhat unattractive to claim it. However, within any program, recurse is virtually guaranteed to be a free identifier, because functions refer to themselves by their actual name. Additionally, recurse is more informative. So for example, you might now write:

let fib={it ≤ 1 ? it : recurse(it-1) + recurse(it-2)};  fib(10)  end
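For comparison, the closest current-Julia spelling uses a local named function, which can refer to itself by name; recurse plays that role for the anonymous chainlink:

```julia
fib10 = let
    # A local named function may call itself by its own name.
    fib(n) = n ≤ 1 ? n : fib(n - 1) + fib(n - 2)
    fib(10)
end
```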

I’m considering banning the return keyword, because its behavior is very inconsistent between the headless chainlink (which is a function) and the regular chain (which is a let statement). When used in a regular chain nested in a larger function, return terminates the outer function; in a chainlink, it does not.

That said, return is still useful for recursive chainlinks, so I’m undecided.
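The inconsistency can already be seen in current Julia without any chain syntax, since a let block and an anonymous function treat return differently (the function names are illustrative):

```julia
function via_let()
    let x = 1
        return (:from_let, x)        # leaves via_let itself, not just the let block
    end
    return :after_let                # never reached
end

function via_closure()
    f = () -> return :from_closure   # leaves only the anonymous function
    f()
    return :after_closure
end
```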


I pulled the trigger. x.{f, g, h} now means (f(x), g(x), h(x)).

Thus:

x.{f; g} == g(f(x))

x.{f} == f(x)
x.{f, g} == (f(x), g(x))

I think this satisfies the use cases of chaining best. I’m not happy with this (the parser doesn’t distinguish between them):

x.{f,} == x.{f}

but I think it’s worth it for this:

let (min, max, med, μ) = data.{minimum, maximum, median, mean};
    #= stuff =#
end

Can’t help but feel that this syntax is redundant with broadcasted piping, i.e.

julia> lo, hi, med, μ = Ref(data) .|> (minimum, maximum, median, mean)
(1, 4, 2.5, 2.5)
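Both the broadcasted form and a plain map over the tuple of functions give the same tuple. A sketch using the Statistics standard library, assuming the data was the four-element vector [1, 2, 3, 4] (inferred from the output shown, not stated in the post):

```julia
using Statistics: mean, median

data = [1, 2, 3, 4]
fs = (minimum, maximum, median, mean)

# Broadcasting `|>` over a Ref-wrapped argument and a tuple of functions.
via_broadcast = Ref(data) .|> fs

# The same result with map over the tuple of functions, without broadcast machinery.
via_map = map(f -> f(data), fs)

lo, hi, med, μ = via_map
```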

Good point, but quite a bit of this proposal is already redundant with |> and ∘; it just offers better ergonomics and lower compiler strain.

And let’s not forget the urge that inspired this whole saga: to provide a proper chaining syntax that enables autocomplete hints, especially for multi-argument functions.