Yes, but (a; b) is horrible. That should be syntax for FrankenTuples.
Syntax appears to be free for FrankenTuple definition. I think the first step would be getting the type into Base.
julia> :( (a,; b=1) ).args
2-element Vector{Any}:
:($(Expr(:parameters, :($(Expr(:kw, :b, 1))))))
:a
Is that a major problem for #24990? If that’s the case, why not _1, _2, … for multiple arguments?
No, that example is not a problem for #24990; it’s a problem for Chain.jl.
I have learned two things from this long saga of threads on chaining:
- Generic partial application is really hard to make consistent and correct in all cases, and will almost certainly not be coming to Julia anytime soon.
- Chain.jl is pretty darn good and does indeed cover the majority of use-cases.
So with that being said, I’d love to throw one more proposal into the wind, which is just “Chain But Not A Macro”. It introduces the block chain ... end, which I view as a sister to do ... end, with the following differences from Chain.jl. I know this is redundant with my comment above, but I’m repeating it with more thoughts added.
- chain ... end always returns a function, and the input is accessed as a _ like any other line. That is, @chain df begin transform(...) end becomes df |> chain transform(_, ...) end
- explicit underscores are needed at every line to avoid arguments over whether first or last position is more important
- semicolons ; can be used instead of newlines to continue the chain
This gives some limited power for partial application inside chain blocks without opening up the whole can of worms that are the issues outlined in #24990. I think of it more as a convenient new syntax for some function definitions than as wholly new machinery.
For example, here is one thing you can do with these changes more easily than with current Chain.jl, since it takes only a single line and the underscore can come in the first line:
l2norm = chain _.^2; sum; sqrt end
Now l2norm is a function.
And if you really want, it can indeed be used for partial application, like
filter(chain _%3==0 end, arr)
although I’m not sure this is better than the -> syntax. I suppose it avoids the choice of a variable name.
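For comparison, here is a rough sketch of how the two examples above can be written in Julia as it exists today, using anonymous functions and the composition operator ∘. It does not implement the proposed chain ... end block, and arr is made-up sample data.

```julia
# Existing-Julia counterparts of the two `chain ... end` examples.
# Nothing here implements the proposed syntax.

# chain _.^2; sum; sqrt end  ~  square elementwise, then sum, then sqrt
l2norm = sqrt ∘ sum ∘ (x -> x .^ 2)

# filter(chain _%3==0 end, arr)  ~  an anonymous function with a named argument
arr = collect(1:10)            # hypothetical sample data
multiples_of_three = filter(x -> x % 3 == 0, arr)

println(l2norm([3, 4]))        # 5.0
println(multiples_of_three)    # [3, 6, 9]
```

The composition reads right to left, which is one of the ergonomic complaints the chain block is meant to address.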
I must agree with @Sukera that we are possibly starting to go in circles, although I do not share the pessimism towards all the proposals, so this will probably be my last thought on the issue.
Btw, that’s already l2norm = @f __.^2 |> sum |> sqrt with DataPipes (:
This is categorically incorrect. Partial application is very simple and straightforward.
What becomes difficult is when it’s desired to cascade multiple function calls in sequence. Most of the controversy in #24990 has revolved around how to use _ to satisfy this desire of building “quick lambdas” (i.e., not partial application), and to do it at the parser level. Unsurprisingly, that’s difficult.
However, as has been explored here (and expanded upon here and here), it works quite handsomely when paired with function composition. The relevant discussion is here.
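As a small illustration of partial application paired with function composition, here is a sketch using only what Base already ships (Base.Fix2 and ∘). It is not the syntax proposed in #24990 or in this thread; the variable names are made up for the example.

```julia
# Base.Fix2 fixes the second argument of a two-argument function:
# Base.Fix2(f, y)(x) == f(x, y).
is_multiple_of_three = iszero ∘ Base.Fix2(rem, 3)

# Several Base comparison operators return a Fix2 when called with one argument.
less_than_two = <(2)

println(filter(is_multiple_of_three, 1:10))  # [3, 6, 9]
println(less_than_two(1))                    # true
```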
That depends on how rational the Julia community can be. It seems that circling the wagons to make underscores behave as they do in Chain.jl has a tendency to whip people up into a fervor, making things difficult.
By my proposal, l2norm = {it.^2, sum, sqrt}, and if PR#24990 is accepted, l2norm = {_.^2, sum, sqrt}, so I don’t see where your proposal adds any utility (other than defending your persistent desire to use _ as a stand-in for it).
Then I shall draw your attention back to this:
By my proposal, l2norm = {it.^2, sum, sqrt}
Personally, I kind of like the {} syntax too. But chain ... end feels closer to the existing patterns in Julia. If { ... } and chain ... end are synonyms then I do not really have a preference between them.
defending your persistent desire to use _ as a stand-in for it
I’m sorry, possibly it’s just pure aesthetic subjectivity, but I really think that it looks rather inelegant. I don’t like how the code reads; it makes me feel like I’m playing one of these
Using an underscore just makes a lot more sense as a placeholder variable to me. It doesn’t help that it is likely frequently used as a variable name, e.g. for iterands.
it’s just pure aesthetic subjectivity, but I really think that it looks rather inelegant. I don’t like how the code reads
We are making progress! I'm not averse to subjectivity, as long as I understand where it's coming from.
In the debate of Bayesianism versus Frequentism, I fall into the Bayesian camp. Try as we might to find objectivity, we never quite get there. The pursuit of objectivity has, of course, proven generally worthwhile, but ultimately we fool ourselves if we do not acknowledge that every view we hold and every decision we make is subjective, a result of conjugating our observations with our priors along with a mishmash of heuristics and animal instincts.
While I have no sympathy for the masses who misuse it for iterators (they should be using itr), I do empathize with this sentiment. it requires you to dot your i’s and cross your t’s.
It’s difficult to convince me that _ should be spent on chains, considering that its deprecation as an rvalue makes it a perfect fit for partial application, an idiom which is quite common and useful, and for which it’s also already used in Scala (so there is precedent). Underscores are such a perfect fit for partial application syntax that the fit is far better than OJ’s glove. Since we are at liberty to select our own keywords within the context we are creating, it feels wasteful indeed to lay claim to the underscore.
You’ve previously raised the possibility of ⬚ instead of it, and my pushback has been that it a) perfectly carries the desired meaning and b) is more readily accessible in ASCII characters. However, on further thought, I think it’s feasible (and Julian) to make them synonyms, in the same way that ∈ and in are synonyms, as are ≤ and <=. (In fact, I use the Unicode characters so much that I momentarily forgot what <= means.)
So I raise this possibility: to keep the meanings for it and them to be local keywords as proposed, and to make ⬚ a synonym for it, and ⬚s (plural of ⬚) a synonym for them. I’d love to hear your thoughts.
True, but I cannot put this inside an argument
map(@f __.^2, [1,2,3]) #error
map(chain _.^2 end, [1,2,3]) #[1, 4, 9]
Of course, in simple examples like these the -> syntax seems obviously better. But I could imagine making this multi-line:
map(myarray) do chain
...
end
map(@f __.^2, [1,2,3]) #error
As a regular macro, I think you can write that like
map(@f(__.^2), [1,2,3])
It’s a little known feature that you can “call” macros like that, to explicitly pass their argument expressions instead of inferring them from the parsing rules.
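To make the difference between the two macro-call forms concrete, here is a small sketch with a toy macro. The macro @nargs is hypothetical and exists only for this illustration; it has nothing to do with DataPipes and simply reports how many argument expressions it received.

```julia
# Toy macro (hypothetical, for illustration only): expands to the number of
# argument expressions it received.
macro nargs(exprs...)
    return length(exprs)
end

# Space-separated form: the macro consumes the rest of the line, so
# `1, 2` arrives as ONE tuple expression.
spaced = @nargs 1, 2

# Call form: the parentheses delimit the arguments, so the macro sees TWO.
called = @nargs(1, 2)

# Inside a function call, the call form keeps the macro from swallowing
# the arguments that follow it.
result = tuple(@nargs(1), :other)

println((spaced, called, result))  # (1, 2, (1, :other))
```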
But I could imagine making this multi-line
map(myarray) do chain ... end
This is a common use case in data processing, and DataPipes aims to make it convenient!
The __ placeholder as the inner function argument starts the inner pipe. E.g.:
@p map([[1,2], [3]]) do __
    map(_ + 1)
    __.^2
    sum
end
We’re getting off-topic again, so I’ll leave this here.
@adienes again this question for you:
So I raise this possibility: to keep the meanings for it and them to be local keywords as proposed, and to make ⬚ a synonym for it, and ⬚s (plural of ⬚) a synonym for them. I’d love to hear your thoughts.
Doing Advent of Code today, I remembered another issue with the curly braces: they are currently used in the Base.Cartesian module for some expansions.
julia> @macroexpand Base.Cartesian.@nexprs 4 i -> r_{i+10} = i^2
quote
    r_11 = 1
    r_12 = 4
    r_13 = 9
    r_14 = 16
end
So not breaking that should be kept in mind.
r_{i+10}
Text like this turns into a :curly expression, similar to regular type parameterizations, whereas I scan for :braces and :bracescat expressions, so it’s not a problem. And text like T{X,Y} where {X,Y} gets turned into a :where expression: T{X,Y} remains a :curly argument, and {X,Y} simply become arguments :X and :Y to :where. (Confusing indeed.)
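The distinction can be checked directly with Meta.parse. The snippet below is only a quick inspection of the expression heads described above (the variable names are made up); it does not involve the chaining code itself.

```julia
# Inspect how the parser classifies different uses of curly braces.
subscript  = Meta.parse("r_{i+10}")             # name followed by braces
bare_comma = Meta.parse("{a, b}")               # standalone, comma-delimited
bare_space = Meta.parse("{a b}")                # standalone, space-delimited
where_expr = Meta.parse("T{X,Y} where {X,Y}")   # braces after `where`

println(subscript.head)    # curly
println(bare_comma.head)   # braces
println(bare_space.head)   # bracescat
println(where_expr.head)   # where

# The `where` expression keeps T{X,Y} as a :curly argument and receives
# X and Y as plain symbols.
println(where_expr.args)
```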
All to say, my code doesn’t break that, nor does it break type parameterization. I think my code does however break PGFPlotsX.jl, but the eventual hope would be that chaining functionality would be incorporated into the language where it executes after user macro expansion.
To me the broadcast syntax is an issue, presumably this would be used on collections a lot, and having to change the order of things or the syntax to broadcast looks quite arcane to me
So I’ve been giving this further thought, and I’ve come to disagree (with you and with my previous self). Yes, broadcasting is super awkward with this syntax, but that’s not the issue.
Broadcasting isn’t the correct idiom here.
Broadcasting is a superpower when you have a function being applied to several arguments of which zero or more could be AbstractArrays of various dimensions, and it automagically finds the common dimensions, fills the under-sized objects out to the common size, and applies the function to the individual elements. Expressions such as y̲ = 𝐀*x̲ .+ b are soooo much better.
Broadly, the type of object I anticipate {} method chains to be most useful for is not the same type of object that broadcast is best suited for. Whereas broadcasting is best for rectangularly-arranged collections of numbers, chaining is best for … just about any odd thing else: Strings, DataFrames, FrankenTuples, FlexiGroups, StructArrays, etc. When managing collections of these sorts, your mind should naturally go toward filter, map, and reduce.
I remember reading somewhere that broadcast is the single most complicated function in Julia. The magic comes at a cost: extra compile time and memory use.
So although there’s some overlap in the uses of broadcast and map, and although the . dot function call syntax sugar often nudges us toward broadcasting, I think we want to avoid the concept of broadcasting these method chains and instead embrace mapping them onto collections.
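As a sketch of the "map it, don't broadcast it" idiom in today's Julia, the example below builds a small pipeline by composition and applies it with map, filter, and reduce. The composed function stands in for a method chain, and the sample strings are made up.

```julia
# A small function pipeline built by composition: trim, then uppercase.
clean = uppercase ∘ strip

words = ["  foo ", "bar  ", " baz"]   # hypothetical sample data

# Map the pipeline over the collection...
cleaned = map(clean, words)

# ...then filter and reduce for the rest of the processing:
# total length of the cleaned words that start with "B".
b_total = reduce(+, map(length, filter(w -> startswith(w, "B"), cleaned)))

println(cleaned)   # ["FOO", "BAR", "BAZ"]
println(b_total)   # 6
```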
I’ve found a parsing edge case.
julia> [<(2)]
1-element Vector{Base.Fix2{typeof(<), Int64}}:
(::Base.Fix2{typeof(<), Int64}) (generic function with 1 method)
julia> [1<(2)]
1-element Vector{Bool}:
1
julia> [1 <(2)]
1-element Vector{Bool}:
1
julia> [1 (<(2))]
1×2 Matrix{Any}:
1 Fix2{typeof(<), Int64}(<, 2)
This is relevant to my proposal for multi-chains: if you use one of these curried binary operators within multi-chains (in a non-leftmost chain), you will need to wrap it in parentheses to eliminate ambiguity.
I remembered another issue with the curly braces
So I found a less-obscure macro which my code breaks: @NamedTuple (useful when you want to declare fields with abstract types instead of concrete types, which causes type-instability so it isn’t popular anyway, but still).
Although as stated previously, the hope would be that my proposal runs as a language feature after macro expansion so it wouldn’t be an issue.
Update: The code now detects if there are any closures within the chain, and if they capture it or them then it wraps them in let it=it...end or let them=them...end blocks. Because the values and types of it and them change line-by-line anyway, there isn’t a point to capturing anything more than their value at that snapshot in time, so this new behavior avoids type-instability.
(This causes the same behavior we would have had if I had replaced the it and them identifiers with gensym symbols, but keeping the same identifier and wrapping closures in let blocks is easier and keeps the emitted code cleaner).
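The effect of wrapping a closure in a let block can be seen in ordinary Julia, independent of the chaining code. The sketch below uses two made-up functions: one where the closure captures the variable itself, and one where let it = it captures a snapshot of its value.

```julia
# Without `let`: the closure captures the variable `it` itself, so a later
# reassignment is visible through the closure (and `it` has to be boxed).
function capture_variable()
    it = 1
    f = () -> it
    it = "two"
    return f()
end

# With `let it = it`: the closure captures a fresh binding holding the value
# at that point, so later reassignments do not affect it.
function capture_snapshot()
    it = 1
    f = let it = it
        () -> it
    end
    it = "two"
    return f()
end

println(capture_variable())   # two
println(capture_snapshot())   # 1
```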
Update: Implemented experimental Unicode pronoun synonyms: ⬚ for it, and ⬚s for them. Within chains, I run a recursive search-and-replace to convert ⬚ to it and ⬚s to them. This is experimental and I could easily change my mind on what synonyms are best.
Update: Improved behavior for local variable declaration (previously, only variables declared as direct descendant expressions of the chain would be made local; now a recursive search is made, and any assignments become local to the chain-scope).
Update: made multi-chains return them if the final line is not a single chain. I had previously required them to be explicit in order to avoid “DWIM” behavior. However, this new behavior is generally more useful and is non-ambiguous anyway. For example, with this new behavior, we can write:
julia> xrange = range(1, 3, 10^4)
a, b, dx = xrange.{first last step}
(1.0, 3.0, 0.00020002000200020003)
This allows us to avoid things like
a, b, dx = first(xrange), last(xrange), step(xrange)
To illustrate the behaviors now:
julia> (1).{tuple, tuple, tuple}
(((1,),),)
julia> (1).{tuple; tuple; tuple}
(((1,),),)
julia> (1).{tuple tuple tuple}
((1,), (1,), (1,))
I’m debating whether I want to keep the comma-delimited variant’s behavior like the semicolon-delimited variant (as a single-chain with multiple steps), or to change it to be like the space-delimited variant (as a single-depth multichain). The more I think about it, the more I’m leaning towards changing it.
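For reference, the two behaviors being weighed can be spelled out in plain Julia. This is only a sketch of the semantics using composition and map, not the {} implementation; the keyword form of range is used so it does not depend on the three-argument positional method.

```julia
# Sequential application: each step receives the previous result.
sequential = (tuple ∘ tuple ∘ tuple)(1)

# Fan-out: every function receives the same input; results form a tuple.
fan_out = map(f -> f(1), (tuple, tuple, tuple))

# Fan-out version of the xrange example.
xrange = range(1, 3; length = 10^4)
a, b, dx = map(f -> f(xrange), (first, last, step))

println(sequential)   # (((1,),),)
println(fan_out)      # ((1,), (1,), (1,))
```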
Update: changed chain self-reference from loop to recurse. The name loop is reasonably likely to be used, so it’s somewhat unattractive to claim it. However, within any program, recurse is virtually guaranteed to be a free identifier, because functions refer to themselves by their actual name. Additionally, recurse is more informative. So for example, you might now write:
let fib={it ≤ 1 ? it : recurse(it-1) + recurse(it-2)}; fib(10) end
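A plain-Julia counterpart, as a sketch: a local named function inside a let block gives the body a name to call itself by. Here recurse is just an ordinary local function name borrowed from the proposal, not a keyword.

```julia
# A local named function can call itself; the `let` block returns it so it
# can be bound to `fib` without leaking the name `recurse`.
fib = let
    recurse(it) = it ≤ 1 ? it : recurse(it - 1) + recurse(it - 2)
    recurse
end

println(fib(10))   # 55
```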
Considering banning the return keyword. The reason is that the behavior of this keyword is very inconsistent between the headless chainlink (which is a function) and the regular chain (which is a let statement). When used as part of a regular chain, nested in a larger function, return will terminate the outer function; this is not true of a chainlink.
That said, return is still useful for recursive chainlinks, so I’m undecided.
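The inconsistency can be reproduced in ordinary Julia by comparing return inside a let block with return inside an anonymous function. The two functions below are hypothetical stand-ins for a regular chain and a chainlink, respectively.

```julia
# `return` inside a `let` block exits the enclosing function.
function outer_with_let(x)
    y = let it = x
        it < 0 && return :negative   # returns from outer_with_let
        it + 1
    end
    return (:finished, y)
end

# `return` inside an anonymous function exits only that function.
function outer_with_closure(x)
    f = it -> begin
        it < 0 && return :negative   # returns from the anonymous function
        it + 1
    end
    y = f(x)
    return (:finished, y)
end

println(outer_with_let(-1))       # negative
println(outer_with_closure(-1))   # (:finished, :negative)
```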
I pulled the trigger. x.{f, g, h} now means (f(x), g(x), h(x)).
Thus:
x.{f; g} == g(f(x))
x.{f} == f(x)
x.{f, g} == (f(x), g(x))
I think this satisfies the use cases of chaining best. I’m not happy with this (the parser doesn’t distinguish between them):
x.{f,} == x.{f}
but I think it’s worth it for this:
let (min, max, med, μ) = data.{minimum, maximum, median, mean};
#= stuff =#
end
Can’t help but feel that this syntax is redundant with broadcasted piping, i.e.
julia> lo, hi, med, μ = Ref(data) .|> (minimum, maximum, median, mean)
(1, 4, 2.5, 2.5)
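To compare the two spellings side by side, here is a sketch using only Base functions (sum replaces median and mean so that the Statistics standard library is not needed). The data vector is made-up sample input.

```julia
data = [1, 2, 3, 4]   # hypothetical sample data

# Broadcasted piping: Ref(data) keeps the array from being iterated, and the
# tuple of functions is what gets broadcast over.
piped = Ref(data) .|> (minimum, maximum, sum)

# The same fan-out written as a map over the tuple of functions.
mapped = map(f -> f(data), (minimum, maximum, sum))

println(piped)    # (1, 4, 10)
println(mapped)   # (1, 4, 10)
```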