Dotdot: The double-broadcast operator enabling Float32..(a)

Having programmed in Julia for years, I have always missed a way to do double broadcasting.
Ref prevents a broadcast, but often one wants the opposite: to also apply a broadcast operation to each element of an iterable. This can be done by encapsulating one of the two broadcasts in an (anonymous) function:

julia> a = [reshape(1:4,(2,2)) for _=1:3]; broadcast((x)->Float32.(x), a)
3-element Vector{Matrix{Float32}}:
 [1.0 3.0; 2.0 4.0]
 [1.0 3.0; 2.0 4.0]
 [1.0 3.0; 2.0 4.0]

But this feels awkward and one often wishes to have access to a .. operator.
So I had a go at it:

function (..)(f, nargs...)
    mydot(nargs2...)=f.(nargs2...)
    mydot.(nargs...)
end

a = [reshape(1:4,(2,2)) for _=1:3]; Float32..(a)
3-element Vector{Matrix{Float32}}:
 [1.0 3.0; 2.0 4.0]
 [1.0 3.0; 2.0 4.0]
 [1.0 3.0; 2.0 4.0]

I did not quite expect this to end up being that simple, but for some (great!) unknown reason the .. notation automatically worked in infix notation.
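
One plausible explanation is that the parser already treats .. as an ordinary binary operator, so Float32..(a) is simply read as Float32 .. (a). The following is only a sketch to check that assumption by inspecting the parsed expression; it does not need any definition of .. to run:

```julia
# Parse the text without evaluating it.
ex = Meta.parse("Float32..(a)")

# The result is a plain call expression with `..` as the function name,
# i.e. the same tree that the spaced-out infix form produces.
@assert ex.head == :call
@assert ex.args == [Symbol(".."), :Float32, :a]
@assert ex == Meta.parse("Float32 .. a")
```

If that holds, defining a method for .. is all it takes for the infix form to call it.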

Can we not use this implementation as a general addition to the Julia language? I am more than happy to write a PR (with help file and tests), if agreed.
Of course there are many more applications than just casting every element of the arrays in a collection. For example, adding a Tuple to each element of an array of Tuples is another common use case.

julia> a = [(1,2),(3,4),(5,6)]; ..(+,a,Ref((3,3)))
3-element Vector{Tuple{Int64, Int64}}:
 (4, 5)
 (6, 7)
 (8, 9)

Maybe there is even a way to enable the notation a ..+ Ref((3,3))?
What do you think?

See also this somewhat related thread:

[why doesn't `@.` always broadcast?]

there are a number of places where .. is already being used for some kind of a range (it’s the same precedence as :), like uniform distributions in MonteCarloMeasurements.

we already have a number of operators which can’t even use a single dot (:= $= ?: in isa : .. $ :: . ' ... -> ,) because it would be too easy to confuse it with something else, and ..$ notation would only make this problem worse. if anything, I would prefer to see the second dot added on top, like :$, but that’s already notation for creating a symbol.

often, I find that when I start wanting a second level of broadcasting, it’s really a better idea to just bring one layer out into an explicit loop which is more readable anyways:
[Float32.(A) for A=a] and
[A.+r for A=a, r=Ref((3,3))] or Ref(A .+ r for A=a, r=Ref((3,3)))
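
As a quick sanity check (a sketch only, reusing the a from the first post and a hypothetical name t for the vector of tuples), the comprehension form gives the same values as the nested broadcast:

```julia
# Same data as in the opening post.
a = [reshape(1:4, (2, 2)) for _ in 1:3]
t = [(1, 2), (3, 4), (5, 6)]

# The outer loop is explicit; only the inner layer is broadcast.
converted = [Float32.(A) for A in a]
shifted = [A .+ (3, 3) for A in t]

# Both agree with the nested-broadcast spellings.
@assert converted == broadcast(x -> Float32.(x), a)
@assert shifted == [(4, 5), (6, 7), (8, 9)]
```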

You can use a macro @.., akin to the existing @. macro:

julia> macro (..)(args)
           args
       end
@.. (macro with 1 method)

julia> @.. 1+2
3

To add, someone is going to ask how to do triple broadcasting, and ... is taken. There’s nothing particular about double broadcasting to stop there. Julia does have Base.BroadcastFunction(op) to represent a broadcasted operator. We can just nest those, though I’d rather not write that repeatedly.

julia> fiter(f, n) = reduce(∘, Iterators.repeated(f, n))
fiter (generic function with 1 method)

julia> bcn(op, n) = fiter(Base.BroadcastFunction, n)(op)
bcn (generic function with 1 method)

julia> bcn(Float32, 2)
Base.Broadcast.BroadcastFunction(Base.Broadcast.BroadcastFunction(Float32))

julia> a = [reshape(1:4,(2,2)) for _=1:3]; bcn(Float32, 2)(a)
3-element Vector{Matrix{Float32}}:
 [1.0 3.0; 2.0 4.0]
 [1.0 3.0; 2.0 4.0]
 [1.0 3.0; 2.0 4.0]

Repeated function composition is not type-stable because the type depends on the runtime number of calls. The straightforward Val tweak doesn’t work; type inference basically stops after the first composition. If that could be made type-stable, that’d be a nice package or addition to Julia.
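
One possible direction, offered only as a sketch and not as a general fix for fiter: a @generated function can unroll the nesting at compile time from a Val, so the body is straight-line code. The name bcn_val is hypothetical, and whether inference is fully happy with it is worth confirming with @code_warntype:

```julia
# Hypothetical helper: nest Base.BroadcastFunction N times around `op`.
# The loop runs while the method body is generated, not at call time.
@generated function bcn_val(op, ::Val{N}) where {N}
    ex = :op
    for _ in 1:N
        ex = :(Base.BroadcastFunction($ex))
    end
    return ex
end

a = [reshape(1:4, (2, 2)) for _ in 1:3]
result = bcn_val(Float32, Val(2))(a)
@assert result == broadcast(x -> Float32.(x), a)
```

The nesting depth has to be a compile-time constant here, which is the same restriction the macro approach has.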

A macro could also just paste the code in because they do take Int literals. It looks real odd though, not nearly as neat as overhauling the parser for even more dots:

julia> macro ..(n::Int, op)
         ex = :(Base.BroadcastFunction($op))
         for _ in 2:n
           ex = :(Base.BroadcastFunction($ex))
         end
         esc(ex)
       end
@.. (macro with 1 method)

julia> @..(2, Float32)
Base.Broadcast.BroadcastFunction(Base.Broadcast.BroadcastFunction(Float32))

julia> a = [reshape(1:4,(2,2)) for _=1:3]; @..(2, Float32)(a)
3-element Vector{Matrix{Float32}}:
 [1.0 3.0; 2.0 4.0]
 [1.0 3.0; 2.0 4.0]
 [1.0 3.0; 2.0 4.0]

I can see that point. But practically speaking, I have “needed” the double broadcast many times, but I never needed the triple broadcast as far as I remember.

not sure I get it. The macro seems to do nothing. Or am I missing a point here?

I guess this is a real problem if there are potential name clashes. But even a normal name such as dotdot would be helpful to provide this functionality.
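
A minimal sketch of what such a named function could look like, assuming Julia 1.6 or later for Base.BroadcastFunction. The name dotdot is just the suggestion from above, not an existing function:

```julia
# Hypothetical named version of the `..` definition from the first post:
# broadcast over the outer collection, and broadcast `f` again inside.
dotdot(f, args...) = Base.BroadcastFunction(f).(args...)

a = [reshape(1:4, (2, 2)) for _ in 1:3]
t = [(1, 2), (3, 4), (5, 6)]

floats = dotdot(Float32, a)          # Vector of Float32 matrices
sums = dotdot(+, t, Ref((3, 3)))     # Ref shields the tuple from the outer broadcast

@assert floats == broadcast(x -> Float32.(x), a)
@assert sums == [(4, 5), (6, 7), (8, 9)]
```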

If you just need double, an alias for Base.BroadcastFunction would help:

julia> const bcf = Base.BroadcastFunction
Base.Broadcast.BroadcastFunction

julia> a = [reshape(1:4,(2,2)) for _=1:3]; bcf(Float32).(a)
3-element Vector{Matrix{Float32}}:
 [1.0 3.0; 2.0 4.0]
 [1.0 3.0; 2.0 4.0]
 [1.0 3.0; 2.0 4.0]

That alias has a risk of clashing with another bcf the same way .. clashes, though I think the risk is lower in practice. The parser would have to be overhauled for an infix syntax either way, which would be too far for just double broadcasting.

technically you can also do this with the postfix unary operator:

julia> var"'²"(a) = Base.BroadcastFunction(Base.BroadcastFunction(a))
'² (generic function with 1 method)

julia> Float32'²(a)
3-element Vector{Matrix{Float32}}:
 [1.0 3.0; 2.0 4.0]
 [1.0 3.0; 2.0 4.0]
 [1.0 3.0; 2.0 4.0]

but the point I was making was that if you start using a double-broadcast all over the place, you’ll pretty quickly start feeling the need for a third and so on, and there’s a surprising number of edge cases that pop up where behavior is slightly different than you would expect it to be. (speaking as the person who started that thread you linked there)

I had something similar to

var"'²"(a) = reduce(∘, Iterators.repeated(Base.BroadcastFunction, 2))(a)
var"'³"(a) = reduce(∘, Iterators.repeated(Base.BroadcastFunction, 3))(a)
var"'⁴"(a) = reduce(∘, Iterators.repeated(Base.BroadcastFunction, 4))(a)
var"'⁵"(a) = reduce(∘, Iterators.repeated(Base.BroadcastFunction, 5))(a)
var"'⁶"(a) = reduce(∘, Iterators.repeated(Base.BroadcastFunction, 6))(a)
var"'⁷"(a) = reduce(∘, Iterators.repeated(Base.BroadcastFunction, 7))(a)
var"'⁸"(a) = reduce(∘, Iterators.repeated(Base.BroadcastFunction, 8))(a)
var"'⁹"(a) = reduce(∘, Iterators.repeated(Base.BroadcastFunction, 9))(a)

in my startup file at one point, and while it was fun for a bit, it got confusing really fast
(actually I think I had it recursively defined and was doing various other dumb things as well, maybe these 'ⁿ functions would end up being nicer).

Yeah, sorry, that’s a no-op there. My point is just that this is available syntax for you to define to do whatever you wanted :slight_smile:

I like this approach. Maybe this is the most generic in some way. At least this also works:

julia> bcf(+).(a,Ref((1,2)))
3-element Vector{Matrix{Int64}}:
 [2 4; 4 6]
 [2 4; 4 6]
 [2 4; 4 6]

Thanks!
Is there a way to preserve the infix mode somehow so you can write something like

a .bcf(+) Ref((1,2))

?

You could write something like:

let +ᵇ = .+
    a .+ᵇ Ref((1, 2))
end

As you can see here, for functions that are operators, you don’t even need to define bcf, so you could write your original example as:

julia> (.+).(a, Ref((1, 2)))
3-element Vector{Matrix{Int64}}:
 [2 4; 4 6]
 [2 4; 4 6]
 [2 4; 4 6]

No, that dubious mix of dot syntax and BroadcastFunction would require overhauling the parser. The parser currently recognizes infix operators from a fixed set of characters and their variants, and there’s no dedicated syntax for custom infix operators like R’s % %, let alone arbitrary Julia expressions like bcf(+) that could return anything, even another operator with a different precedence like *.

The problem with arbitrary infix operators is that they need an associated operator precedence, so there are 2 terrible options: 1) we can set it ourselves, in which case multiple modules would make the already hard-to-memorize precedence table unrecognizable, or 2) the operator already has a precedence we can’t control. R’s % % has a fixed precedence, but we’ll likely want bcf(+) to have the same precedence as +, in which case the parser supports dot syntax .+ instead of letting function call expressions break macro calls and array literals: [a bcf(+) Ref((1,2))] would go from 3 elements to 1.

Which takes us back to the beginning. You’ve been hoping to extend dot syntax to dotdot syntax ..+, but dot syntax isn’t just for broadcasting 1 existing unary or binary function. BroadcastFunction, numpy.vectorize, R’s broadcast package, or MATLAB’s bsxfun is enough for that, though Julia’s JIT compiler does optimize higher-order function calls more easily. Dot syntax is designed for syntactic loop fusion of scalar operations where an uninterrupted chain of dots is syntactic sugar for broadcasting an implicit anonymous function: x .+ f.(y) .* g.(z) would be equivalent to bcf((x1, y1, z1) -> x1 + f(y1) * g(z1))(x, y, z) instead of bcf(+)(x, bcf(*)(bcf(f)(y), bcf(g)(z))). Although those would reach the same value, loop fusion would eliminate intermediate allocations and often save time. Let’s assume that dotdot syntax also reaches the same value with or without fusion (my intuition is not proof). That opens up a whole host of problems:

  1. What happens when we mix dot syntax with dotdot syntax, like x .* y ..- z? If we tried to fuse, we reach a point where we can’t fuse further bcf((x1, y1, z1) -> x1 * y1 .- z1)(x, y, z), only saving the small outermost fraction of allocations. For a simple example, let’s consider x = [[1, 2]]; y = [[2 1]]; z = [[1;;; 1]]; the 1-element vectors obviously intend to simplify the result to the currently runnable [[1, 2] * [2 1] .- [1;;; 1]]. We must allocate an intermediate 2x2 matrix before the following broadcast makes a 2x2x2 array, and fusing only saves a 1-element vector. To be absolutely fair, broadcasted matrix multiplication already must allocate intermediate matrices, but again, dot syntax was intended to eliminate intermediate allocations for scalar operations, not fusing for fusion’s sake.
  2. Does a Ref argument get treated as a scalar all the way, or does it only go through one level as intended by the a ..+ Ref((3,3)) example? There’s no clear preference in the general case.
  3. . vs .. typos are easy to miss visually and cannot be caught by a linter.
  4. A single request or toy example of dotdot loop fusion hasn’t turned up in this entire thread, heavily favoring the various approaches to nest BroadcastFunction over interfering with dot syntax.
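
To make the fusion argument and point 1 concrete, here is a small sketch. The functions f and g and the input values are made up for illustration, bcf is the alias from earlier in the thread, and the [1;;; 1] literal assumes Julia 1.7 or later:

```julia
const bcf = Base.BroadcastFunction

# Made-up scalar functions and inputs.
f(v) = v + 1
g(v) = 2v
x, y, z = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]

# One fused kernel versus a chain of separately materialized broadcasts.
fused = bcf((x1, y1, z1) -> x1 + f(y1) * g(z1))(x, y, z)
unfused = bcf(+)(x, bcf(*)(bcf(f)(y), bcf(g)(z)))
@assert fused == unfused == x .+ f.(y) .* g.(z)

# Point 1: the matrix product must exist as a 2x2 intermediate
# before the inner broadcast can expand it to 2x2x2.
inner = [1, 2] * [2 1]
@assert size(inner) == (2, 2)
@assert size(inner .- [1;;; 1]) == (2, 2, 2)
```

The values agree either way; the difference is only in how many intermediate arrays get allocated along the way.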

Whaaat ? I had never seen that you can define additional post-fix operators :sweat_smile:
The rule for what’s parseable as a post-fix operator looks funny:

julia> var"'¹²"(x) = 2x; var"'¹₂"(x) = 2x; var"'¹2"(x) = 2x
'¹2 (generic function with 1 method)

julia> 2'¹², 2'¹₂
(4, 4)

julia> 2'¹2
ERROR: ParseError:
# Error @ REPL[46]:1:4
2'¹2
#  ╙ ── extra tokens after end of expression
Stacktrace:
 [1] top-level scope
   @ REPL:1

I couldn’t find anything on custom post-fix operators in the documentation.

you can stick anything from opsuffs after the ' and it should work. that’s unicode combining characters, subscripts, superscripts, and primes.

the dev docs have gotten a bit of an overhaul of the precedence table, you can see it there: Mathematical Operations and Elementary Functions · The Julia Language

you can also use this syntax at the call site to directly call functions that would normally parse specially:

julia> var"="(a, b) = 4
= (generic function with 1 method)

julia> var"="(1, 2)
4

julia> 1=2
ERROR: syntax: invalid assignment location "1" around REPL[3]:1
Stacktrace:
 [1] top-level scope
   @ REPL[3]:1

julia> Meta.@dump var"="(1, 2)
Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol =
    2: Int64 1
    3: Int64 2

julia> Meta.@dump 1=2
Expr
  head: Symbol =
  args: Array{Any}((2,))
    1: Int64 1
    2: Int64 2

which means your last function can still be accessed like:

julia> var"'¹2"(x) = 2x
'¹2 (generic function with 1 method)

julia> var"'¹2"(5)
10

It’s most operators as well, and the special operator parsing has more restrictions than typical symbols. x̂2 is a valid symbol (and note that is 3 characters collect("x̂2") == ['x', '̂', '2'], and the single quote is colliding with the accent to render an arrow), but -2 must be a literal value or unary operator call -(2). As with dot syntax, parsing can only go so far before it goes off the rails.

For what it’s worth, I’ve never run into suffixed operators in practice, but I have run into unusual Unicode characters that also work in non-operator symbols. Either way, there are Julia Tab completion sequences for the REPL and supported editors so we’re not forced to copy and paste every special symbol: Unicode Input · The Julia Language. I wish I knew of a way to take an arbitrary Julia symbol and derive any Tab completion sequences but I’ve just been searching code points in that linked table. Here’s how for x̂2:

julia> collect("x̂2")
3-element Vector{Char}:
 'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)
 '̂': Unicode U+0302 (category Mn: Mark, nonspacing)
 '2': ASCII/Unicode U+0032 (category Nd: Number, decimal digit)

That page has 5 hexadecimal digits, so I search U+00302 and find:

 ̂ 	\hat	Combining Circumflex Accent / Non-Spacing Circumflex

So denoting the tab key as #=TAB=#, I type x\hat#=TAB=#2.

Just for absolute clarity, rokke is saying that var"=" has nothing to do with the specially parsed assignment symbol = in conventional Julia syntax. We can see the distinct head: Symbol = in the Expr tree:

julia> :(x=1) |> dump
Expr
  head: Symbol =
  args: Array{Any}((2,))
    1: Symbol x
    2: Int64 1

julia> :(var"=") |> dump
Symbol =

julia> :(=) |> dump # this does the latter

Which also leads to a nice simple example of eval-ing symbols and expressions not being the same as pasting the text into source code, which is part of why Julia isn’t called homoiconic anymore:

julia> eval(:(=))
ERROR: UndefVarError: `=` not defined in `Main`
...
julia> =
ERROR: ParseError:

idk if you’re referring to “finding existing tab completion sequences” or “defining your own tab completion sequences” but you can do either.

to see latex completions from base, you can paste the symbol into the help menu:

help?>
"̂" can be typed by \hat<tab>

if you want to define your own shortcuts, you can do something like:

import REPL
REPL.REPLCompletions.latex_symbols["\\^q"] = "\U107A5"

(I think I saw a pr for superscript q get added a while back so should be available soon)

you can also abuse this to add shortcuts for other things you type a lot:

REPL.REPLCompletions.latex_symbols["\\eq"] = "popcorn = 3*heat + 89/5"

and now I can type \eq^I where ^I is tab/ctrl+i and it becomes popcorn = 3*heat + 89/5

I meant the former. Would be nice if there was a function I could loop over a sequence of symbols or strings, but this is already way better than what I was doing:

help?> x̂2
"x̂2" can be typed by x\hat<tab>2

Doesn’t seem to work after pasting multi-character emojis, but we’re doing applied math, not memes:

help?> šŸŸšŸ‘Øā€šŸ’¼IlovethisproductšŸ”
"šŸŸšŸ‘Øā€šŸ’¼IlovethisproductšŸ”" can be typed by \:fries:<tab>\:man:<tab>ā€\:briefcase:<tab>Ilovethisproduct\:hamburger:<tab>

julia> šŸŸšŸ‘ØšŸ’¼IlovethisproductšŸ”

if you want a function, I just grepped the julia codebase and found this in REPL:

julia> REPL.repl_latex(stdout, "̂")
"̂" can be typed by \hat<tab>

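Looping that over several strings could look roughly like the sketch below. REPL.repl_latex is an internal function, so its behavior may change between Julia versions, and completion_hint is a hypothetical helper name:

```julia
import REPL

# Hypothetical helper: capture whatever REPL.repl_latex prints for one string.
# Internal API, so treat this as version-dependent.
completion_hint(s::AbstractString) = sprint(REPL.repl_latex, String(s))

# "x\u03022" is 'x', the combining circumflex U+0302, then '2'.
for s in ["x\u03022", "α", "∘"]
    print(completion_hint(s))
end
```

Strings with no known completion may simply print nothing.
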
interesting, I didn’t know we had emoji latex strings…
if we’re supporting emoji then adding a shortcut for the zero width joiner seems like a good move, \zwj doesn’t seem to be taken yet

opened a pr to add \zwj: adds zero-width-joiner input method by rokke-git · Pull Request #61391 · JuliaLang/julia · GitHub
