Yes, there's a whole slew of new features here, and they aren’t totally orthogonal. I think it’s helpful to think of this in stages. Assume in all the code samples below that I’m using .Broadcast: broadcasted, Broadcasted.
Construction of the broadcast expression tree
Julia's lowering step transforms an expression like (A .+ B) ./ C into broadcasted(/, broadcasted(+, A, B), C) (wrapped in a final materialize call). By default, this constructs Broadcasted objects that just hold onto the function and tuple of arguments, but it goes through a lowercase function (instead of calling the constructor directly) so that you can do something different. This means that, as dfdx suggested, you can simply overload broadcasted(::typeof(+), ::YourType, ::YourType) to either return an alternative array-like lazy representation or do the work immediately and return an intermediate array.
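Here's a rough sketch of both halves of that: building the lazy tree by hand, and then opting out of fusion for one function. MyVec is a hypothetical wrapper type I made up for illustration; it isn't from any package.

```julia
using Base.Broadcast: broadcasted, Broadcasted

A, B, C = [1.0, 2.0], [3.0, 4.0], [2.0, 2.0]

# Calling broadcasted by hand builds the lazy tree; nothing is computed yet.
bc = broadcasted(/, broadcasted(+, A, B), C)
bc isa Broadcasted          # true: the outer node just stores / and its arguments
bc.args[1] isa Broadcasted  # true: the nested A .+ B node
Broadcast.materialize(bc) == (A .+ B) ./ C  # true: materialize runs the fused loop

# MyVec is a hypothetical wrapper type (an assumption for this example).
struct MyVec <: AbstractVector{Float64}
    data::Vector{Float64}
end
Base.size(v::MyVec) = size(v.data)
Base.getindex(v::MyVec, i::Int) = v.data[i]

# Opt out of fusion for MyVec .+ MyVec: do the work now and return an
# intermediate array instead of a lazy Broadcasted node.
Broadcast.broadcasted(::typeof(+), x::MyVec, y::MyVec) = MyVec(x.data .+ y.data)

s = MyVec([1.0, 2.0]) .+ MyVec([10.0, 20.0])  # hits the overload, so s is a MyVec
```

Note that this overload only fires when both arguments are literally MyVec. In something like x .+ y .* 2 the second argument to + is a lazy Broadcasted node, so the default fused path is taken instead.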
Sometimes, like in the case Mike mentions, you also have a BroadcastStyle promotion system set up and want to dispatch on the combined style of the arguments, often to catch that “at least one argument is a TrackedArray” dispatch problem. In that case you can instead overload broadcasted(::YourStyle, ::typeof(+), ...). Note, though, that this is just the style of the arguments passed to +, not the overall style of the entire fused expression. This is how ranges can now return ranges from broadcasting: they opt out of fusion to compute things in O(1) when possible: base/broadcast.jl#L961-L1013.
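A minimal sketch of the style-based overload, assuming a hypothetical Tracked wrapper standing in for something like a TrackedArray. I'm using Broadcast.ArrayStyle here because it wins over the default array style without any extra promotion rules; a real package may well set up its styles differently.

```julia
using Base.Broadcast: broadcasted

# Tracked is a hypothetical stand-in for something like a TrackedArray.
struct Tracked{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(t::Tracked) = size(t.data)
Base.getindex(t::Tracked{T,N}, I::Vararg{Int,N}) where {T,N} = t.data[I...]

# ArrayStyle beats the default array style, so any call with at least one
# Tracked argument combines to TrackedStyle.
const TrackedStyle = Broadcast.ArrayStyle{Tracked}
Base.BroadcastStyle(::Type{<:Tracked}) = TrackedStyle()

plain(t::Tracked) = t.data
plain(x) = x

# Dispatch on the combined style of the arguments to +, whatever their order.
# This sketch assumes a and b are arrays or scalars, not nested lazy nodes.
Broadcast.broadcasted(::TrackedStyle, ::typeof(+), a, b) = Tracked(plain(a) .+ plain(b))

t = Tracked([1, 2, 3])
r1 = t .+ [10, 20, 30]   # Tracked on the left
r2 = [10, 20, 30] .+ t   # Tracked on the right
r3 = t .+ 1              # Tracked with a scalar

# Ranges use the same kind of hook in Base to stay ranges.
r4 = (1:5) .+ 1
```

All three of r1, r2 and r3 come back as Tracked from the single method, which is the point of dispatching on the style rather than on the argument types. r4 stays a range rather than becoming a Vector.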
Note that one of the arguments you get in such a broadcasted implementation could be a lazy Broadcasted expression, too! For example, if C was a TrackedArray and A and B weren’t, you’d end up with a division between a Broadcasted(+, A, B) and your C. You’ll have to decide whether you want to manually materialize that nested Broadcasted into a temporary before executing the division by C or whether you’re able to fuse it with the division.
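To illustrate the two options, here's a small sketch, again with a hypothetical Tracked wrapper. The overload materializes the nested node into a temporary; the fused helper below it (also hypothetical) keeps the nested node lazy and lets Base fuse it with the division.

```julia
using Base.Broadcast: broadcasted, Broadcasted

# Tracked is a hypothetical wrapper, standing in for a TrackedArray.
struct Tracked{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(t::Tracked) = size(t.data)
Base.getindex(t::Tracked{T,N}, I::Vararg{Int,N}) where {T,N} = t.data[I...]
Base.BroadcastStyle(::Type{<:Tracked}) = Broadcast.ArrayStyle{Tracked}()

# Strip the wrapper but leave lazy nodes alone.
unwrap(t::Tracked) = t.data
unwrap(x) = x

# Strip the wrapper and turn a nested lazy node into a temporary array.
plain(bc::Broadcasted) = Broadcast.materialize(bc)
plain(x) = unwrap(x)

# Option 1: materialize the nested Broadcasted, then divide.
Broadcast.broadcasted(::Broadcast.ArrayStyle{Tracked}, ::typeof(/), a, b) =
    Tracked(plain(a) ./ plain(b))

# Option 2 (shown as a separate helper): keep the nested node lazy so the
# addition and the division run in one fused loop.
fused(a, b) = Tracked(Broadcast.materialize(broadcasted(/, unwrap(a), unwrap(b))))

A, B = [1.0, 2.0], [3.0, 4.0]
C = Tracked([2.0, 4.0])

inner = broadcasted(+, A, B)  # this lazy node is what the overload receives as `a`
r = (A .+ B) ./ C             # goes through option 1
f = fused(inner, C)           # same values, without the temporary
```

Option 1 is simpler and always works when the nested node has a default style; option 2 saves an allocation but only makes sense if your operation can really be expressed as part of the same elementwise loop.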
Execution of a broadcasted expression
If you don’t override broadcasted, Julia will create a Broadcasted representation of the expression. It computes the overall broadcast style of the entire expression by walking through all the nested Broadcasted nodes and combining them, and then it stores this as the first type parameter of the Broadcasted object. This is then copy’ed or copyto!’ed (in the case of .=), allowing you to customize copy(bc::Broadcasted{YourStyle}) or copyto!(::AbstractArray, bc::Broadcasted{YourStyle}). In these function bodies, you have the entire Broadcasted expression tree available for introspection: you can walk through bc.args and manually decide how you want to execute the functions. Of course, that can be a lot of work and hard to get right, so you may just want to inspect the broadcast tree to see if it’s an “easy” case (like just a map-equivalent without broadcast expansion) and defer to a simpler optimized implementation. Or you can walk through the passed bc object and transform it into an equivalent Broadcasted representation, if that’s at all possible, as that will allow you to use its simple outer API:
for I in eachindex(bc)
    result[I] = bc[I]
end
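Putting those pieces together, a custom copy might look roughly like the sketch below. Wrapped, ismaplike and PATH are hypothetical names invented for this example, and Broadcast.combine_eltypes is an internal, unexported helper that I'm borrowing only to pick an element type, so treat this as an outline rather than a recommended implementation.

```julia
using Base.Broadcast: Broadcasted

# Wrapped is a hypothetical array type used only for illustration.
struct Wrapped{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(w::Wrapped) = size(w.data)
Base.getindex(w::Wrapped{T,N}, I::Vararg{Int,N}) where {T,N} = w.data[I...]

const WrappedStyle = Broadcast.ArrayStyle{Wrapped}
Base.BroadcastStyle(::Type{<:Wrapped}) = WrappedStyle()

unwrap(w::Wrapped) = w.data
unwrap(x) = x

# "Easy" case: a flat call whose arguments are all arrays with the output's axes.
ismaplike(bc::Broadcasted) =
    all(a -> a isa AbstractArray && axes(a) == axes(bc), bc.args)

const PATH = Ref(:none)  # records which branch ran, for demonstration only

function Base.copy(bc::Broadcasted{WrappedStyle})
    if ismaplike(bc)
        PATH[] = :map
        return Wrapped(map(bc.f, map(unwrap, bc.args)...))
    end
    PATH[] = :generic
    # combine_eltypes is internal to Base.Broadcast (an assumption that it stays available).
    T = Broadcast.combine_eltypes(bc.f, bc.args)
    result = similar(Array{T}, axes(bc))
    for I in eachindex(bc)
        result[I] = bc[I]  # bc[I] evaluates the whole fused expression at index I
    end
    return Wrapped(result)
end

w = Wrapped([1.0, 2.0, 3.0])
easy = w .+ w                      # flat, same axes: takes the map branch
hard = w .* 2 .+ [1.0, 1.0, 1.0]   # nested node and a scalar: takes the generic loop
```

The generic branch never has to know how deep the tree is, since indexing into bc handles the nested nodes and the scalar for you.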
This is how BitArray hooks into the broadcast system. It uses both aspects I mention here: it reports itself as a DefaultArrayStyle, and then in the DefaultArrayStyle implementation, we first walk through the Broadcasted expression tree to see if it’s a case where we can perform a “chunked” broadcast, that is, if we can use an implementation that operates on the UInt64 chunks instead of the individual bits (base/broadcast.jl#L841). The implementation then also transforms the passed bc object to convert compatible bit-wise functions into their “chunkable” equivalents (base/broadcast.jl#L880-L886). For example, we transform the ! function (which only negates bools) to the bitwise ~ function to allow inversions to operate at the level of chunks.
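The idea can be imitated in a few lines. This is not Base's implementation, just a toy version of the same two steps: chunkable, chunkeval and chunked_copy are hypothetical helpers, and the sketch reaches into the internal chunks field of BitArray and assumes every function in the tree is ! or a bitwise operator like &, | or xor.

```julia
using Base.Broadcast: broadcasted, Broadcasted

# Step 1 (hypothetical rewrite pass): swap Bool-only functions for bitwise ones.
chunkable(bc::Broadcasted) = Broadcasted(chunkable(bc.f), map(chunkable, bc.args))
chunkable(::typeof(!)) = ~
chunkable(x) = x  # leaves and already-bitwise functions pass through

# Step 2: evaluate the rewritten tree on whole UInt64 chunks.
chunkeval(bc::Broadcasted) = broadcast(bc.f, map(chunkeval, bc.args)...)
chunkeval(b::BitArray) = b.chunks  # internal field: the packed UInt64 storage

function chunked_copy(bc::Broadcasted, n::Int)
    out = falses(n)
    out.chunks .= chunkeval(chunkable(bc))
    # ~ also flips the unused bits past the end of the last chunk; clear them again.
    r = n % 64
    r == 0 || (out.chunks[end] &= (UInt64(1) << r) - UInt64(1))
    return out
end

a = BitVector([true, false, true, true])
b = BitVector([false, false, true, false])
bc = broadcasted(|, broadcasted(!, a), b)  # lazy tree for .!a .| b
res = chunked_copy(bc, length(a))
```

The result matches .!a .| b, but each operation touches 64 bits at a time instead of one. The masking step is the part that's easy to forget, which is a decent illustration of why walking the tree yourself can be "hard to get right".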
So that’s a lot of detail, but I hope it helps you figure out where it’s best for you to hook into the system. The simple answer is broadcasted, but there may be cases where you’d want to consider the entire expression tree as a whole before doing your fusion opt-out.