Unnecessary materialize in broadcast?

I am wondering if julia will ever automatically fuse broadcast calls. Take the following example:
Example:

f1(x,y,z) = x .= (x .- y)/z
f2(x,y,z) = x .= (x .- y)/.z
x = randn(1000); y = randn(1000); z = 2.0
julia> @code_lowered f1(x,y,z)
CodeInfo(
1 1 ─ %1 = Base.Broadcast.materialize!                                                                            │
  │   %2 = Base.Broadcast.broadcasted                                                                             │
  │   %3 = Base.Broadcast.materialize                                                                             │
  │   %4 = Base.Broadcast.broadcasted                                                                             │
  │   %5 = (%4)(Main.:-, x, y)                                                                                    │
  │   %6 = (%3)(%5)                                                                                               │
  │   %7 = %6 / z                                                                                                 │
  │   %8 = (%2)(Base.identity, %7)                                                                                │
  │   %9 = (%1)(x, %8)                                                                                            │
  └──      return %9                                                                                              │
)
julia> @code_lowered f2(x,y,z)
CodeInfo(
1 1 ─ %1 = Base.Broadcast.materialize!                                                                            │
  │   %2 = Base.Broadcast.broadcasted                                                                             │
  │   %3 = Base.Broadcast.broadcasted                                                                             │
  │   %4 = (%3)(Main.:-, x, y)                                                                                    │
  │   %5 = (%2)(Main.:/, %4, z)                                                                                   │
  │   %6 = (%1)(x, %5)                                                                                            │
  └──      return %6                                                                                              │
)

Even though AbstractArray / Number is defined as broadcast of / it is lowered to two materialize, with the additional allocation and time:

julia> @btime f1($x,$y,$z);
  2.286 μs (2 allocations: 15.88 KiB)

julia> @btime f2($x,$y,$z);
  1.129 μs (0 allocations: 0 bytes)

Is there a good reason for this?

Run on julia 1.0 with -O3

/ on arrays is not elementwise division; it’s the left-inverse of * (usually calling a matrix solve) and returns the same result as ./ only in the case where the second operand is a scalar. In the case of + and - (which do work elementwise on arrays), they differ from the broadcasted .+ and .- because they require that the arrays are the same shape (i.e., they don’t actually broadcast). In either case silently changing them to their dotted equivalent to enable fusion would change behavior.

Loop fusion is a hard optimization to implement in general — you have to prove that a given method operates elementwise, ensure that there are no other side-effects, and make sure that no references to the intermediate calculation are referenced or escape. We build up a lazy representation — and then you need to know when to finally materialize it. The dot-broadcasting syntax implements a very syntax “sugar” transform that you can rely upon (and arrays can customize): all dotted operators and functions fuse with adjacent dots. Any non-dotted operator or function will immediately break the chain. It’s a very simple rule, but it’s served us extraordinarily well.

I highly recommend the more dots blog post, which addresses this question and much more.

4 Likes

@jekbradbury The point here is that we actually have
(/)(A::AbstractArray, B::Number) = broadcast(/, A, B)

After reading the interesting link by @mbauman, I guess the problem is that the types are not known at lowering, so the optimization can not be done there (even if it was legal). However after type inference, I don’t understand why julia can’t rewrite from
materialize!(x, broadcasted(/, materialize(broadcasted(-, x, y)), z))
to
materialize!(x, broadcasted(/, broadcasted(-, x, y), z)).
My understanding is that julia is doing some type of optimizations at this step, for example constant propagation, this could be another one.

I understand why it would be hard for the llvm compiler in the next step to do the optimization.

See More Dots: Syntactic Loop Fusion in Julia

Yes, that was the link that mbauman posted and that I read, and I acknowledged that it is a hard problem in general. The question is regarding the case where julia has already decided that broadcasting is the appropriate operation, as in the case above, the call already gets changed internally to
materialize!(x, broadcasted(/, materialize(broadcasted(-, x, y)), z))
so at this time I don’t see the problem.
I don’t clam to understand all the internals, maybe in the general case, the following are not identical in behavior?
materialize!(..., broadcasted(..., materialize(broadcasted(..., ..., ...)), ...))
materialize!(..., broadcasted(..., broadcasted(..., ..., ...), ...))
Is that the issue?

Broadcasting is not the same as fusion. An array divided by a number will have the number “broadcast” over the array. Fusion is the broadcast of multiple operations, and to know which operations to fuse, you need the dot (at least for now).

I think you already understood what I just said above, sorry about that. So to answer

Yes. This is not a transformation you can do in general, and so for now requires dots.

1 Like

Also there may be cases where doing the fused operation is slower than keeping them separate. It’s not always a slam dunk.

I see, thanks for helping me answer this.