Unnecessary materialize in broadcast?

mfalt · August 27, 2018, 3:14pm

I am wondering if julia will ever automatically fuse broadcast calls. Take the following example:
Example:

f1(x,y,z) = x .= (x .- y)/z
f2(x,y,z) = x .= (x .- y)/.z
x = randn(1000); y = randn(1000); z = 2.0

julia> @code_lowered f1(x,y,z)
CodeInfo(
1 1 ─ %1 = Base.Broadcast.materialize!                                                                            │
  │   %2 = Base.Broadcast.broadcasted                                                                             │
  │   %3 = Base.Broadcast.materialize                                                                             │
  │   %4 = Base.Broadcast.broadcasted                                                                             │
  │   %5 = (%4)(Main.:-, x, y)                                                                                    │
  │   %6 = (%3)(%5)                                                                                               │
  │   %7 = %6 / z                                                                                                 │
  │   %8 = (%2)(Base.identity, %7)                                                                                │
  │   %9 = (%1)(x, %8)                                                                                            │
  └──      return %9                                                                                              │
)

julia> @code_lowered f2(x,y,z)
CodeInfo(
1 1 ─ %1 = Base.Broadcast.materialize!                                                                            │
  │   %2 = Base.Broadcast.broadcasted                                                                             │
  │   %3 = Base.Broadcast.broadcasted                                                                             │
  │   %4 = (%3)(Main.:-, x, y)                                                                                    │
  │   %5 = (%2)(Main.:/, %4, z)                                                                                   │
  │   %6 = (%1)(x, %5)                                                                                            │
  └──      return %6                                                                                              │
)

Even though AbstractArray / Number is defined as broadcast of / it is lowered to two materialize, with the additional allocation and time:

julia> @btime f1($x,$y,$z);
  2.286 μs (2 allocations: 15.88 KiB)

julia> @btime f2($x,$y,$z);
  1.129 μs (0 allocations: 0 bytes)

Is there a good reason for this?

Run on julia 1.0 with -O3

jekbradbury · August 28, 2018, 12:53am

/ on arrays is not elementwise division; it’s the left-inverse of * (usually calling a matrix solve) and returns the same result as ./ only in the case where the second operand is a scalar. In the case of + and - (which do work elementwise on arrays), they differ from the broadcasted .+ and .- because they require that the arrays are the same shape (i.e., they don’t actually broadcast). In either case silently changing them to their dotted equivalent to enable fusion would change behavior.

mbauman · August 28, 2018, 3:36am

Loop fusion is a hard optimization to implement in general — you have to prove that a given method operates elementwise, ensure that there are no other side-effects, and make sure that no references to the intermediate calculation are referenced or escape. We build up a lazy representation — and then you need to know when to finally materialize it. The dot-broadcasting syntax implements a very syntax “sugar” transform that you can rely upon (and arrays can customize): all dotted operators and functions fuse with adjacent dots. Any non-dotted operator or function will immediately break the chain. It’s a very simple rule, but it’s served us extraordinarily well.

I highly recommend the more dots blog post, which addresses this question and much more.

mfalt · August 28, 2018, 11:41am

@jekbradbury The point here is that we actually have
(/)(A::AbstractArray, B::Number) = broadcast(/, A, B)

After reading the interesting link by @mbauman, I guess the problem is that the types are not known at lowering, so the optimization can not be done there (even if it was legal). However after type inference, I don’t understand why julia can’t rewrite from
materialize!(x, broadcasted(/, materialize(broadcasted(-, x, y)), z))
to
materialize!(x, broadcasted(/, broadcasted(-, x, y), z)).
My understanding is that julia is doing some type of optimizations at this step, for example constant propagation, this could be another one.

I understand why it would be hard for the llvm compiler in the next step to do the optimization.

stevengj · August 28, 2018, 1:28pm

See More Dots: Syntactic Loop Fusion in Julia

mfalt · August 28, 2018, 3:59pm

Yes, that was the link that mbauman posted and that I read, and I acknowledged that it is a hard problem in general. The question is regarding the case where julia has already decided that broadcasting is the appropriate operation, as in the case above, the call already gets changed internally to
materialize!(x, broadcasted(/, materialize(broadcasted(-, x, y)), z))
so at this time I don’t see the problem.
I don’t clam to understand all the internals, maybe in the general case, the following are not identical in behavior?
materialize!(..., broadcasted(..., materialize(broadcasted(..., ..., ...)), ...))
materialize!(..., broadcasted(..., broadcasted(..., ..., ...), ...))
Is that the issue?

jebej · August 28, 2018, 4:11pm

Broadcasting is not the same as fusion. An array divided by a number will have the number “broadcast” over the array. Fusion is the broadcast of multiple operations, and to know which operations to fuse, you need the dot (at least for now).

jebej · August 28, 2018, 4:16pm

I think you already understood what I just said above, sorry about that. So to answer

Yes. This is not a transformation you can do in general, and so for now requires dots.

mbauman · August 28, 2018, 6:56pm

Also there may be cases where doing the fused operation is slower than keeping them separate. It’s not always a slam dunk.

mfalt · August 28, 2018, 8:16pm

I see, thanks for helping me answer this.

Topic		Replies	Views
Newbie question about loop fusion and broadcasting New to Julia	9	1018	June 12, 2020
Trace of broadcasting General Usage	4	364	February 27, 2020
Reducing Materialize in Profiling Performance	5	415	February 24, 2024
Preventing broadcast fusing General Usage broadcast	17	1269	February 3, 2019
Sum, mapreduce and broadcasted Performance broadcast	14	2935	September 23, 2018

Unnecessary materialize in broadcast?

Related topics