I am wondering whether Julia will ever fuse broadcast calls automatically. Take the following example:
f1(x,y,z) = x .= (x .- y)/z
f2(x,y,z) = x .= (x .- y)./z
x = randn(1000); y = randn(1000); z = 2.0
julia> @code_lowered f1(x,y,z)
CodeInfo(
1 1 ─ %1 = Base.Broadcast.materialize! │
│ %2 = Base.Broadcast.broadcasted │
│ %3 = Base.Broadcast.materialize │
│ %4 = Base.Broadcast.broadcasted │
│ %5 = (%4)(Main.:-, x, y) │
│ %6 = (%3)(%5) │
│ %7 = %6 / z │
│ %8 = (%2)(Base.identity, %7) │
│ %9 = (%1)(x, %8) │
└── return %9 │
)
julia> @code_lowered f2(x,y,z)
CodeInfo(
1 1 ─ %1 = Base.Broadcast.materialize! │
│ %2 = Base.Broadcast.broadcasted │
│ %3 = Base.Broadcast.broadcasted │
│ %4 = (%3)(Main.:-, x, y) │
│ %5 = (%2)(Main.:/, %4, z) │
│ %6 = (%1)(x, %5) │
└── return %6 │
)
Even though AbstractArray / Number is defined as a broadcast of /, f1 is lowered to two separate materializations (materialize for the subtraction, then materialize! for the assignment), with the extra allocations and time that implies:
julia> @btime f1($x,$y,$z);
2.286 μs (2 allocations: 15.88 KiB)
julia> @btime f2($x,$y,$z);
1.129 μs (0 allocations: 0 bytes)
Is there a good reason for this?
Run on Julia 1.0 with -O3.
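
For reference, a minimal sketch of the workaround I am aware of: the @. macro dots every call and the assignment in the expression, so a stray undotted / cannot slip in. The names f3, x0, x2 and x3 are just made up for this illustration, and I am assuming that @. expands to the same expression as the hand-dotted f2.

```julia
# f2 is the hand-dotted version from above.
f2(x, y, z) = x .= (x .- y) ./ z

# f3 (hypothetical name) lets @. add the dots: the body becomes
# x .= (x .- y) ./ z, i.e. a single fused in-place broadcast.
f3(x, y, z) = @. x = (x - y) / z

y = randn(1000); z = 2.0
x0 = randn(1000)

# Work on copies, since both functions overwrite their first argument.
x2 = copy(x0); x3 = copy(x0)
f2(x2, y, z)
f3(x3, y, z)

# Both perform the same elementwise operations, so the results should match.
println(x2 == x3)
```

This does not answer whether an undotted call like / could be fused automatically, it only avoids having to spot the missing dot by hand.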