How to inspect a fused loop?

Given

x = randn(5)
y = randn(5)
i = [1, 3]

is there any speed difference between

y[i] .= x[i] .+ 1.0

and

y[i] .= view(x, i) .+ 1.0

?

To be more specific, would x[i] on the RHS of a fused loop create a copy of the vector before the loop starts?
Also, what macro could we use to inspect the lowered code?

Thanks

Yes, x[i] on the RHS is copied before the fused loop runs.

That’s how I do it:

julia> function f1()
       x = randn(5)
       y = randn(5)
       i = [1, 3]
       y[i] .= x[i] .+ 1.0
       y
       end
julia> function f2()
       x = randn(5)
       y = randn(5)
       i = [1, 3]
       y[i] .= view(x, i) .+ 1.0
       y
       end
julia> @code_lowered f1()
CodeInfo(
1 ─       x = Main.randn(5)
│         y = Main.randn(5)
│         i = Base.vect(1, 3)
│   %4  = y
│   %5  = i
│   %6  = Base.dotview(%4, %5)
│   %7  = Main.:+
│   %8  = x
│   %9  = i
│   %10 = Base.getindex(%8, %9)
│   %11 = Base.broadcasted(%7, %10, 1.0)
│         Base.materialize!(%6, %11)
│   %13 = y
└──       return %13
)
julia> @code_lowered f2()
CodeInfo(
1 ─       x = Main.randn(5)
│         y = Main.randn(5)
│         i = Base.vect(1, 3)
│   %4  = y
│   %5  = i
│   %6  = Base.dotview(%4, %5)
│   %7  = Main.:+
│   %8  = x
│   %9  = i
│   %10 = Main.view(%8, %9)
│   %11 = Base.broadcasted(%7, %10, 1.0)
│         Base.materialize!(%6, %11)
│   %13 = y
└──       return %13
)

Both functions are identical except for statement %10:

%10 = Base.getindex(%8, %9)
%10 = Main.view(%8, %9)

The getindex call in f1 is where the copy is allocated; in f2, view wraps x without copying its elements.
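To make the copy-versus-view distinction concrete, here is a minimal sketch. The names copied and aliased are just illustrative; the point is that mutating x afterwards only shows up through the view.

```julia
x = [10.0, 20.0, 30.0, 40.0, 50.0]
i = [1, 3]

copied  = x[i]        # getindex allocates a new Vector with copies of x[1] and x[3]
aliased = view(x, i)  # SubArray that reads through to x; no elements are copied

x[1] = -1.0           # mutate the parent after both were created

println(copied[1])    # the copy still holds the old value
println(aliased[1])   # the view sees the update
```

This is the same thing that happens at %10 in the lowered code: the indexed value is built before broadcasted and materialize! ever run.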

You can see it with benchmarking too:

julia> using BenchmarkTools
julia> @benchmark f1()
...
Memory estimate: 352 bytes, allocs estimate: 8.

versus

julia> @benchmark f2()
...
Memory estimate: 272 bytes, allocs estimate: 6.
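
On the macro question, a small sketch of two things that may be handy, assuming a reasonably recent Julia 1.x: Meta.@lower lowers an expression directly, so there is no need to wrap it in a function, and @views rewrites indexing like x[i] on the RHS into a view. The variable names plain and viewed are only for illustration.

```julia
# Lower the expressions directly. Nothing is evaluated here,
# so x, y and i do not need to exist yet.
plain  = Meta.@lower y[i] .= x[i] .+ 1.0
viewed = Meta.@lower @views y[i] .= x[i] .+ 1.0

println(plain)   # the RHS indexing shows up as a getindex call
println(viewed)  # with @views the RHS indexing becomes a maybeview call

# The @views form computes the same result as the explicit view(x, i) version.
x = [1.0, 2.0, 3.0]
y = zeros(3)
i = [1, 3]
@views y[i] .= x[i] .+ 1.0
println(y)
```

Whether this actually reduces allocations in a given case is best confirmed with @benchmark as above, ideally with x, y and i passed in as arguments so that the randn calls are not part of the measurement.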