How to inspect a fused loop?

tomtom · April 19, 2025, 1:41pm

Given

x = randn(5)
y = randn(5)
i = [1, 3]

Is there any speed difference between

y[i] .= x[i] .+ 1.0

and

y[i] .= view(x, i) .+ 1.0

?

To be more specific, would x[i] on the right-hand side of a fused loop create a copy of the vector before the loop starts?
Also, what macro could we use to inspect the lowered code?

Thanks

oheil · April 19, 2025, 1:54pm

Yes, x[i] on the right-hand side creates a copy before the fused loop runs.

Here is how I check it with @code_lowered:

julia> function f1()
       x = randn(5)
       y = randn(5)
       i = [1, 3]
       y[i] .= x[i] .+ 1.0
       y
       end

julia> function f2()
       x = randn(5)
       y = randn(5)
       i = [1, 3]
       y[i] .= view(x, i) .+ 1.0
       y
       end

julia> @code_lowered f1()
CodeInfo(
1 ─       x = Main.randn(5)
│         y = Main.randn(5)
│         i = Base.vect(1, 3)
│   %4  = y
│   %5  = i
│   %6  = Base.dotview(%4, %5)
│   %7  = Main.:+
│   %8  = x
│   %9  = i
│   %10 = Base.getindex(%8, %9)
│   %11 = Base.broadcasted(%7, %10, 1.0)
│         Base.materialize!(%6, %11)
│   %13 = y
└──       return %13
)

julia> @code_lowered f2()
CodeInfo(
1 ─       x = Main.randn(5)
│         y = Main.randn(5)
│         i = Base.vect(1, 3)
│   %4  = y
│   %5  = i
│   %6  = Base.dotview(%4, %5)
│   %7  = Main.:+
│   %8  = x
│   %9  = i
│   %10 = Main.view(%8, %9)
│   %11 = Base.broadcasted(%7, %10, 1.0)
│         Base.materialize!(%6, %11)
│   %13 = y
└──       return %13
)

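The lowered code of the two functions is identical except for statement %10. In f1:

%10 = Base.getindex(%8, %9)

and in f2:

%10 = Main.view(%8, %9)

The getindex call in f1 is where the copy is allocated. The view call in f2 wraps x without copying its elements.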
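
If you only want to look at one expression, wrapping it in a function is not strictly needed. The following is a minimal sketch using Meta.lower from Base, which lowers a quoted expression without running it. The variable names copy_version, view_version, c and v are illustrative and not from the thread. The second half shows the same copy-versus-view distinction at runtime.

```julia
# Lower a single quoted expression. Nothing is executed here, so x, y and i
# do not need to be defined for this step.
copy_version = Meta.lower(Main, :(y[i] .= x[i] .+ 1.0))
view_version = Meta.lower(Main, :(y[i] .= view(x, i) .+ 1.0))

println(copy_version)   # the listing contains a getindex call for x[i]
println(view_version)   # the listing contains a call to view instead

# The same distinction at runtime: x[i] is an independent array,
# while view(x, i) shares memory with x.
x = [10.0, 20.0, 30.0]
i = [1, 3]
c = x[i]          # copy of the selected elements
v = view(x, i)    # window onto x, no element copy

c[1] = -1.0       # changes only the copy; x[1] is still 10.0
v[2] = -3.0       # writes through to x[3]

println(x)
```

At the REPL, Meta.@lower followed by the expression is the macro form of the same call.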

You can see it with benchmarking too:

julia> using BenchmarkTools
julia> @benchmark f1()
...
Memory estimate: 352 bytes, allocs estimate: 8.

versus

julia> @benchmark f2()
...
Memory estimate: 272 bytes, allocs estimate: 6.
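
Note that f1 and f2 also allocate x, y and i on every call, so those allocations are included in both estimates. A rough way to isolate the fused assignment is to pass the arrays in as arguments, as sketched below. The helper names add_one_copy!, add_one_view! and add_one_views! are hypothetical, and the sketch uses @allocated from Base instead of BenchmarkTools to stay self-contained. The @views variant is shown as an alternative spelling of the view version. The printed byte counts depend on the Julia version, so none are given here.

```julia
# Hypothetical helpers: the arrays are arguments, so only the fused
# assignment is measured, not the randn calls.
function add_one_copy!(y, x, i)
    y[i] .= x[i] .+ 1.0          # x[i] builds a temporary array first
    return y
end

function add_one_view!(y, x, i)
    y[i] .= view(x, i) .+ 1.0    # view(x, i) wraps x without copying elements
    return y
end

function add_one_views!(y, x, i)
    @views y[i] .= x[i] .+ 1.0   # @views turns indexing such as x[i] into a view
    return y
end

x = randn(1_000)
i = collect(1:2:1_000)
y1, y2, y3 = zeros(1_000), zeros(1_000), zeros(1_000)

# Run each function once so that compilation is not counted below.
add_one_copy!(y1, x, i)
add_one_view!(y2, x, i)
add_one_views!(y3, x, i)

println("getindex: ", @allocated(add_one_copy!(y1, x, i)), " bytes")
println("view:     ", @allocated(add_one_view!(y2, x, i)), " bytes")
println("@views:   ", @allocated(add_one_views!(y3, x, i)), " bytes")
```

All three helpers write the same values into y, so they can be swapped freely when comparing timings with @benchmark.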
