tomtom
April 19, 2025, 1:41pm
1
Given
x = randn(5)
y = randn(5)
i = [1, 3]
is there any speed difference between
y[i] .= x[i] .+ 1.0
and
y[i] .= view(x, i) .+ 1.0
?
To be more specific, would x[i] on the RHS of a fused loop create a copy of the vector before the loop starts?
Also, what macro can we use to inspect the lowered code?
Thanks
oheil
April 19, 2025, 1:54pm
2
Yes
That's how I do it:
julia> function f1()
           x = randn(5)
           y = randn(5)
           i = [1, 3]
           y[i] .= x[i] .+ 1.0
           y
       end
julia> function f2()
           x = randn(5)
           y = randn(5)
           i = [1, 3]
           y[i] .= view(x, i) .+ 1.0
           y
       end
julia> @code_lowered f1()
CodeInfo(
1 ─       x = Main.randn(5)
│         y = Main.randn(5)
│         i = Base.vect(1, 3)
│   %4  = y
│   %5  = i
│   %6  = Base.dotview(%4, %5)
│   %7  = Main.:+
│   %8  = x
│   %9  = i
│   %10 = Base.getindex(%8, %9)
│   %11 = Base.broadcasted(%7, %10, 1.0)
│         Base.materialize!(%6, %11)
│   %13 = y
└──       return %13
)
julia> @code_lowered f2()
CodeInfo(
1 ─       x = Main.randn(5)
│         y = Main.randn(5)
│         i = Base.vect(1, 3)
│   %4  = y
│   %5  = i
│   %6  = Base.dotview(%4, %5)
│   %7  = Main.:+
│   %8  = x
│   %9  = i
│   %10 = Main.view(%8, %9)
│   %11 = Base.broadcasted(%7, %10, 1.0)
│         Base.materialize!(%6, %11)
│   %13 = y
└──       return %13
)
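The lowering can also be inspected without wrapping the statement in a function. A minimal sketch, assuming Meta.lower from Base (the function form of the Meta.@lower macro); x, y and i do not need to hold values here, since lowering works on the syntax alone:

```julia
# Lower the two statements directly from quoted expressions.
lowered_copy = Meta.lower(Main, :(y[i] .= x[i] .+ 1.0))
lowered_view = Meta.lower(Main, :(y[i] .= view(x, i) .+ 1.0))

# Print both so the right-hand-side indexing calls can be compared by eye.
println(lowered_copy)
println(lowered_view)
```

The printed form differs a little from the @code_lowered output of f1 and f2, but the indexing call on the right-hand side is the part to look at.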
Both functions are identical except for:
%10 = Base.getindex(%8, %9)
%10 = Main.view(%8, %9)
The getindex call in f1 is where the copy (and its allocation) happens; the view call in f2 refers to x without copying it.
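The difference between the two calls can also be checked by mutating x after indexing. A small sketch with made-up values (the names c and v are only for illustration):

```julia
x = [10.0, 20.0, 30.0, 40.0, 50.0]
i = [1, 3]

c = x[i]        # getindex with a vector of indices: a new Vector holding copies
v = view(x, i)  # SubArray that refers back to x; element data is not copied

x[1] = -1.0     # mutate the parent array after taking both

@show c[1]      # 10.0: the copy is detached from x
@show v[1]      # -1.0: the view sees the change
@show typeof(c) typeof(v)
```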
You can see it with benchmarking too:
julia> using BenchmarkTools
julia> @benchmark f1()
...
Memory estimate: 352 bytes, allocs estimate: 8.
vs.
julia> @benchmark f2()
...
Memory estimate: 272 bytes, allocs estimate: 6.
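Note that f1 and f2 also allocate x, y and i on every call, so the totals above include those. A rough sketch that isolates only the update, assuming pre-allocated arrays and Base's @allocated; the helper names are hypothetical, and the third variant uses the @views macro as an alternative to writing view by hand:

```julia
# Hypothetical helpers: identical except for how x is indexed on the RHS.
function update_copy!(y, x, i)
    y[i] .= x[i] .+ 1.0          # x[i] materializes a temporary Vector
    return y
end

function update_view!(y, x, i)
    y[i] .= view(x, i) .+ 1.0    # explicit view instead of a copy of x[i]
    return y
end

function update_views!(y, x, i)
    @views y[i] .= x[i] .+ 1.0   # @views turns x[i] into a view
    return y
end

# Bytes allocated by one call, measured after a warm-up call
# so that compilation is not counted.
function allocated_bytes(f!, y, x, i)
    f!(y, x, i)
    return @allocated f!(y, x, i)
end

x = randn(5); y = zeros(5); i = [1, 3]
for f! in (update_copy!, update_view!, update_views!)
    println(nameof(f!), ": ", allocated_bytes(f!, y, x, i), " bytes")
end
```

The exact byte counts depend on the Julia version, so none are quoted here; for timings, the same helpers can be passed to @benchmark with interpolated arguments.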