I’ve been thinking about the same thing recently.
If you know that each algorithm has an in-place implementation, you could allocate the result once before the loop and let each algorithm mutate it.
using BenchmarkTools
using LinearAlgebra
# Original
A(x) = 2x
B(x) = 3x
C(x) = 4x
function f(algorithms, x)
    for alg in algorithms
        x = alg(x)
    end
    return x
end
println("Original:")
@btime f($(A, B, C, A, B), $[1.5, 3.0])
# In-place
A!(x) = lmul!(2, x)
B!(x) = lmul!(3, x)
C!(x) = lmul!(4, x)
function finplace(algorithms, x)
    # Allocates once; the caller's x is left untouched
    xwork = copy(x)
    for alg in algorithms
        # Each alg mutates xwork, so its return value can be ignored
        alg(xwork)
    end
    return xwork
end
println("In-place:")
@btime finplace($(A!, B!, C!, A!, B!), $[1.5, 3.0])
On my machine this prints:
Original:
373.892 ns (5 allocations: 480 bytes)
In-place:
193.564 ns (1 allocation: 96 bytes)
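As a quick sanity check, here is a small sketch confirming that the two versions return the same result and that `finplace` does not modify its input. The definitions are repeated so the snippet runs on its own; the names `x0`, `y_out` and `y_in` are just placeholders I picked for the check.

```julia
using LinearAlgebra

# Same definitions as above, repeated so this snippet is self-contained
A(x) = 2x; B(x) = 3x; C(x) = 4x
A!(x) = lmul!(2, x); B!(x) = lmul!(3, x); C!(x) = lmul!(4, x)

function f(algorithms, x)
    for alg in algorithms
        x = alg(x)
    end
    return x
end

function finplace(algorithms, x)
    xwork = copy(x)
    for alg in algorithms
        alg(xwork)
    end
    return xwork
end

x0 = [1.5, 3.0]
y_out = f((A, B, C, A, B), x0)
y_in = finplace((A!, B!, C!, A!, B!), x0)

# The scale factors multiply to 2 * 3 * 4 * 2 * 3 = 144
@assert y_out == y_in == 144 .* x0
# finplace works on a copy, so the caller's vector is unchanged
@assert x0 == [1.5, 3.0]
```

Because `finplace` copies `x` first, it can be swapped in for `f` without the caller noticing any mutation.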
Perhaps you have a MWE where this assumption fails?