Consider the following function, which mutates its vector argument in place:
```julia
function f_test!(xout::Vector{T}, a::T, b::T) where T <: Real
    xout .= a .* b
    view(xout, 1:max(1, lastindex(xout)-5)) .-= b
    return xout
end
```
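For reference, here is a small check of what `f_test!` computes (a sketch; the vector length 10 and the argument values are arbitrary). Every element is first set to `a * b`, then `b` is subtracted from the first `max(1, lastindex(xout) - 5)` elements:

```julia
function f_test!(xout::Vector{T}, a::T, b::T) where T <: Real
    xout .= a .* b                                   # fill every element with a*b
    view(xout, 1:max(1, lastindex(xout)-5)) .-= b    # subtract b from the leading part
    return xout
end

x = zeros(10)
f_test!(x, 1.0, 2.0)   # mutates x and returns the same array
# x[1:5] are now 0.0 (2.0 - 2.0) and x[6:10] are 2.0
```

For vectors with 6 or fewer elements the range collapses to `1:1`, so only the first element is adjusted.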
I timed this with Julia 1.9.0-rc3, with the following results:
```julia
using BenchmarkTools

ntest = 100
const xout = zeros(ntest)
@btime f_test!($xout, 1.0, 2.0)
#   24.293 ns (0 allocations: 0 bytes)
```
Inspecting the LLVM IR (`@code_llvm f_test!(xout, 1.0, 2.0)`) shows that SIMD instructions are generated.
I then tried wrapping the function in a constructor that creates a pre-allocated vector and returns an anonymous function (a closure) that takes only the two scalar arguments and uses the captured vector as hidden internal state:
```julia
function test_allocations(n::Integer)
    xout = zeros(n)
    (a, b) -> f_test!(xout, a, b)
end
```
I then created an instance of this closure with the same pre-allocated size as in the test above:

```julia
f_alloc = test_allocations(ntest)
```
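Julia lowers a closure to a compiler-generated struct with one field per captured variable. A quick sketch for checking that the captured vector is stored with a concrete type, rather than in a `Core.Box` (which can happen when a captured variable is reassigned, and makes access to it type-unstable):

```julia
function f_test!(xout::Vector{T}, a::T, b::T) where T <: Real
    xout .= a .* b
    view(xout, 1:max(1, lastindex(xout)-5)) .-= b
    return xout
end

function test_allocations(n::Integer)
    xout = zeros(n)                 # assigned once, never reassigned
    (a, b) -> f_test!(xout, a, b)
end

f_alloc = test_allocations(100)

# One field per captured variable; a concrete Vector{Float64} here
# (rather than Core.Box) means the captured state is type-stable.
fieldtypes(typeof(f_alloc))
```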
Timing this gave slower results, and I also noticed that the generated code has no SIMD instructions, although some other variants of `f_test!` I tried did produce SIMD instructions when wrapped this way.
```julia
@btime f_alloc(1.0, 2.0)
#   36.827 ns (0 allocations: 0 bytes)
```
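One thing that may contribute to the gap (an assumption, not re-timed here): `f_alloc` is a non-constant global, whereas `xout` in the first benchmark is both `const` and interpolated with `$`. The BenchmarkTools documentation recommends interpolating global variables into the benchmarked expression, since otherwise the timing can include a dynamic lookup of the global. A sketch of the two usual ways to avoid that; `f_alloc_const` is a hypothetical name:

```julia
using BenchmarkTools

function f_test!(xout::Vector{T}, a::T, b::T) where T <: Real
    xout .= a .* b
    view(xout, 1:max(1, lastindex(xout)-5)) .-= b
    return xout
end

function test_allocations(n::Integer)
    xout = zeros(n)
    (a, b) -> f_test!(xout, a, b)
end

ntest = 100
f_alloc = test_allocations(ntest)               # non-const global, as in the post
const f_alloc_const = test_allocations(ntest)   # hypothetical const alternative

@btime $f_alloc(1.0, 2.0)        # interpolate the global's value with $
@btime f_alloc_const(1.0, 2.0)   # a const global avoids the dynamic lookup
```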
```julia
@code_llvm f_alloc(1.0, 2.0)
```

```llvm
define nonnull {}* @"julia_#1_19394"([1 x {}*]* nocapture noundef nonnull readonly align 8 dereferenceable(8) %0, double %1, double %2) #0 {
top:
  %3 = getelementptr inbounds [1 x {}*], [1 x {}*]* %0, i64 0, i64 0
  %4 = load atomic {}*, {}** %3 unordered, align 8
  %5 = call nonnull {}* @"j_f_test!_19396"({}* nonnull %4, double %1, double %2) #0
  ret {}* %5
}
```
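The listing above covers only the closure body: it loads the captured vector from the closure object (`%4`) and then calls `f_test!` as a separate, non-inlined function (`j_f_test!_19396`), so any SIMD code would sit inside that callee rather than in this IR. A sketch for inspecting both pieces with `code_llvm` from the `InteractiveUtils` standard library; whether vector instructions show up still depends on the CPU and Julia version:

```julia
using InteractiveUtils   # provides code_llvm; loaded automatically in the REPL

function f_test!(xout::Vector{T}, a::T, b::T) where T <: Real
    xout .= a .* b
    view(xout, 1:max(1, lastindex(xout)-5)) .-= b
    return xout
end

function test_allocations(n::Integer)
    xout = zeros(n)
    (a, b) -> f_test!(xout, a, b)
end

f_alloc = test_allocations(100)

# IR of the closure body only.
closure_ir = sprint(io -> code_llvm(io, f_alloc, (Float64, Float64); debuginfo=:none))

# IR of the callee for the argument types the closure passes to it;
# if the loops are vectorized, look for vector types such as `<4 x double>` here.
callee_ir = sprint(io -> code_llvm(io, f_test!, (Vector{Float64}, Float64, Float64); debuginfo=:none))
print(callee_ir)
```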
Is there a way to define this constructor so that the compiled code is the same as calling `f_test!` directly? Both versions make zero allocations, but from reading previous discussions I gather that always passing the arguments explicitly is better. If I did want to give functions internal state in the form of pre-allocated outputs, is there a better way of doing it than what I wrote here?
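For the "internal state" pattern, a common alternative to a closure is a callable struct (a functor) that owns the pre-allocated buffer. A minimal sketch; `PreallocF` and `f_struct` are hypothetical names, and this version has not been benchmarked against the closure:

```julia
function f_test!(xout::Vector{T}, a::T, b::T) where T <: Real
    xout .= a .* b
    view(xout, 1:max(1, lastindex(xout)-5)) .-= b
    return xout
end

# Hypothetical callable struct holding a pre-allocated output buffer.
struct PreallocF{T<:Real}
    xout::Vector{T}
end

# Convenience constructor: allocate a zeroed buffer of length n.
PreallocF{T}(n::Integer) where {T<:Real} = PreallocF{T}(zeros(T, n))

# Make instances callable: p(a, b) mutates and returns p.xout.
(p::PreallocF{T})(a::T, b::T) where {T<:Real} = f_test!(p.xout, a, b)

const f_struct = PreallocF{Float64}(100)
f_struct(1.0, 2.0)
```

The closure returned by `test_allocations` is, in effect, an anonymous version of the same thing (a compiler-generated struct with one field per captured variable), so the main difference is that here the type is named and its field type is spelled out.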
I'm looking to understand what is going on in this example, and for general tips on how best to design this sort of thing. Thanks!