Does Julia have these optimizations available, as in C++? I.e., if I preallocate a value in the caller to store the return from the callee, will the callee write to that memory, or will it be allocated and then copied? From what I understand, these optimizations are not possible in Julia.
Julia uses pass by sharing as opposed to pass by value, so passing and returning values doesn’t create copies in the first place.
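To illustrate pass by sharing, here is a minimal sketch; the function name `fill_ones!` and the buffer `buf` are made up for this example. The callee receives a reference to the caller's array, so it writes into the same memory and the value it returns is the very same object:

```julia
# Hypothetical helper: mutates the array it is given and returns it.
function fill_ones!(v::Vector{Float64})
    fill!(v, 1.0)   # writes into the caller's memory, no copy is made
    return v
end

buf = zeros(3)          # allocated once, in the caller
out = fill_ones!(buf)   # passing and returning share the same object

out === buf             # true: identical object, not a copy
```

Note that this only applies to mutable, heap-allocated objects such as `Vector`; immutable values like `Int` cannot be modified in place by the callee.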
I think in theory Julia will allocate more memory than C++ in this case, not less.
However, it seems C++ still allocates before each memset in the loop here. The call to memset is dominated by operator new. Perhaps there is a better test?
With a stack-allocated array whose size is known at compile time, C++ does well, but Julia would do well here too (although it takes more work to manage lifetimes or to pass the array to non-inlined functions):
If I’m understanding your question correctly, you can do the same thing in Julia:
callee!(x::Ref{Int}) = x[] = 1
function caller()
    x = Ref{Int}()
    callee!(x)
    println(x[])
end
But you shouldn’t do this generally. Julia is quite good about not allocating at all for the return value, effectively allowing values to move into registers and making it faster than pre-allocating storage for the return value:
julia> using BenchmarkTools
julia> @noinline callee!(x::Ref{Int}) = x[] = 1
callee! (generic function with 1 method)
julia> @noinline fastcallee(::Any) = 1
fastcallee (generic function with 1 method)
julia> x = Ref{Int}();
julia> @btime callee!($x)
3.141 ns (0 allocations: 0 bytes)
1
julia> @btime fastcallee($x)
1.264 ns (0 allocations: 0 bytes)
1
Thanks for the response. I can see how that holds for scalars, but I would think preallocation would be faster for larger arrays, in particular for the use case where you call the function many times.
Yes, for arrays you often want to pre-allocate the storage. Julia’s abundant functions ending in ! (“warning: mutates one or more of the inputs”) are a hint that this is well-supported. And for small arrays (e.g., 2x2), you may find that even there it’s better to return a StaticArray (from the StaticArrays.jl package).
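A rough sketch of the pre-allocation pattern for arrays, using only Base; the names `scale_add` and `scale_add!` are hypothetical and chosen just for this example. The caller allocates the output once and the `!` function overwrites it on every call:

```julia
# Allocating version: returns a fresh array on every call.
scale_add(x, y, a) = a .* x .+ y

# In-place version (hypothetical name): writes into the caller-provided `out`.
function scale_add!(out, x, y, a)
    out .= a .* x .+ y   # fused broadcast, assigns element by element into `out`
    return out
end

x = rand(1000)
y = rand(1000)
out = similar(x)         # allocate the output buffer once, outside the loop

for _ in 1:100
    scale_add!(out, x, y, 2.0)   # reuses `out` on every iteration
end
```

Whether this is actually faster depends on the array size and how often the function is called, so it is worth measuring with `@btime` from BenchmarkTools as in the earlier post.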
We don’t want to preallocate StaticArrays because they are stack allocated, I presume, so there would be no speedup from doing so. Is this correct?
Right, you should only consider pre-allocation for heap-allocated values.
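For small fixed-size arrays, returning by value might look like the sketch below. It assumes the StaticArrays.jl package is installed; the functions `rotation` and `rotate_all` are hypothetical names for this example. The 2x2 matrix is an immutable value, so it typically needs no heap allocation and there is nothing to pre-allocate:

```julia
using StaticArrays  # third-party package, assumed to be installed

# Hypothetical example: build and return a 2x2 rotation matrix by value.
rotation(θ) = @SMatrix [cos(θ) -sin(θ); sin(θ) cos(θ)]

function rotate_all(points::Vector{SVector{2,Float64}}, θ)
    R = rotation(θ)                  # immutable, fixed-size value
    return [R * p for p in points]   # only the output Vector lives on the heap
end

pts = [SVector(1.0, 0.0), SVector(0.0, 1.0)]
rotated = rotate_all(pts, π / 2)
```

If `rotate_all` itself were called many times, the outer `Vector` is the part that could be pre-allocated, since it is the heap-allocated value here.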