Return value optimization and named return value optimization

Does Julia have these optimizations available like in C++? I.e., if I preallocate a value in the caller to store the return from the callee, will the callee write to that memory, or will it be allocated and then copied? From what I understand these optimizations are not possible in Julia.

Julia uses pass by sharing as opposed to pass by value, so passing and returning values doesn’t create copies in the first place.
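To make the pass-by-sharing point concrete, here is a minimal sketch. The function name `passthrough` is made up for illustration. It shows that a mutable array handed to a function and returned from it is the same object throughout, so no copy is made in either direction:

```julia
# Hypothetical helper: mutates its argument and hands the same object back.
function passthrough(v::Vector{Int})
    v[1] = 42      # writes into the caller's array, not a copy
    return v       # returns a reference to that same array
end

a = [1, 2, 3]
b = passthrough(a)

println(b === a)   # identity check: are `a` and `b` the same object?
println(a[1])      # the caller sees the mutation made inside the function
```

The `===` check compares object identity, which is why it can distinguish "same array" from "equal copy".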


I think in theory Julia will allocate more memory than C++ in this case, not less.

However, it seems C++ still allocates before each memset in the loop here.

The call to memset is dominated by operator new.

Perhaps there is a better test?

With a stack-allocated array whose size is known at compile time, C++ does well, but Julia would do well here too (although it takes more work to manage lifetimes or to pass the array to non-inlined functions):

If I’m understanding your question correctly, you can do the same thing in Julia:

callee!(x::Ref{Int}) = x[] = 1
function caller()
    x = Ref{Int}()
    callee!(x)
    println(x[])
end

But you shouldn’t do this generally. Julia is quite good about not allocating at all for the return value, effectively allowing values to move into registers and making it faster than pre-allocating storage for the return value:

julia> using BenchmarkTools

julia> @noinline callee!(x::Ref{Int}) = x[] = 1
callee! (generic function with 1 method)

julia> @noinline fastcallee(::Any) = 1
fastcallee (generic function with 1 method)

julia> x = Ref{Int}();

julia> @btime callee!($x)
  3.141 ns (0 allocations: 0 bytes)
1

julia> @btime fastcallee($x)
  1.264 ns (0 allocations: 0 bytes)
1

Thanks for the response. I can see how that holds for scalars, but I would think preallocation would be faster for larger arrays, in particular when the function is called many times.

Yes, for arrays you often want to pre-allocate the storage. Julia’s abundant functions ending in ! (“warning: mutates one or more of the inputs”) are a hint that this is well-supported. And for small arrays (e.g., 2x2), you may find that even there it’s better to return a StaticArray (from the StaticArrays.jl package).
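As a rough sketch of the pre-allocation pattern for arrays, the following contrasts an allocating function with a `!` variant that writes into caller-provided storage. The names `squares` and `squares!` are invented for this example and are not from any package:

```julia
# Allocating version: builds and returns a fresh Vector on every call.
squares(x::Vector{Float64}) = x .^ 2

# In-place version: writes the result into storage supplied by the caller.
function squares!(out::Vector{Float64}, x::Vector{Float64})
    out .= x .^ 2      # fused broadcast assignment; no temporary array
    return out
end

x = collect(1.0:5.0)
out = similar(x)       # allocate the output once, outside the loop

for _ in 1:3
    squares!(out, x)   # each call reuses `out` instead of allocating
end

println(out)
```

If you want to measure the difference, timing both versions with `@btime` from BenchmarkTools (as in the scalar benchmark earlier in the thread) should show whether the in-place variant avoids per-call allocations on your machine.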


I presume we don’t want to preallocate StaticArrays because they are stack allocated, so there would be no speedup from doing so. Is this correct?

Right, you should only consider pre-allocation for heap-allocated values.
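One way to check which side of that line a type falls on is `isbitstype`. The sketch below uses a plain tuple as a stand-in for a small fixed-size value, so that it runs without installing StaticArrays.jl. The alias `Vec4` and the function `scale` are assumptions made up for this example:

```julia
# Hypothetical stand-in for a small fixed-size value: four Float64s in a tuple.
const Vec4 = NTuple{4,Float64}

# Returns a new tuple by value; there is no output buffer to pre-allocate.
scale(v::Vec4, a::Float64) = map(x -> a * x, v)

v = (1.0, 2.0, 3.0, 4.0)
w = scale(v, 2.0)

println(w)
println(isbitstype(Vec4))             # immutable plain-data type
println(isbitstype(Vector{Float64}))  # heap-allocated, mutable container
```

Types for which `isbitstype` returns `true` can generally live on the stack or in registers, so returning them by value is cheap. A `Vector` is not such a type, which is why pre-allocation is worth considering there.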