Reducing allocations when subtracting vectors and multiplying by scalars

Is it possible to rewrite the following function foo(x, y) so that it only uses one allocation?

using BenchmarkTools 

function foo(x::Vector{Float64}, y::Vector{Float64})
    (x .- y) .* 3.0
end

This currently yields 3 allocations:

julia> @btime foo([1.0, 1.0, 1.0], [2.0, 2.0, 2.0])
  57.477 ns (3 allocations: 336 bytes)
3-element Vector{Float64}:
 -3.0
 -3.0
 -3.0

Is it possible to get this down to one allocation with 112 bytes, as one would get with allocating an array of length 3? For instance:

function give_me_array_please()
    [0.0, 0.0, 0.0]
end

yields:

julia> @btime give_me_array_please()
16.934 ns (1 allocation: 112 bytes)
3-element Vector{Float64}:
 0.0
 0.0
 0.0

That is a benchmarking artifact: you are also benchmarking the creation of the input arrays. If you interpolate them, the call shows only one allocation:

julia> @btime foo($([1.0, 1.0, 1.0]), $([2.0, 2.0, 2.0]))
  32.036 ns (1 allocation: 112 bytes)
3-element Vector{Float64}:
 -3.0
 -3.0
 -3.0

If you use static arrays, you can go to zero allocations:

julia> using StaticArrays

julia> function foo(x::AbstractVector{Float64}, y::AbstractVector{Float64})
           (x .- y) .* 3.0
       end
foo (generic function with 2 methods)

julia> @btime foo($(SVector{3,Float64}(1.0, 1.0, 1.0)), $(SVector{3,Float64}(2.0, 2.0, 2.0)))
  0.015 ns (0 allocations: 0 bytes)
3-element SVector{3, Float64} with indices SOneTo(3):
 -3.0
 -3.0
 -3.0


(which can be very useful if your problem deals with small arrays, as in the example).
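A side note on that last benchmark: 0.015 ns is less than a single CPU clock cycle, which means the compiler almost certainly constant-folded the whole computation away, so the reported time should not be taken at face value. The BenchmarkTools manual suggests passing such arguments through a Ref to defeat constant folding; a minimal sketch reusing the foo method above (a and b are illustrative names):

a = SVector{3,Float64}(1.0, 1.0, 1.0);
b = SVector{3,Float64}(2.0, 2.0, 2.0);

# Wrapping each interpolated value in a Ref and unwrapping it with []
# hides the constants from the compiler, so the arithmetic cannot be
# folded away at compile time and the reported time is meaningful.
@btime foo($(Ref(a))[], $(Ref(b))[])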


Yeah, 3 allocations is exactly right, because you are creating three arrays. It’s not just an artefact.

This is a benchmarking artefact, though.

The time is, the 0 allocations aren't, just to be clear.

OK, it is an artifact in the sense that the OP was not benchmarking only what was going on inside the function; I thought that was clear.


How could I rewrite this so that I'm not creating three arrays?
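One option, if you can preallocate and reuse the output vector, is in-place broadcasting with .=. A minimal sketch (the mutating foo! and the out buffer are illustrative names, not from the thread):

using BenchmarkTools

# Evaluates the fused broadcast (x .- y) .* 3.0 elementwise directly
# into `out`, so the call itself performs no allocations.
function foo!(out::Vector{Float64}, x::Vector{Float64}, y::Vector{Float64})
    out .= (x .- y) .* 3.0
    return out
end

x = [1.0, 1.0, 1.0];
y = [2.0, 2.0, 2.0];
out = similar(x);  # allocated once, reused across calls

@btime foo!($out, $x, $y)  # expected: 0 allocations

Note that the original foo already creates only one array per call: the fused broadcast allocates just the result, and the other two allocations in the first benchmark came from the input literals.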

OK, I tend to think of 'benchmarking artefacts' as those cases where the benchmarking itself is 'off', for example due to accessing global variables, e.g. if @btime reported 7 allocations instead of 3. The ordinary @time macro is plagued by this.
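A concrete example of the sort of thing @time picks up, as a sketch: in a fresh session the first call to a function also triggers JIT compilation, so @time then reports inflated time and allocation counts; only the second call reflects the function itself.

x = [1.0, 1.0, 1.0];
y = [2.0, 2.0, 2.0];

@time foo(x, y)  # first call: includes compilation of foo for these argument types
@time foo(x, y)  # second call: measures only the run itself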

You can follow @lmiq’s suggestion:

or you can create the input arrays separately:

x = [1.0, 1.0, 1.0];
y = [2.0, 2.0, 2.0];
@btime foo($x, $y)

In both cases only the function call on the already existing arrays is measured.
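For completeness, a third option is the setup keyword accepted by the BenchmarkTools macros, which builds fresh inputs without including their construction in the measurement. A sketch:

# Variables bound in `setup` are referenced without `$`; their
# construction runs before each sample and is excluded from the timing.
@btime foo(x, y) setup=(x = [1.0, 1.0, 1.0]; y = [2.0, 2.0, 2.0])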

julia> @time foo([1.0, 1.0, 1.0], [2.0, 2.0, 2.0])
  0.000001 seconds (3 allocations: 240 bytes)

though

Yes, benchmarking seems to have become a lot better recently. It also seems less necessary to use interpolation with @btime.


To be a bit more explicit: when you write something like [1.0, 1.0, 1.0] you are creating a new array. Thus, when you did

@btime foo([1.0, 1.0, 1.0], [2.0, 2.0, 2.0])

you benchmarked not only what was going on inside foo, but also the creation of the two arrays that you provided as input. To avoid that, you have to create the input arrays outside the @btime call:

julia> foo(x, y) = (x .- y) .* 3.0
foo (generic function with 1 method)

julia> x = [1.0,1.0,1.0]; y = [2.0,2.0,2.0];

julia> @time foo(x,y)
  0.000005 seconds (1 allocation: 112 bytes)
3-element Vector{Float64}:
 -3.0
 -3.0
 -3.0

julia> @btime foo(x,y)
  36.320 ns (1 allocation: 112 bytes)
3-element Vector{Float64}:
 -3.0
 -3.0
 -3.0

Additionally, the proper way to use the macros from BenchmarkTools is to interpolate the variables (adding the $). Thus, although the @btime foo(x,y) call above reported the correct number of allocations, it is not the recommended way to use the macro. That would be what DNF suggested:

julia> @btime foo($x,$y)
  31.983 ns (1 allocation: 112 bytes)
3-element Vector{Float64}:
 -3.0
 -3.0
 -3.0

Why you have to interpolate the variables is related to the internals of BenchmarkTools and, to be honest, I never completely understood it; it is something about the macro receiving the values themselves rather than references to global variables.
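Roughly, as I understand it: without the $, x and y are looked up as non-constant globals inside the benchmarked expression, so the measurement can include that dynamic-lookup overhead; with the $, the values are spliced into the expression when the benchmark is defined and behave like local variables. A small sketch of the contrast (timings vary by machine):

x = [1.0, 1.0, 1.0];
y = [2.0, 2.0, 2.0];

@btime foo(x, y)    # x and y resolved as untyped globals on each evaluation
@btime foo($x, $y)  # values spliced in; behaves like a call on locals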


Ahhh of course, makes sense. Thanks!
