Reducing allocations when subtracting vectors and multiplying by scalars

Is it possible to rewrite the following function foo(x, y) so that it only uses one allocation?

using BenchmarkTools 

function foo(x::Vector{Float64}, y::Vector{Float64})
    (x .- y) .* 3.0
end

This currently yields 3 allocations:

julia> @btime foo([1.0, 1.0, 1.0], [2.0, 2.0, 2.0])
  57.477 ns (3 allocations: 336 bytes)
3-element Vector{Float64}:
 -3.0
 -3.0
 -3.0

Is it possible to get this down to one allocation with 112 bytes, as one would get with allocating an array of length 3? For instance:

function give_me_array_please()
    [0.0, 0.0, 0.0]
end

yields:

julia> @btime give_me_array_please()
16.934 ns (1 allocation: 112 bytes)
3-element Vector{Float64}:
 0.0
 0.0
 0.0

That is a benchmarking artifact: you are also benchmarking the creation of the input arrays. If you interpolate them, the call shows only one allocation:

julia> @btime foo($([1.0, 1.0, 1.0]), $([2.0, 2.0, 2.0]))
  32.036 ns (1 allocation: 112 bytes)
3-element Vector{Float64}:
 -3.0
 -3.0
 -3.0

If you use static arrays, you can go to zero allocations:

julia> using StaticArrays

julia> function foo(x::AbstractVector{Float64}, y::AbstractVector{Float64})
           (x .- y) .* 3.0
       end
foo (generic function with 2 methods)

julia> @btime foo($(SVector{3,Float64}(1.0, 1.0, 1.0)), $(SVector{3,Float64}(2.0, 2.0, 2.0)))
  0.015 ns (0 allocations: 0 bytes)
3-element SVector{3, Float64} with indices SOneTo(3):
 -3.0
 -3.0
 -3.0


(which can be very useful if your problem deals with small arrays, as in the example).
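A side note on that last benchmark: 0.015 ns is less than a single CPU clock cycle, which means the compiler almost certainly constant-folded the whole computation away, so the reported time should not be taken at face value. The BenchmarkTools manual suggests passing such arguments through a Ref to defeat constant folding; a minimal sketch reusing the foo method above (a and b are illustrative names):

a = SVector{3,Float64}(1.0, 1.0, 1.0);
b = SVector{3,Float64}(2.0, 2.0, 2.0);

# Wrapping each interpolated value in a Ref and unwrapping it with []
# hides the constants from the compiler, so the arithmetic cannot be
# folded away at compile time and the reported time is meaningful.
@btime foo($(Ref(a))[], $(Ref(b))[])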


Yeah, 3 allocations is exactly right, because you are creating three arrays. It’s not just an artefact.

This is a benchmarking artefact, though.

The time is, the 0 allocations aren't, just to be clear.

OK, it is an artifact in the sense that the OP was not benchmarking only what was going on inside the function; I thought that was clear.


How could I rewrite this so that I'm not creating three arrays?
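One option, if you can preallocate and reuse the output vector, is in-place broadcasting with .=. A minimal sketch (the mutating foo! and the out buffer are illustrative names, not from the thread):

using BenchmarkTools

# Evaluates the fused broadcast (x .- y) .* 3.0 elementwise directly
# into `out`, so the call itself performs no allocations.
function foo!(out::Vector{Float64}, x::Vector{Float64}, y::Vector{Float64})
    out .= (x .- y) .* 3.0
    return out
end

x = [1.0, 1.0, 1.0];
y = [2.0, 2.0, 2.0];
out = similar(x);  # allocated once, reused across calls

@btime foo!($out, $x, $y)  # expected: 0 allocations

Note that the original foo already creates only one array per call: the fused broadcast allocates just the result, and the other two allocations in the first benchmark came from the input literals.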

OK, I tend to think of 'benchmarking artefacts' as those cases where the benchmarking itself is 'off', for example due to accessing global variables, e.g. if @btime reported 7 allocations instead of 3. The ordinary @time macro is plagued by this.
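A concrete example of the sort of thing @time picks up, as a sketch: in a fresh session the first call to a function also triggers JIT compilation, so @time then reports inflated time and allocation counts; only the second call reflects the function itself.

x = [1.0, 1.0, 1.0];
y = [2.0, 2.0, 2.0];

@time foo(x, y)  # first call: includes compilation of foo for these argument types
@time foo(x, y)  # second call: measures only the run itself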

You can follow @lmiq’s suggestion:

or you can create the input arrays separately:

x = [1.0, 1.0, 1.0];
y = [2.0, 2.0, 2.0];
@btime foo($x, $y)

In both cases only the function call on the already existing arrays is measured.
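For completeness, a third option is the setup keyword accepted by the BenchmarkTools macros, which builds fresh inputs without including their construction in the measurement. A sketch:

# Variables bound in `setup` are referenced without `$`; their
# construction runs before each sample and is excluded from the timing.
@btime foo(x, y) setup=(x = [1.0, 1.0, 1.0]; y = [2.0, 2.0, 2.0])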

julia> @time foo([1.0, 1.0, 1.0], [2.0, 2.0, 2.0])
  0.000001 seconds (3 allocations: 240 bytes)

though

Yes, benchmarking seems to have become a lot better recently. It also seems less necessary to use interpolation with @btime.


To be a bit more explicit: when you write something like [1.0, 1.0, 1.0] you are creating a new array. Thus, when you did

@btime foo([1.0, 1.0, 1.0], [2.0, 2.0, 2.0])

you benchmarked not only what was going on inside foo, but also the creation of the two arrays that you provided as input. To avoid that, you have to create the input arrays outside the @btime call:

julia> foo(x, y) = (x .- y) .* 3.0
foo (generic function with 1 method)

julia> x = [1.0,1.0,1.0]; y = [2.0,2.0,2.0];

julia> @time foo(x,y)
  0.000005 seconds (1 allocation: 112 bytes)
3-element Vector{Float64}:
 -3.0
 -3.0
 -3.0

julia> @btime foo(x,y)
  36.320 ns (1 allocation: 112 bytes)
3-element Vector{Float64}:
 -3.0
 -3.0
 -3.0

Additionally, the proper way to use the macros from BenchmarkTools is to interpolate the variables (adding the $). Thus, although the @btime foo(x,y) call above reported the correct number of allocations, it is not the recommended way to use the macro. That would be what DNF suggested:

julia> @btime foo($x,$y)
  31.983 ns (1 allocation: 112 bytes)
3-element Vector{Float64}:
 -3.0
 -3.0
 -3.0

Why you have to interpolate the variables is related to the internals of BenchmarkTools and, to be honest, I never completely understood it; it is something about the macro receiving the values themselves rather than references to global variables.
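Roughly, as I understand it: without the $, x and y are looked up as non-constant globals inside the benchmarked expression, so the measurement can include that dynamic-lookup overhead; with the $, the values are spliced into the expression when the benchmark is defined and behave like local variables. A small sketch of the contrast (timings vary by machine):

x = [1.0, 1.0, 1.0];
y = [2.0, 2.0, 2.0];

@btime foo(x, y)    # x and y resolved as untyped globals on each evaluation
@btime foo($x, $y)  # values spliced in; behaves like a call on locals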


Ahhh of course, makes sense. Thanks!
