# Is it possible to get allocations from 1 to 0 in this code?

Hi,

I have been trying to improve the performance of my code. The function below originally had 7 allocations and some warnings from `@code_warntype`. I made several changes that reduced the allocations from 7 to 1 and removed the red ink from `@code_warntype`. Is it possible to bring the allocations in this function down to zero?

Using TimerOutputs.jl I could not find which line is allocating. Individually, it seems all lines have zero allocations.

``````
function construct_tuple(arr1, ::Val{n1}, ::Val{m1}, ::Val{m2}, arr2) where {n1, m1, m2}
    len_arr1 = length(arr1)
    tup1 = (len_arr1 > n1) ? ntuple(i -> (@inbounds arr1[end-n1+i]), n1) :
                             ntuple(i -> i <= len_arr1 ? (@inbounds arr1[i]) : :N, n1)

    len_arr2 = length(arr2)
    tup2 = (len_arr2 > m1) ? ntuple(i -> (@inbounds arr2[end-m1+i]), m1) :
                             ntuple(i -> i <= len_arr2 ? (@inbounds arr2[i]) : :N, m1)

    tup3 = len_arr2 >= m2 ? ntuple(i -> (@inbounds arr2[i]), m2) :
                            ntuple(i -> i <= len_arr2 ? (@inbounds arr2[i]) : :N, m2)

    (tup1, tup2, tup3)
end
``````

Can you show us what input you’re benchmarking on? When I do this, I get way more allocations than you:

``````
julia> function construct_tuple(arr1, ::Val{n1}, ::Val{m1}, ::Val{m2}, arr2) where {n1, m1, m2}
           len_arr1 = length(arr1)
           tup1 = (len_arr1 > n1) ? ntuple(i -> (@inbounds arr1[end-n1+i]), n1) :
                                    ntuple(i -> i <= len_arr1 ? (@inbounds arr1[i]) : :N, n1)

           len_arr2 = length(arr2)
           tup2 = (len_arr2 > m1) ? ntuple(i -> (@inbounds arr2[end-m1+i]), m1) :
                                    ntuple(i -> i <= len_arr2 ? (@inbounds arr2[i]) : :N, m1)

           tup3 = len_arr2 >= m2 ? ntuple(i -> (@inbounds arr2[i]), m2) :
                                   ntuple(i -> i <= len_arr2 ? (@inbounds arr2[i]) : :N, m2)

           (tup1, tup2, tup3)
       end
construct_tuple (generic function with 1 method)

julia> let arr1 = rand(10), arr2 = rand(8)
           @btime construct_tuple($arr1, Val(5), Val(9), Val(11), $arr2)
       end
619.801 ns (22 allocations: 976 bytes)
((0.4254217904105211, 0.13241443472571168, 0.869700471092793, 0.7124209560321589, 0.5954160420851926), (0.737801411380624, 0.556271836516959, 0.021540151975602106, 0.7126850002734751, 0.31805793324648257, 0.4001497672303781, 0.6672644424251242, 0.707907493489921, :N), (0.737801411380624, 0.556271836516959, 0.021540151975602106, 0.7126850002734751, 0.31805793324648257, 0.4001497672303781, 0.6672644424251242, 0.707907493489921, :N, :N, :N))
``````

Hey Mason,

when I try

``````
using BenchmarkTools, StaticArrays

function test1()
    sv1 = SVector{3}(rand(3))   # rand(3) builds a heap-allocated Vector first
    sv2 = SVector{3}(rand(3))
    ans = construct_tuple(sv1, Val(1), Val(2), Val(3), sv2)
end

@btime test1()
``````

I get 2 allocs

``````
  88.030 ns (2 allocations: 224 bytes)
((0.9505273873529649,), (0.7657941423837595, 0.2197983373943313), (0.38703121887758996, 0.7657941423837595, 0.2197983373943313))
``````

and they seem to come from the `rand(3)` calls. Is it possible to create a random SVector directly?


Sorry, I forgot to include arr1 and arr2.

``````
arr1 = [rand((:A, :B)) for i = 1:100]
arr2 = [rand((:A, :B)) for i = 1:100]

construct_tuple(arr1, Val(2), Val(2), Val(2), arr2)
``````

From the repl:

``````
julia> arr1 = [rand((:A, :B)) for i = 1:100];

julia> arr2 = [rand((:A, :B)) for i = 1:100];

julia> @btime construct_tuple(arr1, Val(2), Val(2), Val(2), arr2)
26.214 ns (1 allocation: 64 bytes)
((:A, :B), (:B, :A), (:A, :B))
``````

If I do this

``````
using Profile

function test1()
    arr1 = [rand((:A, :B)) for i = 1:100]
    arr2 = [rand((:A, :B)) for i = 1:100]
    construct_tuple(arr1, Val(2), Val(2), Val(2), arr2)
end

Profile.clear()
test1()   # run once so compilation is not profiled
@profile (for i in 1:1000000; x = test1(); end)
Juno.profiler()
``````

I see two allocations in the call stacks of

``````
arr1 = [rand((:A, :B)) for i = 1:100]
arr2 = [rand((:A, :B)) for i = 1:100]
``````

and `@btime` reports

``````
  721.094 ns (2 allocations: 1.75 KiB)
``````

which seems consistent to me.

Edit: without randomization it is difficult to even get a profile. And if you start with the `Ref(x)[]` trick, you get some allocations for the `Ref`s themselves. The only way out would probably be `--track-allocation`, but I’m too lazy to start that process.
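A lighter check than `--track-allocation`, which does not require restarting Julia, might be Base's `@allocated` macro called from inside a function. This is a different technique from the one named above, shown only as a rough sketch: `last_two` is a made-up stand-in for `construct_tuple`, and `allocated_bytes` is a hypothetical helper.

```julia
# Hypothetical stand-in for construct_tuple: the last two elements as a tuple.
last_two(v) = (v[end-1], v[end])

# Measure inside a function so that `v` has a concrete type here,
# instead of being looked up as an untyped global.
function allocated_bytes(v)
    last_two(v)                      # warm-up call, so compilation is not counted
    return @allocated last_two(v)    # bytes allocated by this single call
end

v = [rand((:A, :B)) for _ in 1:100]
println("bytes allocated: ", allocated_bytes(v))
```

The number reported can vary between Julia versions, so it is best treated as a hint about where to look rather than a definitive measurement.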

Are those two allocations occurring for `arr1` and `arr2`? I am using `@btime` directly on `construct_tuple`, so it should not count the allocations for `arr1` and `arr2` (is that right?).

Are you saying that the result of running `@btime` directly on `construct_tuple` is not reliable? I will search for `--track-allocation`. Thank you.

Constant propagation could cause your test case to be evaluated at compile time. But Mason knows the details of this stuff way better than me. Reference: the BenchmarkTools.jl manual (search for ‘cheat’).
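As far as I understand it, the workaround described in the BenchmarkTools.jl manual is to wrap the interpolated value in a `Ref` and dereference it inside the benchmarked expression. A minimal sketch, where `a` and `b` are made-up inputs chosen only for illustration:

```julia
using BenchmarkTools

a, b = 1, 2   # hypothetical inputs, not from the thread

# Plain interpolation: the compiler may treat the values as constants and
# fold the addition away, which can give an unrealistically small time.
@btime $a + $b

# Wrapping each value in a Ref and dereferencing inside the expression hides
# the values from constant propagation, so the addition itself is timed.
result = @btime $(Ref(a))[] + $(Ref(b))[]
```

Whether the two timings actually differ depends on the Julia and BenchmarkTools versions in use.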

Use

``````
rand(SVector{3,Float64})
``````

Beautiful:

``````
using BenchmarkTools, StaticArrays

function test1()
    arr1 = rand(SVector{100,Float64})   # created without a heap-allocated Vector
    arr2 = rand(SVector{100,Float64})
    ans = construct_tuple(arr1, Val(2), Val(2), Val(2), arr2)
    ans
end

ans = nothing
@btime ($ans = test1())
``````

results in

``````
  292.394 ns (0 allocations: 0 bytes)
``````

Looks OK to me.

Can someone please explain why this code I tried is not giving zero allocations? Am I testing this wrong in some way?


Now we get into `@macroexpand` territory. Nice. One thing I’ve learned elsewhere: it is better to isolate your benchmark from its environment and bring the results back into it. That way the compiler has a harder time eliminating your code (though it will still try).

You have to interpolate the input variables (check the `$`):

``````
julia> @btime construct_tuple($arr1, Val(2), Val(2), Val(2), $arr2)
  3.273 ns (0 allocations: 0 bytes)
((:B, :A), (:A, :A), (:B, :B))
``````
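The effect of the `$` can be reproduced on a smaller case. The sketch below compares a benchmark that reads a non-constant global with one that interpolates it. `last_two` is a made-up stand-in for `construct_tuple`, and the allocation count is read from the `allocs` field of the `Trial` that `@benchmark` returns.

```julia
using BenchmarkTools

# Hypothetical stand-in for construct_tuple: the last two elements as a tuple.
last_two(v) = (v[end-1], v[end])

v = [rand((:A, :B)) for _ in 1:100]   # non-const global, as in the REPL session

# Without `$`, `v` is an untyped global inside the benchmark, so the call is
# dispatched dynamically and the overhead of that can show up as allocations.
trial_global = @benchmark last_two(v)

# With `$`, the value is passed to the benchmark as a concretely typed argument.
trial_interp = @benchmark last_two($v)

println("global:       ", trial_global.allocs, " allocations")
println("interpolated: ", trial_interp.allocs, " allocations")
```

The count for the non-interpolated version may differ between Julia versions. The interpolated one is the number that reflects the function itself.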

Thank you, everyone and @lmiq. I will check the interpolation. I am learning a lot of new things (I thought interpolation was only for strings :))