Is it possible to get allocations from 1 to 0 in this code?

Hi,

I have been trying to improve the performance of my code. The function below originally had 7 allocations and some warnings under @code_warntype. I made several changes that reduced the allocations from 7 to 1 and removed the red ink from @code_warntype. Is it possible to reduce the allocations in this function to zero?

Using TimerOutputs.jl I could not find which line is allocating. Individually, it seems all lines have zero allocations.

Thank you for reading…

function construct_tuple(arr1, ::Val{n1}, ::Val{m1}, ::Val{m2}, arr2) where {n1, m1, m2}
	len_arr1 = length(arr1)
	tup1 = (len_arr1 > n1) ? ntuple(i -> (@inbounds arr1[end-n1+i]), n1) :
		ntuple(i -> i <= len_arr1 ? (@inbounds arr1[i]) : :N, n1)

	len_arr2 = length(arr2)
	tup2 = (len_arr2 > m1) ? ntuple(i -> (@inbounds arr2[end-m1+i]), m1) :
		ntuple(i -> i <= len_arr2 ? (@inbounds arr2[i]) : :N, m1)

	tup3 = len_arr2 >= m2 ? ntuple(i -> (@inbounds arr2[i]), m2) :
		ntuple(i -> i <= len_arr2 ? (@inbounds arr2[i]) : :N, m2)

	(tup1, tup2, tup3)
end

Can you show us what input you’re benchmarking on? When I do this, I get way more allocations than you:

julia> function construct_tuple(arr1, ::Val{n1}, ::Val{m1}, ::Val{m2}, arr2) where {n1, m1, m2}
               len_arr1 = length(arr1)
               tup1 = (len_arr1 > n1) ? ntuple(i -> (@inbounds arr1[end-n1+i]), n1) :
                       ntuple(i -> i <= len_arr1 ? (@inbounds arr1[i]) : :N, n1)

               len_arr2 = length(arr2)
               tup2 = (len_arr2 > m1) ? ntuple(i -> (@inbounds arr2[end-m1+i]), m1) :
                       ntuple(i -> i <= len_arr2 ? (@inbounds arr2[i]) : :N, m1)

               tup3 = len_arr2 >= m2 ? ntuple(i -> (@inbounds arr2[i]), m2) :
                       ntuple(i -> i <= len_arr2 ? (@inbounds arr2[i]) : :N, m2)

               (tup1, tup2, tup3)
       end
construct_tuple (generic function with 1 method)

julia> let arr1 = rand(10), arr2 = rand(8)
           @btime construct_tuple($arr1, Val(5), Val(9), Val(11), $arr2)
       end
  619.801 ns (22 allocations: 976 bytes)
((0.4254217904105211, 0.13241443472571168, 0.869700471092793, 0.7124209560321589, 0.5954160420851926), (0.737801411380624, 0.556271836516959, 0.021540151975602106, 0.7126850002734751, 0.31805793324648257, 0.4001497672303781, 0.6672644424251242, 0.707907493489921, :N), (0.737801411380624, 0.556271836516959, 0.021540151975602106, 0.7126850002734751, 0.31805793324648257, 0.4001497672303781, 0.6672644424251242, 0.707907493489921, :N, :N, :N))

Hey Mason,

when I try

function test1()
	sv1 = SVector{3}(rand(3))
	sv2 = SVector{3}(rand(3))
	ans = construct_tuple(sv1, Val(1), Val(2), Val(3), sv2)
end

@btime test1()

I get 2 allocs

  88.030 ns (2 allocations: 224 bytes)
((0.9505273873529649,), (0.7657941423837595, 0.2197983373943313), (0.38703121887758996, 0.7657941423837595, 0.2197983373943313))

and they seem to be related to rand(). Is it possible to generate a random SVector directly?

Sorry, I forgot to include arr1 and arr2.

arr1 = [rand((:A, :B)) for i = 1:100]
arr2 = [rand((:A, :B)) for i = 1:100]

construct_tuple(arr1, Val(2), Val(2), Val(2), arr2)

From the REPL:

julia> arr1 = [rand((:A, :B)) for i = 1:100];

julia> arr2 = [rand((:A, :B)) for i = 1:100];

julia> @btime construct_tuple(arr1, Val(2), Val(2), Val(2), arr2)
  26.214 ns (1 allocation: 64 bytes)
((:A, :B), (:B, :A), (:A, :B))

If I do this

function test1()
	arr1 = [rand((:A, :B)) for i = 1:100]
	arr2 = [rand((:A, :B)) for i = 1:100]
	construct_tuple(arr1, Val(2), Val(2), Val(2), arr2)
end

Profile.clear()
test1()
@profile (for i in 1:1000000; x = test1(); end)
Juno.profiler()

I see two allocations in the call stacks of

	arr1 = [rand((:A, :B)) for i = 1:100]
	arr2 = [rand((:A, :B)) for i = 1:100]

and @btime reports

  721.094 ns (2 allocations: 1.75 KiB)

which seems consistent to me.

Edit: without randomization it is difficult to even get a profile. And if you start with the Ref[] trick, you get some allocations for them. The only way out would probably be to use --track-allocation, but I’m too lazy to start that process.
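
For reference, a rough sketch of what the --track-allocation process could look like. It assumes the code is saved in a hypothetical file called script.jl; the command-line flag and Profile.clear_malloc_data() are described in the profiling section of the Julia manual, everything else is only illustrative.

```julia
# Run with: julia --track-allocation=user script.jl
# (script.jl is a hypothetical file name used only for this sketch)
using Profile

# Same function as in the first post.
function construct_tuple(arr1, ::Val{n1}, ::Val{m1}, ::Val{m2}, arr2) where {n1, m1, m2}
    len_arr1 = length(arr1)
    tup1 = (len_arr1 > n1) ? ntuple(i -> (@inbounds arr1[end-n1+i]), n1) :
        ntuple(i -> i <= len_arr1 ? (@inbounds arr1[i]) : :N, n1)

    len_arr2 = length(arr2)
    tup2 = (len_arr2 > m1) ? ntuple(i -> (@inbounds arr2[end-m1+i]), m1) :
        ntuple(i -> i <= len_arr2 ? (@inbounds arr2[i]) : :N, m1)

    tup3 = len_arr2 >= m2 ? ntuple(i -> (@inbounds arr2[i]), m2) :
        ntuple(i -> i <= len_arr2 ? (@inbounds arr2[i]) : :N, m2)

    (tup1, tup2, tup3)
end

arr1 = [rand((:A, :B)) for i in 1:100]
arr2 = [rand((:A, :B)) for i in 1:100]

# First call triggers compilation, which allocates on its own.
construct_tuple(arr1, Val(2), Val(2), Val(2), arr2)

# Discard everything recorded so far, so only the next call is counted.
Profile.clear_malloc_data()

# The call whose allocations we actually want attributed to source lines.
construct_tuple(arr1, Val(2), Val(2), Val(2), arr2)

# After Julia exits, .mem files appear next to the source files,
# with the number of bytes allocated shown in front of each line.
```

The warm-up call followed by Profile.clear_malloc_data() is there so that compilation does not show up in the per-line counts.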

Are those two allocations occurring for arr1 and arr2? I am using @btime directly on construct_tuple, so it should not count the allocations for arr1 and arr2 (is that right?).

Are you saying that the result of running @btime directly on construct_tuple is not reliable? I will search for --track-allocation. Thank you.

Constant propagation could cause your test case to be evaluated at compile time. But Mason knows the details of this stuff way better than I do. Reference: the BenchmarkTools.jl documentation home page (search for ‘cheat’).
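
A small sketch of the idea, following the pattern from the BenchmarkTools.jl manual. The variables a and b are placeholders for this example only, and no timings are shown because they depend on the machine and Julia version.

```julia
using BenchmarkTools

a, b = 1, 2

# Both values are spliced in as known constants, so the compiler may be
# able to fold the whole expression away and report an unrealistic time.
@btime $a + $b

# Wrapping each value in a Ref and dereferencing it inside the benchmark
# forces a load at run time, which makes this kind of folding harder.
@btime $(Ref(a))[] + $(Ref(b))[]
```

Both lines compute the same sum; only how the inputs reach the benchmarked expression differs.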

Use

rand(SVector{3,Float64})

Beautiful:

function test1()
	arr1 = rand(SVector{100,Float64})
	arr2 = rand(SVector{100,Float64})
	ans = construct_tuple(arr1, Val(2), Val(2), Val(2), arr2)
	ans
end

ans = nothing
@btime ($ans = test1())

results in

  292.394 ns (0 allocations: 0 bytes)

Looks OK to me.

Can someone please explain why this code I tried is not giving zero allocations? Am I testing this wrong in some way?

Now we get into @macroexpand territory. Nice. But here is what I’ve learned elsewhere: you had better isolate your benchmarks from the environment and bring the results back into it; this way the compiler has a harder time eliminating your code (but it will try nevertheless).

You have to interpolate the input variables (check the $):

julia> @btime construct_tuple($arr1, Val(2), Val(2), Val(2), $arr2)
  3.273 ns (0 allocations: 0 bytes)
((:B, :A), (:A, :A), (:B, :B))
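
To see the difference in isolation, here is a minimal sketch. The function last_two is a hypothetical stand-in for construct_tuple, and the explanation in the comments (a non-const global has no fixed type from the compiler's point of view) is my understanding of why the extra allocation shows up, not something measured in this thread.

```julia
using BenchmarkTools

# Hypothetical stand-in for construct_tuple: returns the last two elements as a tuple.
last_two(v) = (v[end-1], v[end])

# A non-const global: its type could change at any time,
# so code that refers to it by name cannot be specialized on it.
v = [rand((:A, :B)) for i in 1:100]

# v is looked up as an untyped global inside the benchmark.
@btime last_two(v)

# `$v` splices the current value into the benchmark, so its type is known.
@btime last_two($v)

# Alternative without `$`: bind the array to a const global.
const cv = v
@btime last_two(cv)
```

Comparing the first two lines on your own machine should show whether the reported allocation comes from the function or from how the benchmark reaches its inputs.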


Thank you, everyone and @lmiq. I will check the interpolation. I am learning a lot of new things (I thought interpolation was only for strings :))