 Very strange performance issue of a simple code

I’ve just spot a very strange behavior of this simple Julia code

function solve_inner(N)

invs = ones(Int32, N)
for q=1:N
invs[q] = 0
end

function inner(n)
return 0
end

function wrapper(n)
return inner(n)
end

for i = 2:N
s = inner(i)
# s = wrapper(i)
end

return 0
end

function solve_wrapper(N)

invs = ones(Int32, N)
for q=1:N
invs[q] = 0
end

function inner(n)
return 0
end

function wrapper(n)
return inner(n)
end

for i = 2:N
# s = inner(i)
s = wrapper(i)
end

return 0
end

N = 10^7
@time solve_inner(N)
@time solve_wrapper(N)

It gives the following results

0.223420 seconds (10.03 M allocations: 192.456 MiB, 3.94% gc time)
0.034134 seconds (27.43 k allocations: 39.591 MiB, 39.92% gc time)

As you can see, direct call of inner function in a for loop uses huge number of allocations in comparison to calling inner function via wrapper function. Any explanations?

Do you know what is stranger? If you remove the wrapper method from inside solve_inner (that does not use it anyway, the time become the same)

using BenchmarkTools

function solve_inner(N)
invs = ones(Int32, N)
for q=1:N
invs[q] = 0
end

function inner(n)
return 0
end

for i = 2:N
s = inner(i)
end

return 0
end

function solve_wrapper(N)
invs = ones(Int32, N)
for q=1:N
invs[q] = 0
end

function inner(n)
return 0
end

function wrapper(n)
return inner(n)
end

for i = 2:N
s = wrapper(i)
end

return 0
end

N = 10^7
@btime solve_inner(N)
@btime solve_wrapper(N)

gives

20.755 ms (2 allocations: 38.15 MiB)
20.694 ms (2 allocations: 38.15 MiB)

This is strange too. Though in the real code that I was working on I removed the wrapper and put its code into the for loop. Still got this strange performance hit.

This is https://github.com/JuliaLang/julia/issues/15276. You can check with @code_warntype that the Box is there in the first case.

However, I am trying this on a later commit and I get:

julia> @btime solve_inner(N)
11.124 ms (2 allocations: 38.15 MiB)
0

julia> @btime solve_wrapper(N)
11.163 ms (2 allocations: 38.15 MiB)
0

so it seems fixed. Might be fixed in the 1.2-rc1 release but needs to be tested.

3 Likes

I just tried. It is solved on 1.2-rc1.

2 Likes