I’ve just spot a very strange behavior of this simple Julia code

function solve_inner(N)
invs = ones(Int32, N)
for q=1:N
invs[q] = 0
end
function inner(n)
return 0
end
function wrapper(n)
return inner(n)
end
for i = 2:N
s = inner(i)
# s = wrapper(i)
end
return 0
end
function solve_wrapper(N)
invs = ones(Int32, N)
for q=1:N
invs[q] = 0
end
function inner(n)
return 0
end
function wrapper(n)
return inner(n)
end
for i = 2:N
# s = inner(i)
s = wrapper(i)
end
return 0
end
N = 10^7
@time solve_inner(N)
@time solve_wrapper(N)

It gives the following results

0.223420 seconds (10.03 M allocations: 192.456 MiB, 3.94% gc time)
0.034134 seconds (27.43 k allocations: 39.591 MiB, 39.92% gc time)

As you can see, direct call of inner function in a for loop uses huge number of allocations in comparison to calling inner function via wrapper function. Any explanations?

Do you know what is stranger? If you remove the wrapper method from inside solve_inner (that does not use it anyway, the time become the same)

using BenchmarkTools
function solve_inner(N)
invs = ones(Int32, N)
for q=1:N
invs[q] = 0
end
function inner(n)
return 0
end
for i = 2:N
s = inner(i)
end
return 0
end
function solve_wrapper(N)
invs = ones(Int32, N)
for q=1:N
invs[q] = 0
end
function inner(n)
return 0
end
function wrapper(n)
return inner(n)
end
for i = 2:N
s = wrapper(i)
end
return 0
end
N = 10^7
@btime solve_inner(N)
@btime solve_wrapper(N)

gives

20.755 ms (2 allocations: 38.15 MiB)
20.694 ms (2 allocations: 38.15 MiB)

This is strange too. Though in the real code that I was working on I removed the wrapper and put its code into the for loop. Still got this strange performance hit.