Very strange performance issue of a simple code

I’ve just spot a very strange behavior of this simple Julia code

function solve_inner(N)

  invs = ones(Int32, N)
  for q=1:N
    invs[q] = 0
  end
    
  function inner(n)
    return 0
  end
  
  function wrapper(n)
    return inner(n)
  end 

  for i = 2:N
      s = inner(i)
      # s = wrapper(i)
  end

  return 0
end


function solve_wrapper(N)

  invs = ones(Int32, N)
  for q=1:N
    invs[q] = 0
  end
  
  function inner(n)
    return 0
  end

  function wrapper(n)
    return inner(n)
  end 

  for i = 2:N
      # s = inner(i)
      s = wrapper(i)
  end
  
  return 0
end

N = 10^7
@time solve_inner(N)
@time solve_wrapper(N)

It gives the following results

  0.223420 seconds (10.03 M allocations: 192.456 MiB, 3.94% gc time)
  0.034134 seconds (27.43 k allocations: 39.591 MiB, 39.92% gc time)

As you can see, direct call of inner function in a for loop uses huge number of allocations in comparison to calling inner function via wrapper function. Any explanations?

Do you know what is stranger? If you remove the wrapper method from inside solve_inner (that does not use it anyway, the time become the same)

using BenchmarkTools

function solve_inner(N)
  invs = ones(Int32, N)
  for q=1:N
    invs[q] = 0
  end
    
  function inner(n)
    return 0
  end
  
  for i = 2:N
      s = inner(i)
  end

  return 0
end


function solve_wrapper(N)
  invs = ones(Int32, N)
  for q=1:N
    invs[q] = 0
  end
  
  function inner(n)
    return 0
  end

  function wrapper(n)
    return inner(n)
  end 

  for i = 2:N
      s = wrapper(i)
  end
  
  return 0
end

N = 10^7
@btime solve_inner(N)
@btime solve_wrapper(N)

gives

20.755 ms (2 allocations: 38.15 MiB)
20.694 ms (2 allocations: 38.15 MiB)

This is strange too. Though in the real code that I was working on I removed the wrapper and put its code into the for loop. Still got this strange performance hit.

This is performance of captured variables in closures · Issue #15276 · JuliaLang/julia · GitHub. You can check with @code_warntype that the Box is there in the first case.

However, I am trying this on a later commit and I get:

julia> @btime solve_inner(N)
  11.124 ms (2 allocations: 38.15 MiB)
0

julia> @btime solve_wrapper(N)
  11.163 ms (2 allocations: 38.15 MiB)
0

so it seems fixed. Might be fixed in the 1.2-rc1 release but needs to be tested.

3 Likes

I just tried. It is solved on 1.2-rc1.

2 Likes