Very strange performance issue of a simple code

Dandan · June 16, 2019, 7:38pm

I’ve just spotted some very strange behavior in this simple Julia code:

function solve_inner(N)

  invs = ones(Int32, N)
  for q=1:N
    invs[q] = 0
  end
    
  function inner(n)
    return 0
  end
  
  function wrapper(n)
    return inner(n)
  end 

  for i = 2:N
      s = inner(i)
      # s = wrapper(i)
  end

  return 0
end


function solve_wrapper(N)

  invs = ones(Int32, N)
  for q=1:N
    invs[q] = 0
  end
  
  function inner(n)
    return 0
  end

  function wrapper(n)
    return inner(n)
  end 

  for i = 2:N
      # s = inner(i)
      s = wrapper(i)
  end
  
  return 0
end

N = 10^7
@time solve_inner(N)
@time solve_wrapper(N)

It gives the following results

  0.223420 seconds (10.03 M allocations: 192.456 MiB, 3.94% gc time)
  0.034134 seconds (27.43 k allocations: 39.591 MiB, 39.92% gc time)

As you can see, calling the inner function directly in the for loop causes a huge number of allocations (about 10 million, versus about 27 thousand) compared to calling it via the wrapper function. Any explanations?

Henrique_Becker · June 16, 2019, 8:00pm

Do you know what is even stranger? If you remove the wrapper function from inside solve_inner (which does not use it anyway), the times become the same.

using BenchmarkTools

function solve_inner(N)
  invs = ones(Int32, N)
  for q=1:N
    invs[q] = 0
  end
    
  function inner(n)
    return 0
  end
  
  for i = 2:N
      s = inner(i)
  end

  return 0
end


function solve_wrapper(N)
  invs = ones(Int32, N)
  for q=1:N
    invs[q] = 0
  end
  
  function inner(n)
    return 0
  end

  function wrapper(n)
    return inner(n)
  end 

  for i = 2:N
      s = wrapper(i)
  end
  
  return 0
end

N = 10^7
@btime solve_inner(N)
@btime solve_wrapper(N)

gives

20.755 ms (2 allocations: 38.15 MiB)
20.694 ms (2 allocations: 38.15 MiB)
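
The gap between the 27.43 k allocations reported by @time earlier and the 2 allocations reported by @btime here is consistent with the first @time call also measuring compilation, though that is an assumption about how the original session was run. A minimal sketch using only Base, where fill_zeros is a hypothetical stand-in for the array setup in solve_inner, separates the warm-up call from the measured call:

```julia
# Hypothetical helper mirroring the array setup in solve_inner.
function fill_zeros(n)
    v = ones(Int32, n)
    for q in 1:n
        v[q] = 0
    end
    return v
end

fill_zeros(10)                        # warm-up call: compiles the method
bytes = @allocated fill_zeros(10^6)   # measured call: compilation is not included
println("allocated bytes: ", bytes)
```

With BenchmarkTools, the usual advice is also to interpolate non-constant globals, as in `@btime solve_inner($N)`, so that access to the global is not part of the measurement.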

Dandan · June 16, 2019, 8:23pm

This is strange too. In the real code I was working on, though, I removed the wrapper and put its code directly into the for loop, and I still got this strange performance hit.

kristoffer.carlsson · June 16, 2019, 8:47pm

This is the known issue "performance of captured variables in closures" (Issue #15276, JuliaLang/julia on GitHub). You can check with @code_warntype that the Box is there in the first case.

However, I am trying this on a later commit and I get:

julia> @btime solve_inner(N)
  11.124 ms (2 allocations: 38.15 MiB)
0

julia> @btime solve_wrapper(N)
  11.163 ms (2 allocations: 38.15 MiB)
0

so it seems fixed. Might be fixed in the 1.2-rc1 release but needs to be tested.
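
For readers unfamiliar with the Box mentioned above, here is a minimal sketch of the general boxing behavior tracked in issue #15276. It is not the exact case from this thread: boxed_sum and ref_sum are hypothetical names, and the Ref workaround is one common approach rather than the fix that landed in Julia itself.

```julia
# A closure that reassigns a captured local forces Julia to store that
# local in a Core.Box, which makes accesses to it type-unstable.
function boxed_sum(n)
    total = 0
    add!(x) = (total += x)   # reassigns the captured variable `total`
    for i in 1:n
        add!(i)
    end
    return total
end

# Workaround sketch: capture a Ref that is never rebound; only its
# contents are mutated, so no Core.Box is needed.
function ref_sum(n)
    total = Ref(0)
    add!(x) = (total[] += x)
    for i in 1:n
        add!(i)
    end
    return total[]
end

# To inspect the inferred types in the REPL:
# @code_warntype boxed_sum(10)   # `total` shows up as Core.Box
# @code_warntype ref_sum(10)     # `total` is a concretely typed Ref
```

Both functions return the same value; only the inferred type of the captured variable differs.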

aaowens · June 16, 2019, 8:55pm

I just tried. It is solved on 1.2-rc1.
