Speed of internal function

Daniel_Scheinerman · June 14, 2021, 9:30pm

I stumbled upon some strange timings today. I’m posting a somewhat minimal below. Essentially it seems that if you write a function that returns a function what you return along with it dramatically effects the speed of that function.

 function makeQRfun(n)
    function myfun(X)
        T = Set(mod.((0:n-1).^2,n))
        s = 0
        for x = X
            s += (mod(x,n) in T)
        end
        return s
    end
    t = n+1    
    return myfun, t
end
   
function makeQRfun2(n)
    function myfun(X)
        T = Set(mod.((0:n-1).^2,n))
        s = 0
        for x = X
            s += (mod(x,n) in T)
        end
        return s
    end
    T = Set(mod.((0:n-1).^2,n))
    t = length(T)
    return myfun, t
end

X = rand(UInt64,10000);
f1,t1 = makeQRfun(17)
f2,t2 = makeQRfun2(17)
f1(X) == f2(X) #sanity check
using BenchmarkTools


julia> @benchmark f1(X)
BenchmarkTools.Trial:
  memory estimate:  1.04 KiB
  allocs estimate:  10
  --------------
  minimum time:     251.344 μs (0.00% GC)
  median time:      253.425 μs (0.00% GC)
  mean time:        255.124 μs (0.00% GC)
  maximum time:     307.815 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark f2(X)
BenchmarkTools.Trial:
  memory estimate:  142.21 KiB
  allocs estimate:  9045
  --------------
  minimum time:     536.260 μs (0.00% GC)
  median time:      559.221 μs (0.00% GC)
  mean time:        563.194 μs (0.44% GC)
  maximum time:     1.448 ms (61.75% GC)
  --------------
  samples:          8847
  evals/sample:     1

carstenbauer · June 14, 2021, 9:37pm

The second function allocates much more (T = Set(mod.((0:n-1).^2,n))) so this isn’t surprising, is it?

Daniel_Scheinerman · June 14, 2021, 10:07pm

Yes I should have mentioned that they use quite different amounts of memory. But I am still surprised. First of all when I call it with n=17, T is a tiny set (9 elements). Moreover, both are suppose to return a function and the code for that function is identical. So when you run them they presumably should do the exact same thing regardless of the extra code in the exterior function block.

rdeits · June 14, 2021, 10:21pm

Looks like Performance Tips · The Julia Language and the related issue #15276, specifically because T is assigned both within the closure (the inner function) and outside it.

The let or type annotation tricks from the performance tips (first link) will probably work, but even easier in this case is just to avoid the problem by not assigning the same variable inside and outside:

julia> function makeQRfun3(n)
           function myfun(X)
               T = Set(mod.((0:n-1).^2,n))
               s = 0
               for x = X
                   s += (mod(x,n) in T)
               end
               return s
           end
           T_outer = Set(mod.((0:n-1).^2,n))
           t = length(T_outer)
           return myfun, t
       end
makeQRfun3 (generic function with 1 method)

julia> f3,t3 = makeQRfun3(17)
(var"#myfun#3"{Int64}(17), 9)

julia> @btime f3($X)
  251.194 μs (9 allocations: 1.02 KiB)
5236

On my machine, f1 and f3 have about the same performance.

rdeits · June 14, 2021, 10:22pm

By the way, the telltale thing to look for to identify cases like this is a red Core.Box annotation in your @code_warntype:

julia> @code_warntype f2(X)
Variables
  #self#::var"#myfun#2"{Int64}
  X::Vector{UInt64}
...
...
...
│   %11 = Core.getfield(#self#, :T)::Core.Box

...
...

Daniel_Scheinerman · June 15, 2021, 2:31am

Thanks very much for the reply. That seems to explain it. It does seem quite strange and subtle to me that code following the function definition effects its implementation, but good to know and something to watch out for.

rdeits · June 15, 2021, 1:03pm

Yes, I agree–the compiler should be able to do better in this case (that’s why it’s still an open issue).

Note, though, that there are certainly cases where code after the inner function definition can affect behavior, which is why this sort of case isn’t trivial to optimize:

julia> function outer()
         x = 1
         function inner()
           println("x = ", x)
         end
         x = 2
         return inner
       end
outer (generic function with 1 method)

julia> f = outer()
(::var"#inner#5") (generic function with 1 method)

julia> f()
x = 2

x is the same binding inside outer and inner, so the assignment to x after the definition of inner still affects the behavior of inner.

Daniel_Scheinerman · June 16, 2021, 6:09pm

Very good point, thanks for the responses!

Topic		Replies	Views
Very strange performance issue of a simple code General Usage performance	4	577	June 16, 2019
10x slowdown when passing function as argument Performance	15	2250	June 4, 2020
Why are calls to this function slow, compared to Python+Numpy? General Usage fast-math	14	2097	August 13, 2017
Significant difference in execution speed between similar functions with nested loops Performance question	4	239	March 8, 2023
How to call a function that returns a function efficiently General Usage	24	2318	June 24, 2018

Speed of internal function

Related topics