I stumbled upon some strange timings today. I’m posting a somewhat minimal below. Essentially it seems that if you write a function that returns a function what you return along with it dramatically effects the speed of that function.
function makeQRfun(n)
function myfun(X)
T = Set(mod.((0:n-1).^2,n))
s = 0
for x = X
s += (mod(x,n) in T)
end
return s
end
t = n+1
return myfun, t
end
function makeQRfun2(n)
function myfun(X)
T = Set(mod.((0:n-1).^2,n))
s = 0
for x = X
s += (mod(x,n) in T)
end
return s
end
T = Set(mod.((0:n-1).^2,n))
t = length(T)
return myfun, t
end
X = rand(UInt64,10000);
f1,t1 = makeQRfun(17)
f2,t2 = makeQRfun2(17)
f1(X) == f2(X) #sanity check
using BenchmarkTools
julia> @benchmark f1(X)
BenchmarkTools.Trial:
memory estimate: 1.04 KiB
allocs estimate: 10
--------------
minimum time: 251.344 μs (0.00% GC)
median time: 253.425 μs (0.00% GC)
mean time: 255.124 μs (0.00% GC)
maximum time: 307.815 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark f2(X)
BenchmarkTools.Trial:
memory estimate: 142.21 KiB
allocs estimate: 9045
--------------
minimum time: 536.260 μs (0.00% GC)
median time: 559.221 μs (0.00% GC)
mean time: 563.194 μs (0.44% GC)
maximum time: 1.448 ms (61.75% GC)
--------------
samples: 8847
evals/sample: 1
Yes I should have mentioned that they use quite different amounts of memory. But I am still surprised. First of all when I call it with n=17, T is a tiny set (9 elements). Moreover, both are suppose to return a function and the code for that function is identical. So when you run them they presumably should do the exact same thing regardless of the extra code in the exterior function block.
Looks like Performance Tips · The Julia Language and the related issue #15276, specifically because T is assigned both within the closure (the inner function) and outside it.
The let or type annotation tricks from the performance tips (first link) will probably work, but even easier in this case is just to avoid the problem by not assigning the same variable inside and outside:
julia> function makeQRfun3(n)
function myfun(X)
T = Set(mod.((0:n-1).^2,n))
s = 0
for x = X
s += (mod(x,n) in T)
end
return s
end
T_outer = Set(mod.((0:n-1).^2,n))
t = length(T_outer)
return myfun, t
end
makeQRfun3 (generic function with 1 method)
julia> f3,t3 = makeQRfun3(17)
(var"#myfun#3"{Int64}(17), 9)
julia> @btime f3($X)
251.194 μs (9 allocations: 1.02 KiB)
5236
On my machine, f1 and f3 have about the same performance.
Thanks very much for the reply. That seems to explain it. It does seem quite strange and subtle to me that code following the function definition effects its implementation, but good to know and something to watch out for.
Yes, I agree–the compiler should be able to do better in this case (that’s why it’s still an open issue).
Note, though, that there are certainly cases where code after the inner function definition can affect behavior, which is why this sort of case isn’t trivial to optimize:
julia> function outer()
x = 1
function inner()
println("x = ", x)
end
x = 2
return inner
end
outer (generic function with 1 method)
julia> f = outer()
(::var"#inner#5") (generic function with 1 method)
julia> f()
x = 2
x is the same binding inside outer and inner, so the assignment to x after the definition of inner still affects the behavior of inner.