Extra allocations/time when a function is returned from another function rather than defined directly

DanielVandH · May 30, 2022, 9:42pm

The code below benchmarks the same function in two forms: returned from another function, and defined directly. Why is the returned version so much slower, and why does it allocate at all? I can see from @code_warntype Fnc!(q, x, y, p) that type instabilities are the likely cause, but I can’t work out how to resolve them. I need the returned-function form for a separate application where a function is constructed from other user inputs.

using BenchmarkTools
function return_fnc(F, P)
    Fnc! = (q, x, y, p) -> begin
        F, P = p
        q[1] = 5F(x, y, P) + 2.0
        q[2] = -2F(y, -x, P) + 3.0
        return nothing
    end
    p = F, P
    return Fnc!, p
end
function fnc!(q, x, y, p)
    F, P = p
    q[1] = 5F(x, y, P) + 2.0
    q[2] = -2F(y, -x, P) + 3.0
    return nothing
end
function benchmarks(q, x, y, Fnc!, p)
    res1 = @benchmark Fnc!(q, x, y, p)
    res2 = @benchmark fnc!(q, x, y, p)
    return res1, res2 
end
F = (x, y, p) -> (x + y)/p[1] 
P = 1.0 
Fnc!, p = return_fnc(F, P)
q = zeros(2); x = 0.5; y = 0.3
res1, res2 = benchmarks(q, x, y, Fnc!, p)
res1 
res2

julia> res1
BenchmarkTools.Trial: 10000 samples with 600 evaluations.
 Range (min … max):  203.000 ns …  42.437 μs  ┊ GC (min … max): 0.00% … 99.27%
 Time  (median):     266.667 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   303.922 ns ± 921.797 ns  ┊ GC (mean ± σ):  6.75% ±  2.22%

  ▇▅▁    ██▅▄▃▂▁                                                ▂
  ████▇▇▆█████████▇███▇███▇████▇██▇▇█▇▇█▇▇█▇▇▇▇▇▇▇▇▇▆▇▇▆▇▆▅▆▆▆▅ █
  203 ns        Histogram: log(frequency) by time        686 ns <

 Memory estimate: 176 bytes, allocs estimate: 11.

julia> res2
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min … max):  23.046 ns … 178.657 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     24.148 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   27.533 ns ±  11.586 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▃▄▁             ▂▁                                         ▂
  █████▇▇▆▆▆▇▆▆▇▅▇▆▆██▇▆▆▆▆▆▇▇█████████▇▇▇▇▇▇▇▇█▆▅▅▅▅▅▅▅▅▁▄▃▄▃ █
  23 ns         Histogram: log(frequency) by time      75.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

gdalle · May 30, 2022, 10:27pm

I think this might be an instance of the infamous closure bug, see the performance tips for an explanation and this issue for the full history.
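For reference, here is a minimal sketch of the captured-variable boxing that the performance tips describe. The names `abmult_boxed` and `abmult_let` are made up for illustration, and `Core.Box` is an implementation detail of the compiler, so what the field inspection shows may differ between Julia versions.

```julia
# `r` is assigned conditionally and also captured by the closure, so the
# compiler cannot treat it as a single-assignment variable and boxes it.
function abmult_boxed(r::Int)
    if r < 0
        r = -r
    end
    return x -> x * r
end

# Workaround in the style of the performance tips: rebind `r` in a `let`
# block so the closure captures a fresh variable that is assigned only once.
function abmult_let(r::Int)
    if r < 0
        r = -r
    end
    f = let r = r
        x -> x * r
    end
    return f
end

f_boxed = abmult_boxed(-2)
f_let = abmult_let(-2)

# Both closures compute the same thing; only the captured field types differ.
println(fieldtypes(typeof(f_boxed)))  # expected to contain Core.Box
println(fieldtypes(typeof(f_let)))    # expected to contain a concrete type instead
```

Whether this is what is happening in the code above depends on whether the captured `F` and `P` are ever assigned more than once.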

DanielVandH · May 30, 2022, 10:40pm

I had previously written return_fnc like

using FastClosures
function return_fnc(F, P)
    Fnc! = @closure (q, x, y, p) -> begin
        F, P = p
        q[1] = 5F(x, y, P) + 2.0
        q[2] = -2F(y, -x, P) + 3.0
        return nothing
    end
    p = F, P
    return Fnc!, p
end

which changes nothing; maybe I’m using FastClosures incorrectly. I also wasn’t sure the closure issue would be related anyway, since I’ve defined the separate tuple p so that the captured F and P aren’t actually used.

DanielVandH · May 30, 2022, 11:03pm

Actually, defining it like this and no longer unpacking p inside the closure at all:

using FastClosures
function return_fnc(F, P)
    Fnc! = @closure (q, x, y, p) -> begin
        q[1] = 5F(x, y, P) + 2.0
        q[2] = -2F(y, -x, P) + 3.0
        return nothing
    end
    p = F, P
    return Fnc!, p
end

makes the two versions perform the same. I guess that fixes it, but I still don’t really understand why.

Benny · May 30, 2022, 11:09pm

I’ve never used FastClosures, so I won’t comment on that, but the two methods in the original code are not quite the same. In fnc!, F and P belong entirely to the method’s own scope, but in Fnc!, F and P actually belong to return_fnc’s scope: nested local scopes share a variable when the inner scope assigns to a name that already exists in the outer one (this is usually useful behavior but may not be what you intended). return_fnc knows the types of F and P going in, but it can’t predict what types they will hold later, because they are reassigned in the body of Fnc!, which hasn’t even been called yet. That’s why F and P are type-unstable across both Fnc! and return_fnc. Just this edit:

...
    Fnc! = (q, x, y, p) -> begin
        local F, P # declared as distinct from the outer F, P!!!
        F, P = p
...

makes the methods’ @code_warntype equivalent. I’m not sure whether the performance would be exactly the same, since the global fnc! is implicitly const while the global Fnc! isn’t.
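The scope sharing itself can be seen in a small sketch that is independent of the benchmark code. The function names `shared_scope` and `separate_scope` are made up for illustration.

```julia
# Without `local`, the assignment inside the closure targets the `a`
# that belongs to the enclosing function.
function shared_scope()
    a = 1
    g = () -> begin
        a = "changed"   # reassigns the outer `a`, changing its type too
        return nothing
    end
    g()
    return a
end

# With `local`, the closure gets its own `a` and the outer one is untouched.
function separate_scope()
    a = 1
    g = () -> begin
        local a         # a new variable owned by the closure body
        a = "changed"
        return nothing
    end
    g()
    return a
end

println(shared_scope())    # prints: changed
println(separate_scope())  # prints: 1
```

In `shared_scope` the outer `a` can hold either an `Int` or a `String` depending on whether `g` has run, which is the same kind of uncertainty the compiler faces for `F` and `P` in `return_fnc`.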

DanielVandH · May 30, 2022, 11:15pm

Thank you, @Benny. I didn’t think about that difference - good to know. It makes the performance the same as far as I can tell.

Benny · May 30, 2022, 11:18pm

This error does crop up from time to time; people like to share variables across nested local scopes…until they don’t. I wonder if there’s a static code analyzer that can indicate a variable’s home scope, maybe even an interactive one where you hover over a variable and it highlights where else it is or links you to its origin. @code_warntype is good but it could look more like the source.
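Short of a real analyzer, one rough check is to look at the field types of the closure object. The sketch below assumes that boxed captures show up as `Core.Box` fields, which is a compiler implementation detail rather than a documented API. `has_boxed_capture`, `return_fnc_shared`, and `return_fnc_local` are hypothetical names used only for this illustration.

```julia
# Hypothetical helper: true if any variable captured by closure `f` is boxed.
has_boxed_capture(f) = any(T -> T === Core.Box, fieldtypes(typeof(f)))

# Pattern from the original post: `F, P = p` assigns to the F and P
# that belong to the enclosing function.
function return_fnc_shared(F, P)
    Fnc! = (q, x, y, p) -> begin
        F, P = p
        q[1] = 5F(x, y, P) + 2.0
        q[2] = -2F(y, -x, P) + 3.0
        return nothing
    end
    return Fnc!, (F, P)
end

# Same body with `local`: F and P inside the closure are new variables,
# so nothing from the enclosing scope is captured.
function return_fnc_local(F, P)
    Fnc! = (q, x, y, p) -> begin
        local F, P
        F, P = p
        q[1] = 5F(x, y, P) + 2.0
        q[2] = -2F(y, -x, P) + 3.0
        return nothing
    end
    return Fnc!, (F, P)
end

F = (x, y, p) -> (x + y) / p[1]
P = 1.0
shared!, p_shared = return_fnc_shared(F, P)
separate!, p_separate = return_fnc_local(F, P)

println(has_boxed_capture(shared!))    # expected: true
println(has_boxed_capture(separate!))  # expected: false
```

This only reports that something is boxed, not which source line caused it, so it complements @code_warntype rather than replacing it.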
