Extra allocations/time when a function is returned from another function rather than defined directly

The code below compares the performance of a function returned from another function with that of an equivalent function defined directly. Why is the returned version so much slower, and why does it allocate at all? Looking at @code_warntype Fnc!(q, x, y, p), I can see that the cause is probably a type instability, but I can't work out how to resolve it. I need the returned-function form for a separate application in which a function is constructed from other user inputs.

using BenchmarkTools

function return_fnc(F, P)
    Fnc! = (q, x, y, p) -> begin
        F, P = p
        q[1] = 5F(x, y, P) + 2.0
        q[2] = -2F(y, -x, P) + 3.0
        return nothing
    end
    p = F, P
    return Fnc!, p
end
function fnc!(q, x, y, p)
    F, P = p
    q[1] = 5F(x, y, P) + 2.0
    q[2] = -2F(y, -x, P) + 3.0
    return nothing
end
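function benchmarks(q, x, y, Fnc!, p)
    # Without `$` interpolation, @benchmark resolves these names in global scope,
    # so it actually benchmarks the global q, x, y, p and Fnc!, not the arguments.
    res1 = @benchmark Fnc!(q, x, y, p)
    res2 = @benchmark fnc!(q, x, y, p)
    return res1, res2
end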
F = (x, y, p) -> (x + y)/p[1] 
P = 1.0 
Fnc!, p = return_fnc(F, P)
q = zeros(2); x = 0.5; y = 0.3
res1, res2 = benchmarks(q, x, y, Fnc!, p)
res1 
res2
julia> res1
BenchmarkTools.Trial: 10000 samples with 600 evaluations.
 Range (min … max):  203.000 ns …  42.437 μs  ┊ GC (min … max): 0.00% … 99.27%
 Time  (median):     266.667 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   303.922 ns ± 921.797 ns  ┊ GC (mean ± σ):  6.75% ±  2.22%

  ▇▅▁    ██▅▄▃▂▁                                                ▂
  ████▇▇▆█████████▇███▇███▇████▇██▇▇█▇▇█▇▇█▇▇▇▇▇▇▇▇▇▆▇▇▆▇▆▅▆▆▆▅ █
  203 ns        Histogram: log(frequency) by time        686 ns <

 Memory estimate: 176 bytes, allocs estimate: 11.

julia> res2
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min … max):  23.046 ns … 178.657 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     24.148 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   27.533 ns ±  11.586 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▃▄▁             ▂▁                                         ▂
  █████▇▇▆▆▆▇▆▆▇▅▇▆▆██▇▆▆▆▆▆▇▇█████████▇▇▇▇▇▇▇▇█▆▅▅▅▅▅▅▅▅▁▄▃▄▃ █
  23 ns         Histogram: log(frequency) by time      75.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

I think this might be an instance of the infamous closure bug; see the performance tips (the "Performance of captured variable" section of the Julia manual) for an explanation and this issue for the full history.
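A minimal sketch of what the "closure bug" refers to, using made-up function names (make_reader, make_writer) purely for illustration: when a closure reassigns a variable it captured, Julia stores that variable in an untyped Core.Box, so every access to it is type-unstable. A closure that only reads the variable gets a concretely typed field instead.

```julia
# Hypothetical example functions, only to illustrate captured-variable boxing.
function make_reader()
    a = 1
    return () -> a + 1           # only reads `a`: captured as a concrete Int field
end

function make_writer()
    a = 1
    return () -> (a = a + 1; a)  # reassigns `a`: captured in an untyped Core.Box
end

reader = make_reader()
writer = make_writer()

# A closure object stores each captured variable as a field named after it.
println(fieldtype(typeof(reader), :a))  # Int64 on a 64-bit machine
println(fieldtype(typeof(writer), :a))  # Core.Box
```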


I had previously written return_fnc like this:

using FastClosures
function return_fnc(F, P)
    Fnc! = @closure (q, x, y, p) -> begin
        F, P = p
        q[1] = 5F(x, y, P) + 2.0
        q[2] = -2F(y, -x, P) + 3.0
        return nothing
    end
    p = F, P
    return Fnc!, p
end

This changes nothing, so maybe I'm using FastClosures incorrectly. Anyway, I wasn't sure the closure bug was even relevant here, since I pass F and P through the separate tuple p, so the captured F and P aren't actually used.
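One way to check whether the captured F and P really go unused is to inspect the closure returned by the original return_fnc. This is a sketch that reuses the definitions from the question: the closure type's fields are its captured variables, and both come out as Core.Box, because `F, P = p` assigns to the outer F and P rather than to new variables.

```julia
# The closure from the question: `F, P = p` assigns to the F and P captured
# from return_fnc, so both are boxed even though the values come from p.
function return_fnc(F, P)
    Fnc! = (q, x, y, p) -> begin
        F, P = p
        q[1] = 5F(x, y, P) + 2.0
        q[2] = -2F(y, -x, P) + 3.0
        return nothing
    end
    p = F, P
    return Fnc!, p
end

F = (x, y, p) -> (x + y)/p[1]
P = 1.0
Fnc!, p = return_fnc(F, P)

# Both captured fields are Core.Box, i.e. untyped from the compiler's view.
println(fieldtypes(typeof(Fnc!)))
```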

Actually, defining it like this, so that the closure no longer unpacks p at all:

using FastClosures
function return_fnc(F, P)
    Fnc! = @closure (q, x, y, p) -> begin
        q[1] = 5F(x, y, P) + 2.0
        q[2] = -2F(y, -x, P) + 3.0
        return nothing
    end
    p = F, P
    return Fnc!, p
end

makes the two methods perform the same. I guess that fixes it, but I still don't really understand why.
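A sketch of why this version behaves: F and P are now only read inside the closure, never reassigned, so they are captured with their concrete types instead of being boxed. The function is renamed return_fnc_readonly here only to keep it distinct from the original, and the @closure macro is dropped since it is not needed to show the effect.

```julia
# Hypothetical name; same body as the working version, minus @closure.
function return_fnc_readonly(F, P)
    Fnc! = (q, x, y, p) -> begin
        # F and P are only read here, never reassigned, so they are not boxed
        q[1] = 5F(x, y, P) + 2.0
        q[2] = -2F(y, -x, P) + 3.0
        return nothing
    end
    return Fnc!, (F, P)
end

F = (x, y, p) -> (x + y)/p[1]
Fnc!, p = return_fnc_readonly(F, 1.0)
println(fieldtypes(typeof(Fnc!)))  # the closure type of F and Float64, no Core.Box

q = zeros(2)
Fnc!(q, 0.5, 0.3, p)
println(q)  # approximately [6.0, 3.4]
```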

I've never used FastClosures, so I won't comment on that, but the methods in the original code are not actually the same. In fnc!, F and P belong entirely to the method's scope. In Fnc!, however, F and P belong to return_fnc's scope: nested local scopes share variables when they are assigned (usually a useful behavior, but maybe not what you intended here). return_fnc knows the types of F and P going in, but it can't possibly predict what types they will have afterwards, because they are reassigned in the body of Fnc!, which hasn't even been called yet. So Julia stores them in an untyped Core.Box, which is why F and P are type-unstable in both Fnc! and return_fnc. Just this edit:

...
Fnc! = (q, x, y, p) -> begin
        local F, P # declared separately from the outer F, P!!!
        F, P = p
...

makes the methods' @code_warntype output equivalent. I'm not sure the performance would be exactly the same, though: the global fnc! is implicitly const, while the global Fnc! isn't.
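Putting the suggested edit into the full function gives something like the sketch below. With `local F, P`, the closure no longer refers to return_fnc's F and P at all, so it captures nothing. To sidestep the non-const global Fnc!, the check measures allocations inside a function, where the argument types are concrete; allocs_per_call is a hypothetical helper, and @allocated is from Base.

```julia
function return_fnc(F, P)
    Fnc! = (q, x, y, p) -> begin
        local F, P          # new locals that shadow return_fnc's F and P
        F, P = p
        q[1] = 5F(x, y, P) + 2.0
        q[2] = -2F(y, -x, P) + 3.0
        return nothing
    end
    p = F, P
    return Fnc!, p
end

# Hypothetical helper: inside a function, f! and its arguments have concrete
# types, so no non-const global lookups are measured.
function allocs_per_call(f!, q, x, y, p)
    f!(q, x, y, p)                     # warm up so compilation is not counted
    return @allocated f!(q, x, y, p)
end

F = (x, y, p) -> (x + y)/p[1]
Fnc!, p = return_fnc(F, 1.0)
q = zeros(2)
println(fieldtypes(typeof(Fnc!)))               # () : nothing captured, nothing boxed
println(allocs_per_call(Fnc!, q, 0.5, 0.3, p))  # expected 0 once compiled
```

With BenchmarkTools, the analogous step is to interpolate the arguments, e.g. `@benchmark $Fnc!($q, $x, $y, $p)`, so the timing doesn't include lookups of non-const globals.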


Thank you, @Benny. I didn’t think about that difference - good to know. It makes the performance the same as far as I can tell.

This error does crop up from time to time; people like to share variables across nested local scopes…until they don't. I wonder if there's a static code analyzer that can indicate a variable's home scope, maybe even an interactive one where you hover over a variable and it highlights where else it appears or links you to its origin. @code_warntype is good, but its output could look more like the source.
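Short of such an analyzer, a rough runtime check is to list which captured variables of a closure ended up in a Core.Box, since those are exactly the variables shared with an enclosing scope and reassigned. A sketch with a hypothetical helper, boxed_captures (not part of Base), and a made-up make_counter example:

```julia
# Hypothetical helper: names of a closure's captured variables that Julia
# stored in an untyped Core.Box (i.e. shared with the outer scope and reassigned).
boxed_captures(f) = [name for (name, T) in zip(fieldnames(typeof(f)), fieldtypes(typeof(f)))
                     if T === Core.Box]

function make_counter()
    n = 0
    step = 1
    return () -> (n += step)   # `n` is reassigned (boxed), `step` is only read
end

println(boxed_captures(make_counter()))  # [:n]
```

This only inspects a closure after it has been created, so it is no substitute for a source-level tool, but it can flag an accidental shared variable quickly.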
