Function definition inside let-block impacting performance?

Is there some reason why a function defined inside a let-block is slower compared to defining the same function in global scope?
e.g. below recursive Fibonacci (from the old Julia microbenchmarks) is 10 times slower if defined inside let-block. Is some optimization step not applied inside the let-block or am I just missing something?

With Julia 1.10 REPL

``````julia> let
fib(n) = n < 2 ? n : fib(n-1) + fib(n-2)
@time a=fib(35)
@time b=fib(35)
@time c=fib(35)
@time d=fib(35)
@time e=fib(35)
@time f=fib(35)
end
0.407809 seconds (29.88 k allocations: 536.734 KiB, 0.82% compilation time)
0.411220 seconds (28.66 k allocations: 448.125 KiB)
0.392723 seconds (28.66 k allocations: 448.125 KiB)
0.408793 seconds (28.66 k allocations: 448.125 KiB)
0.407156 seconds (28.66 k allocations: 448.125 KiB)
0.415548 seconds (28.66 k allocations: 448.125 KiB)
9227465

julia> begin
fib(n) = n < 2 ? n : fib(n-1) + fib(n-2)
@time a=fib(35)
@time b=fib(35)
@time c=fib(35)
@time d=fib(35)
@time e=fib(35)
@time f=fib(35)
end
0.035875 seconds (941 allocations: 63.344 KiB, 5.92% compilation time)
0.033755 seconds (9 allocations: 400 bytes)
0.033742 seconds (9 allocations: 400 bytes)
0.033796 seconds (9 allocations: 400 bytes)
0.033766 seconds (9 allocations: 400 bytes)
0.033747 seconds (9 allocations: 400 bytes)
9227465
``````
1 Like

Have you tried using BenchmarkTools.jl `@btime`? That would show you if thereâ€™s a difference in time spent compiling vs executing.

Edit: never mind. Those are reported here also. Looks like thereâ€™s a semi-significant difference though Iâ€™m not sure why the closure makes such a difference.

I tried Benchmark tools, but apparently it tries to evaluate in the global scope.

``````julia> using BenchmarkTools

julia> let
fib(n) = n < 2 ? n : fib(n-1) + fib(n-2)
@btime a=fib(35)
end
ERROR: UndefVarError: `fib` not defined
Stacktrace:
[1] var"##core#230"()
@ Main ~/.julia/packages/BenchmarkTools/QAv7m/src/execution.jl:547
[2] var"##sample#231"(::Tuple{}, __params::BenchmarkTools.Parameters)
``````

In particular, put your code into code blocks. That will make it readable and prevent unwanted pings.

Yes, I was lazy.

Using `code_warntype` would seem, if I at all understand what it shows, that `fib` is checked if it is defined in the global scope. But is that necessary? Is the recursive calls to `fib` understood not to refer to `fib` defined inside the let-block?

``````julia> let
local fib(n) = n < 2 ? n : fib(n-1) + fib(n-2)
code_warntype(fib, (Int,))
end
MethodInstance for (::var"#fib#8")(::Int64)
from (::var"#fib#8")(n) @ Main REPL[9]:2
Arguments
#self#::var"#fib#8"
n::Int64
Locals
fib@_3::Union{}
fib@_4::Union{}
Body::Any
1 â”€ %1  = (n < 2)::Bool
â””â”€â”€       goto #3 if not %1
2 â”€       return n
3 â”€ %4  = Core.getfield(#self#, :fib)::Core.Box
â”‚   %5  = Core.isdefined(%4, :contents)::Bool
â””â”€â”€       goto #5 if not %5
4 â”€       goto #6
5 â”€       Core.NewvarNode(:(fib@_3))
â””â”€â”€       fib@_3
6 â”„ %10 = Core.getfield(%4, :contents)::Any
â”‚   %11 = (n - 1)::Int64
â”‚   %12 = (%10)(%11)::Any
â”‚   %13 = Core.getfield(#self#, :fib)::Core.Box
â”‚   %14 = Core.isdefined(%13, :contents)::Bool
â””â”€â”€       goto #8 if not %14
7 â”€       goto #9
8 â”€       Core.NewvarNode(:(fib@_4))
â””â”€â”€       fib@_4
9 â”„ %19 = Core.getfield(%13, :contents)::Any
â”‚   %20 = (n - 2)::Int64
â”‚   %21 = (%19)(%20)::Any
â”‚   %22 = (%12 + %21)::Any
â””â”€â”€       return %22

``````

I think this could be a variation of the problem in this recent thread

But I am too tired to see how the connection is exactly.

I donâ€™t see that.

What I see is the `fib` closure checking the `:fib` field of its underlying struct, but the compiler doesnâ€™t know `fib` only calls itself so it holds a `Core.Box` instance with mystery `:contents` (we know itâ€™s `fib`).

If that didnâ€™t make sense, then this needs more explanation. In the global scope, a `function` block makes or adds methods (NOT mutation) to a `const`-named function with a singleton immutable struct type (singleton means only 1 instance). When such functions call each other, even recursively, they donâ€™t need to (and canâ€™t) store references for each otherâ€™s names. If any global name used by a function is `const` or annotated with a concrete type, the compiler can infer the assigned instance and optimize its code well in the function call. Otherwise, the compiler makes the function call check the global name and figure out what to do at runtime. The former performant case relies on global names persisting, and the latter case relies on the global scope persisting uniquely, so thatâ€™s why they do.

Local names however do not outlive their local scope unless a closure captures them, and captured variables still only last as long as the closures do; moreover, function scopes can be repeated. To implement this, closures do have to store or reference the assigned instances. Now we need to talk about how closures (locally defined functions) are implemented; hereâ€™s a link to an approximation of how so itâ€™s easier to follow along. The closureâ€™s type is lifted to the global scope and has a field for every local name the closure captures; this is good for compilation and performance because closure variants capturing different variables in a loop can share the same type. The method body is also lifted to the global scope as a `function` block for the type; note that you can also do this for any `const`-name function like `(::typeof(foo))(x) = 0`, but the `const` name looks much better. The `function` block in the local scope would essentially be replaced by a type constructor call on the captured variables. In most cases, if a captured variable is never reassigned, the lowerer will give the field a type parameter inferred from the input instance; if it is reassigned, then itâ€™s a `Core.Box` that can contain anything. So, the typical advice for good type inference is to never reassign captured variables. That is, except when you get recursive: the type constructor call for `fib` needs to take in `fib`â€¦but the `fib` instance doesnâ€™t exist yet so you get a `Core.Box` placeholder.

There doesnâ€™t really seem like an easy fix to this. You canâ€™t have a circular inlined immutable struct instance, so you need some sort of reference, like giving `Core.Box` a type parameter for its contents. That would help type inference in many cases, but you canâ€™t possibly write out circular parameters either, like `Fib{Fib{Fib{Fib...`. Best you can do is `Fib{Fib}`, the parameter being an abstract iterated union type so that doesnâ€™t really help the compiler. The only real option is to work in the global scope where you can have fully featured functions. Another thing you canâ€™t do with functions in local scopes: conditionally define methods.

5 Likes
1 Like

Thanks for the detailed explanation.
It was very helpful. I had in back of my mind assumed that locally defined functions would be `const`, but realized no that they are not.

As pointed out in this comment, the gating item seems to be finding acceptable rules for `local const`.

If such rules could be decided, then we could impose that the locally-declared named function behaves as a `const`â€”and then when its definition gets hoisted to global scope, instead of needing to capture itself it could simply refer to itself.

For example: If I say

``````let; fib(n) = n â‰¤ 1 ? n : fib(n-1)+fib(n-2)  end
``````

a global struct is defined with a name like `var"#fib#3"`. Instances of this struct are callable, just like any functor. Its definition is approximately this:

``````struct var"#fib#3"
fib :: Core.Box
end
(fib::var"#fib#3")(n) = n â‰¤ 1 ? n : fib.fib.contents(n-1)+fib.fib.contents(n-2)
``````

Within the local scope, an instance of this struct is constructed and incompletely initialized, and then its `fib` field is set to reference itself. Itâ€™s approximately this:

``````let fib=var"#fib#3"(Core.Box());  fib.fib.contents=fib  end
``````

The only reason a closure (and the attendant type-unstable `Core.Box`) is necessary is because, at present, itâ€™s possible that the locally-defined identifier `fib` will be reassigned during the lifetime of the constructed functor. If we knew that it would never be reassigned by deciding that it was a `const`, then the definition of the global struct could be changed to reference itself, since it would be known that the local identifier `fib` would never refer to anything other than the functor instance itself. Like this:

``````struct var"#fib#3"
end
(fib::var"#fib#3")(n) = n â‰¤ 1 ? n : fib(n-1)+fib(n-2)
``````

Hmmâ€¦

Akshually now that I think about it, even without making the function a `local const`, itâ€™s knowable that `fib` is never changed during the functorâ€™s lifetime (after initialization) since itâ€™s never syntactically reassigned after or inside the function definition â€¦ so this problem is actually totally avoidable, lol. We could eliminate the capture and the box now, without deciding anything about `local const` nor changing any semantics (and without using the `var"#self#"` hack described in that issue).

@StefanKarpinski it seems like this could be a breakthrough on the topic of locally-defined recursive named functions.

Thatâ€™s a clever trick. Hereâ€™s how it can be automated:

``````using ExprTools
struct F{Name} end

macro stable(fdef)
d = splitdef(fdef)
name = d[:name]
gname = gensym(name)
f = :(\$F{\$(QuoteNode(gname))})
d[:name] = :((\$name::\$f))
esc(quote
\$name = \$f()
\$(combinedef(d))
\$name
end)
end
``````

Now compare:

``````julia> let
@stable fib(n) = n â‰¤ 1 ? n : fib(n-1)+fib(n-2)
@btime \$fib(20)
end
29.025 ÎĽs (0 allocations: 0 bytes)
6765

julia> let
fib(n) = n â‰¤ 1 ? n : fib(n-1)+fib(n-2)
@btime \$fib(20)
end
363.609 ÎĽs (20 allocations: 320 bytes)
6765
``````
1 Like

Those do introduce a problem of unintended reassignment (though itâ€™s already good practice to be aware of local variables in a scope so they donâ€™t clash), but implementing `const` for local variables to guarantee they wonâ€™t be reassigned wonâ€™t improve this situation because local variables still donâ€™t persist, which makes closures implemented the way they are with all the limitations.

Thatâ€™s not why. `fib` isnâ€™t reassigned, and in typical cases, unreassigned local variables will be lowered to a type parameter instead of `Core.Box`, like `fib::T`. However, that doesnâ€™t happen if the captured variable was not assigned by the time the lowered constructor call occurs

``````julia> let
a() = 1 # only assigned here
f() = a() # typeof(f)(a) instantiation
end
(::var"#f#15"{var"#a#14"}) (generic function with 1 method)

julia> let
f() = a() # typeof(f)(a) instantiation
a() = 1 # only assigned here
f
end
(::var"#f#16") (generic function with 1 method)
``````

The `@code_warntype` reports on the calls of the returned closures show the stability and instability as youâ€™d expect. The `fib` example is worse because `fib` couldnâ€™t possibly be assigned prior to its own closureâ€™s instantiation. A hypothetical workaround by this logic is to assign something first, whether by making another method or just the function declaration, but that is treated like multiple assignments even if itâ€™s not:

``````julia> let
a() = 1 # only assigned here
a(x) = 2 # no reassignment, obviously unrelated method
f() = a()
end
(::var"#f#30") (generic function with 1 method)
``````

Itâ€™s not because the lowerer is confused by the assignment form of method definitions, this still occurs with `function` blocks.

I think this lowering should be possible. Method definitions, even assignment forms, are recognizably not assignments in either the source or the expression (first `args` element is a `:call` expression), so if it can recognize the local variable is in fact not reassigned and is assigned to a function, it can do this instead of making an impossible instance with a circular field in its type `var"#fib#3{var"#fib#3"{var"#fib#3"{var"#fib#3"...`. Then the lowered type constructor call wouldnâ€™t need its yet-to-exist instance as an input.

I think this should also work when you involve another captured variable that isnâ€™t assigned to a function and thus needs to be stored in a field. Say `fib` checks `n <= a` for no good reason where `let a = 1`. Then it should lower to an approximation of:

``````struct var"#fib#3"{T}
a::T # we know T == Int but the lowerer doesn't
end
(fib::var"#fib#3")(n) = n â‰¤ fib.a ? n : fib(n-1)+fib(n-2)

let a = 1
fib = var"#fib#3"(a)
end
``````
1 Like

Yup! My proposal here is simply to recognize that there are no syntactical assignments to `fib` inside nor after its definition(s) nor inside the definition of any other function, and therefore it doesnâ€™t need to capture itself. Within its functorâ€™s definition it can simply refer directly to itself. It should work for closures that capture variables from their environment too.

This is actually a different clever trick from what I suggested; what you wrote causes the function to be declared as a singleton object, which has interesting differences in how it captures variables from its environment, but also it doesnâ€™t work inside function definitions because those often need to be closures (non-singleton). For example:

``````julia> genfib(a) = @stable fib(n) = n â‰¤ a ? n : fib(n-1)+fib(n-2)
ERROR: syntax: Global method definition around REPL[7]:9 needs to be placed at the top level, or use "eval".
``````

My proposal here should work with recursive closures defined inside functions, not just locally-scoped within `let` statements.

However, I donâ€™t currently see a way to make my proposal work with two mutually recursive closures. It might be the case that efficient locally-declared mutually recursive functions can only exist in `let` statements using something similar to your trick, and not be defined inside a function?

Is the problem that transforming mutually accessible local names to callable variables means thereâ€™s no mutually accessible global names to reference, like this:

``````struct Iseven end; struct Isodd end;
(is_even::Iseven)(n::Int) = n == 0 ? true : is_odd(n - 1)
(is_odd::Isodd)(n::Int) = n == 0 ? false : is_even(n - 1)

Iseven()(3) # UndefVarError: `is_odd` not defined
``````

Then neither approach so far would work, the macro also does this:

``````    f = :(\$F{\$(QuoteNode(gname))})
d[:name] = :((\$name::\$f))
``````

except every `@stable function` is transformed to a subtype of the iterated union `F`.

Yes exactly, unless they are singletons (where the callableâ€™s name is not needed since it can be instantiated from the type at anytime).

As currently implemented true, but with a rewrite the macro could take multiple mutually recursive function declarations together and define them so they access each otherâ€™s global-scoped singleton.

Now that I think about this further, there might be a way which isnâ€™t too inefficient (it retains type stability). The closures can capture each other, with one closure capturing the other in a mutable container and the other not:

``````julia> struct IsEven{T}  isodd::Base.RefValue{T}  end
struct IsOdd      iseven::IsEven{IsOdd}    end
(iseven::IsEven)(n::Int) = n == 0 ? true  : iseven.isodd[](n-1)
(isodd ::IsOdd )(n::Int) = n == 0 ? false : isodd.iseven(n-1)

iseven = IsEven(Ref{IsOdd}())
isodd = IsOdd(iseven)
iseven.isodd[] = isodd
IsOdd(IsEven{IsOdd}(Base.RefValue{IsOdd}(IsOdd(#= circular reference @-3 =#))))

julia> iseven.((1,2,3,4))
(false, true, false, true)

julia> isodd.((1,2,3,4))
(true, false, true, false)
``````

Note that I needed to parameterize `IsEven` because the `IsOdd` type wasnâ€™t defined yet.

Comparing this against global singletons:

``````julia> using BenchmarkTools
struct F{T} end
(::F{:iseven})(n::Int) = n == 0 ? true : F{:isodd}()(n-1)
(::F{:isodd})(n::Int) = n == 0 ? false : F{:iseven}()(n-1)

@btime \$iseven(20)
@btime F{:iseven}()(20);
56.504 ns (0 allocations: 0 bytes)
22.846 ns (0 allocations: 0 bytes)
``````

Comparing this against how local functions currently capture each other:

``````julia> unstable_iseven, unstable_isodd = let
iseven(n::Int) = n == 0 ? true : isodd(n-1)
isodd(n::Int) = n == 0 ? false : iseven(n-1)
iseven, isodd
end
@btime \$unstable_iseven(20);
185.325 ns (0 allocations: 0 bytes)
``````

My type-stable implementation above gets me thinkingâ€¦ If we just had a way to express circular type parameters, then we could make them type-stableâ€¦ Right?

1 Like

Essentially replacing `Core.Box` with a typed reference, which is what many want to be able to happen but the lowerer canâ€™t know what the compiler can. This looks feasible to lower to for yet-to-instantiate inputs assigned to unreassigned variables, though if it were easy to edit the lowerer, weâ€™d already have the low-hanging fruit like inferring pre-closures reassignments or reassignments with right-hand expressions only using outer scope variables and literals. If performance is the goal then itâ€™d be better to do simpler global functions and avoid the `Ref` allocation; you can make a whole lot of closures in a loop.

Now that you point out that functionsâ€™ types are global and accessible, my nonworking example couldâ€™ve replaced `isodd` with `Isodd()`, and it doesnâ€™t generate an unwarranted supertype `F`, though we probably should be subtyping `Function` for consistency. This could also be applied to the recursive `fib` example for consistency, but I have concerns about instantiation causing runtime overhead, which isnâ€™t needed when the input callable is already there. I also took the liberty of checking if mutual recursion also plays well with capturing other variables:

``````struct Iseven{T,S}<:Function a::T; b::S end; struct Isodd{T,S}<:Function a::T; b::S end;
(is_even::Iseven)(n::Int) = n == is_even.a ? true : Isodd(is_even.a, is_even.b)(n - 1)
(is_odd::Isodd)(n::Int) = n == is_odd.b ? false : Iseven(is_odd.a, is_odd.b)(n - 1)

let a = 0, b = 0
is_even = Iseven(a, b)
is_odd = Isodd(a, b)
is_even(8), is_odd(8)
end
``````

Neat. But this and the other approaches make it apparent that mutually recursive methods require the lowerer (or macro) to work over the entire local scope, like to figure out the later `is_odd` is a locally defined function to be lifted unlike the other variables captured by `is_even`.

Like if we could write `Fib{Int, Base.RefValue{#=head=#}}`? Iâ€™m not certain but I think this should be inferrable. I dislike the idea but the uninstantiable `struct X next::X end` already exists so maybe this would just be another â€śuser bewareâ€ť feature.

Itâ€™s true that a lot of low hanging fruit could be picked by now if editing the lowerer was easier, but I wouldnâ€™t be so certain. Sometimes people (not you) rather spend their time telling others that things canâ€™t be done instead of trying to actually do them, and if they just spent that time doing them theyâ€™d be done; sometimes we need to show people whatâ€™s possible.

Performance is always a goal, this is Julia

Agree for singletons. For mutually-recursive closures, however, I wanted to see if itâ€™s possible to achieve the same performance. Turns out no, but we can get close. I havenâ€™t yet imagined a way for mutually recursive closures to reach the performance of global singletons, but achieving type stability is a good start.

Is the juice worth the squeeze, considering how niche this is? I leave this question open. Itâ€™s only on my radar because of this comment.

It does. This is how it determines, for example, whether or not to box a local when capturing it.

What the lowerer cannot do is know the return types of functions (for now). For example, it doesnâ€™t know that `/(1,2)` has a return type of `Float64`, so when boxing a capture `a=1/2` it doesnâ€™t know to put it in a `Box{Float64}` and instead must use a `Box{Any}`. (supposing `Core.Box` is type-parameterizable.)

So in the case of mutually recursive captures, it looks possible. How difficult it is and whether itâ€™s worth it is the question.

Yes exactly. Iâ€™ve run into this problem before (I couldnâ€™t achieve type stability because of a flexibly-parameterized object that needed to self-reference). I gave up and canâ€™t remember the details though; maybe I couldâ€™ve solved it some other way.

On the one hand, â€śmutually-recursive named functions in local scope which are maximally performantâ€ť isnâ€™t really an urgent addition to the language.

But it surfaces what looks like a basic flaw in how Julia implements local scope, one which shows up in a number of places: the bad performance for closures which capture values which might be shared, the strange/wrong consequences of conditional method definitions, and the mandatory boxing of self-calls to a function, for which mutual-recursion is just a special case insofar as it doesnâ€™t admit certain workarounds available for self-recursion.

To be clear, I donâ€™t think thereâ€™s anything wrong with the (intended) semantics of Julia scope, just the implementation of them.

Iâ€™m still working on figuring out how that implementation works, so take anything I say in that spirit, since it could just be my ignorance talking. But I wonder if the answer might start with looking into Lua upvalues, as an example of a local scope implementation which doesnâ€™t suffer these issues. This would need to be adapted, given that functions are a composite type which may have any number of methods, but I suspect the approach is amenable to accommodating that.

Itâ€™s not uncommon that the clearest illustration of such an issue is somewhat contrived, a description which does apply to local mutually-recursive functions. That doesnâ€™t mean that a fix for the issue wonâ€™t have positive effects on normal code, in particular, removing the anonymous function penalty would be of great benefit.