Function definition inside let-block impacting performance?

Is there some reason why a function defined inside a let-block is slower compared to defining the same function in global scope?
e.g. below recursive Fibonacci (from the old Julia microbenchmarks) is 10 times slower if defined inside let-block. Is some optimization step not applied inside the let-block or am I just missing something?

With Julia 1.10 REPL

julia> let 
       fib(n) = n < 2 ? n : fib(n-1) + fib(n-2)
       @time a=fib(35) 
       @time b=fib(35) 
       @time c=fib(35) 
       @time d=fib(35)  
       @time e=fib(35)  
       @time f=fib(35) 
       end
  0.407809 seconds (29.88 k allocations: 536.734 KiB, 0.82% compilation time)
  0.411220 seconds (28.66 k allocations: 448.125 KiB)
  0.392723 seconds (28.66 k allocations: 448.125 KiB)
  0.408793 seconds (28.66 k allocations: 448.125 KiB)
  0.407156 seconds (28.66 k allocations: 448.125 KiB)
  0.415548 seconds (28.66 k allocations: 448.125 KiB)
9227465

julia> begin 
       fib(n) = n < 2 ? n : fib(n-1) + fib(n-2)
       @time a=fib(35) 
       @time b=fib(35) 
       @time c=fib(35) 
       @time d=fib(35)  
       @time e=fib(35)  
       @time f=fib(35) 
       end
  0.035875 seconds (941 allocations: 63.344 KiB, 5.92% compilation time)
  0.033755 seconds (9 allocations: 400 bytes)
  0.033742 seconds (9 allocations: 400 bytes)
  0.033796 seconds (9 allocations: 400 bytes)
  0.033766 seconds (9 allocations: 400 bytes)
  0.033747 seconds (9 allocations: 400 bytes)
9227465
1 Like

Have you tried using BenchmarkTools.jl @btime? That would show you if there’s a difference in time spent compiling vs executing.

Edit: never mind. Those are reported here also. Looks like there’s a semi-significant difference though I’m not sure why the closure makes such a difference.

I tried Benchmark tools, but apparently it tries to evaluate in the global scope.

julia> using BenchmarkTools

julia> let 
       fib(n) = n < 2 ? n : fib(n-1) + fib(n-2)
       @btime a=fib(35) 
       end
ERROR: UndefVarError: `fib` not defined
Stacktrace:
  [1] var"##core#230"()
    @ Main ~/.julia/packages/BenchmarkTools/QAv7m/src/execution.jl:547
  [2] var"##sample#231"(::Tuple{}, __params::BenchmarkTools.Parameters)

See Please read: make it easier to help you

In particular, put your code into code blocks. That will make it readable and prevent unwanted pings.

Yes, I was lazy.

Using code_warntype would seem, if I at all understand what it shows, that fib is checked if it is defined in the global scope. But is that necessary? Is the recursive calls to fib understood not to refer to fib defined inside the let-block?

julia> let
       local fib(n) = n < 2 ? n : fib(n-1) + fib(n-2)
       code_warntype(fib, (Int,))
       end
MethodInstance for (::var"#fib#8")(::Int64)
  from (::var"#fib#8")(n) @ Main REPL[9]:2
Arguments
  #self#::var"#fib#8"
  n::Int64
Locals
  fib@_3::Union{}
  fib@_4::Union{}
Body::Any
1 ─ %1  = (n < 2)::Bool
└──       goto #3 if not %1
2 ─       return n
3 ─ %4  = Core.getfield(#self#, :fib)::Core.Box
│   %5  = Core.isdefined(%4, :contents)::Bool
└──       goto #5 if not %5
4 ─       goto #6
5 ─       Core.NewvarNode(:(fib@_3))
└──       fib@_3
6 ┄ %10 = Core.getfield(%4, :contents)::Any
│   %11 = (n - 1)::Int64
│   %12 = (%10)(%11)::Any
│   %13 = Core.getfield(#self#, :fib)::Core.Box
│   %14 = Core.isdefined(%13, :contents)::Bool
└──       goto #8 if not %14
7 ─       goto #9
8 ─       Core.NewvarNode(:(fib@_4))
└──       fib@_4
9 ┄ %19 = Core.getfield(%13, :contents)::Any
│   %20 = (n - 2)::Int64
│   %21 = (%19)(%20)::Any
│   %22 = (%12 + %21)::Any
└──       return %22


I think this could be a variation of the problem in this recent thread

But I am too tired to see how the connection is exactly.

I don’t see that.

What I see is the fib closure checking the :fib field of its underlying struct, but the compiler doesn’t know fib only calls itself so it holds a Core.Box instance with mystery :contents (we know it’s fib).

If that didn’t make sense, then this needs more explanation. In the global scope, a function block makes or adds methods (NOT mutation) to a const-named function with a singleton immutable struct type (singleton means only 1 instance). When such functions call each other, even recursively, they don’t need to (and can’t) store references for each other’s names. If any global name used by a function is const or annotated with a concrete type, the compiler can infer the assigned instance and optimize its code well in the function call. Otherwise, the compiler makes the function call check the global name and figure out what to do at runtime. The former performant case relies on global names persisting, and the latter case relies on the global scope persisting uniquely, so that’s why they do.

Local names however do not outlive their local scope unless a closure captures them, and captured variables still only last as long as the closures do; moreover, function scopes can be repeated. To implement this, closures do have to store or reference the assigned instances. Now we need to talk about how closures (locally defined functions) are implemented; here’s a link to an approximation of how so it’s easier to follow along. The closure’s type is lifted to the global scope and has a field for every local name the closure captures; this is good for compilation and performance because closure variants capturing different variables in a loop can share the same type. The method body is also lifted to the global scope as a function block for the type; note that you can also do this for any const-name function like (::typeof(foo))(x) = 0, but the const name looks much better. The function block in the local scope would essentially be replaced by a type constructor call on the captured variables. In most cases, if a captured variable is never reassigned, the lowerer will give the field a type parameter inferred from the input instance; if it is reassigned, then it’s a Core.Box that can contain anything. So, the typical advice for good type inference is to never reassign captured variables. That is, except when you get recursive: the type constructor call for fib needs to take in fib…but the fib instance doesn’t exist yet so you get a Core.Box placeholder.

There doesn’t really seem like an easy fix to this. You can’t have a circular inlined immutable struct instance, so you need some sort of reference, like giving Core.Box a type parameter for its contents. That would help type inference in many cases, but you can’t possibly write out circular parameters either, like Fib{Fib{Fib{Fib.... Best you can do is Fib{Fib}, the parameter being an abstract iterated union type so that doesn’t really help the compiler. The only real option is to work in the global scope where you can have fully featured functions. Another thing you can’t do with functions in local scopes: conditionally define methods.

5 Likes

See also this GitHub issue:

Locally-Scoped Named Functions have Surprising Behavior that Causes Poor Performance · Issue #47760 · JuliaLang/julia (github.com)

1 Like

Thanks for the detailed explanation. :slightly_smiling_face:
It was very helpful. I had in back of my mind assumed that locally defined functions would be const, but realized no that they are not.

As pointed out in this comment, the gating item seems to be finding acceptable rules for local const.

If such rules could be decided, then we could impose that the locally-declared named function behaves as a const—and then when its definition gets hoisted to global scope, instead of needing to capture itself it could simply refer to itself.

For example: If I say

let; fib(n) = n ≤ 1 ? n : fib(n-1)+fib(n-2)  end

a global struct is defined with a name like var"#fib#3". Instances of this struct are callable, just like any functor. Its definition is approximately this:

struct var"#fib#3"
    fib :: Core.Box
end
(fib::var"#fib#3")(n) = n ≤ 1 ? n : fib.fib.contents(n-1)+fib.fib.contents(n-2)

Within the local scope, an instance of this struct is constructed and incompletely initialized, and then its fib field is set to reference itself. It’s approximately this:

let fib=var"#fib#3"(Core.Box());  fib.fib.contents=fib  end

The only reason a closure (and the attendant type-unstable Core.Box) is necessary is because, at present, it’s possible that the locally-defined identifier fib will be reassigned during the lifetime of the constructed functor. If we knew that it would never be reassigned by deciding that it was a const, then the definition of the global struct could be changed to reference itself, since it would be known that the local identifier fib would never refer to anything other than the functor instance itself. Like this:

struct var"#fib#3"
end
(fib::var"#fib#3")(n) = n ≤ 1 ? n : fib(n-1)+fib(n-2)

Hmm…

Akshually now that I think about it, even without making the function a local const, it’s knowable that fib is never changed during the functor’s lifetime (after initialization) since it’s never syntactically reassigned after or inside the function definition … so this problem is actually totally avoidable, lol. We could eliminate the capture and the box now, without deciding anything about local const nor changing any semantics (and without using the var"#self#" hack described in that issue).

@StefanKarpinski it seems like this could be a breakthrough on the topic of locally-defined recursive named functions.

That’s a clever trick. Here’s how it can be automated:

using ExprTools
struct F{Name} end

macro stable(fdef)
    d = splitdef(fdef)
    name = d[:name]    
    gname = gensym(name)
    f = :($F{$(QuoteNode(gname))})
    d[:name] = :(($name::$f))
    esc(quote
            $name = $f()
            $(combinedef(d))
            $name
        end)
end

Now compare:

julia> let
           @stable fib(n) = n ≤ 1 ? n : fib(n-1)+fib(n-2)
           @btime $fib(20)
       end
  29.025 ÎĽs (0 allocations: 0 bytes)
6765

julia> let
           fib(n) = n ≤ 1 ? n : fib(n-1)+fib(n-2)
           @btime $fib(20)
       end
  363.609 ÎĽs (20 allocations: 320 bytes)
6765
1 Like

Those do introduce a problem of unintended reassignment (though it’s already good practice to be aware of local variables in a scope so they don’t clash), but implementing const for local variables to guarantee they won’t be reassigned won’t improve this situation because local variables still don’t persist, which makes closures implemented the way they are with all the limitations.

That’s not why. fib isn’t reassigned, and in typical cases, unreassigned local variables will be lowered to a type parameter instead of Core.Box, like fib::T. However, that doesn’t happen if the captured variable was not assigned by the time the lowered constructor call occurs

julia> let
       a() = 1 # only assigned here
       f() = a() # typeof(f)(a) instantiation
       end
(::var"#f#15"{var"#a#14"}) (generic function with 1 method)

julia> let
       f() = a() # typeof(f)(a) instantiation
       a() = 1 # only assigned here
       f
       end
(::var"#f#16") (generic function with 1 method)

The @code_warntype reports on the calls of the returned closures show the stability and instability as you’d expect. The fib example is worse because fib couldn’t possibly be assigned prior to its own closure’s instantiation. A hypothetical workaround by this logic is to assign something first, whether by making another method or just the function declaration, but that is treated like multiple assignments even if it’s not:

julia> let
       a() = 1 # only assigned here
       a(x) = 2 # no reassignment, obviously unrelated method
       f() = a()
       end
(::var"#f#30") (generic function with 1 method)

It’s not because the lowerer is confused by the assignment form of method definitions, this still occurs with function blocks.

I think this lowering should be possible. Method definitions, even assignment forms, are recognizably not assignments in either the source or the expression (first args element is a :call expression), so if it can recognize the local variable is in fact not reassigned and is assigned to a function, it can do this instead of making an impossible instance with a circular field in its type var"#fib#3{var"#fib#3"{var"#fib#3"{var"#fib#3".... Then the lowered type constructor call wouldn’t need its yet-to-exist instance as an input.

I think this should also work when you involve another captured variable that isn’t assigned to a function and thus needs to be stored in a field. Say fib checks n <= a for no good reason where let a = 1. Then it should lower to an approximation of:

struct var"#fib#3"{T}
  a::T # we know T == Int but the lowerer doesn't
end
(fib::var"#fib#3")(n) = n ≤ fib.a ? n : fib(n-1)+fib(n-2)

let a = 1
  fib = var"#fib#3"(a)
end
1 Like

Yup! My proposal here is simply to recognize that there are no syntactical assignments to fib inside nor after its definition(s) nor inside the definition of any other function, and therefore it doesn’t need to capture itself. Within its functor’s definition it can simply refer directly to itself. It should work for closures that capture variables from their environment too.

This is actually a different clever trick from what I suggested; what you wrote causes the function to be declared as a singleton object, which has interesting differences in how it captures variables from its environment, but also it doesn’t work inside function definitions because those often need to be closures (non-singleton). For example:

julia> genfib(a) = @stable fib(n) = n ≤ a ? n : fib(n-1)+fib(n-2)
ERROR: syntax: Global method definition around REPL[7]:9 needs to be placed at the top level, or use "eval".

My proposal here should work with recursive closures defined inside functions, not just locally-scoped within let statements.

However, I don’t currently see a way to make my proposal work with two mutually recursive closures. It might be the case that efficient locally-declared mutually recursive functions can only exist in let statements using something similar to your trick, and not be defined inside a function?

Is the problem that transforming mutually accessible local names to callable variables means there’s no mutually accessible global names to reference, like this:

struct Iseven end; struct Isodd end;
(is_even::Iseven)(n::Int) = n == 0 ? true : is_odd(n - 1)
(is_odd::Isodd)(n::Int) = n == 0 ? false : is_even(n - 1)

Iseven()(3) # UndefVarError: `is_odd` not defined

Then neither approach so far would work, the macro also does this:

    f = :($F{$(QuoteNode(gname))})
    d[:name] = :(($name::$f))

except every @stable function is transformed to a subtype of the iterated union F.

Yes exactly, unless they are singletons (where the callable’s name is not needed since it can be instantiated from the type at anytime).

As currently implemented true, but with a rewrite the macro could take multiple mutually recursive function declarations together and define them so they access each other’s global-scoped singleton.

Now that I think about this further, there might be a way which isn’t too inefficient (it retains type stability). The closures can capture each other, with one closure capturing the other in a mutable container and the other not:

julia> struct IsEven{T}  isodd::Base.RefValue{T}  end
       struct IsOdd      iseven::IsEven{IsOdd}    end
       (iseven::IsEven)(n::Int) = n == 0 ? true  : iseven.isodd[](n-1)
       (isodd ::IsOdd )(n::Int) = n == 0 ? false : isodd.iseven(n-1)

       iseven = IsEven(Ref{IsOdd}())
       isodd = IsOdd(iseven)
       iseven.isodd[] = isodd
IsOdd(IsEven{IsOdd}(Base.RefValue{IsOdd}(IsOdd(#= circular reference @-3 =#))))

julia> iseven.((1,2,3,4))
(false, true, false, true)

julia> isodd.((1,2,3,4))
(true, false, true, false)

Note that I needed to parameterize IsEven because the IsOdd type wasn’t defined yet.

Comparing this against global singletons:

julia> using BenchmarkTools
       struct F{T} end
       (::F{:iseven})(n::Int) = n == 0 ? true : F{:isodd}()(n-1)
       (::F{:isodd})(n::Int) = n == 0 ? false : F{:iseven}()(n-1)

       @btime $iseven(20)
       @btime F{:iseven}()(20);
  56.504 ns (0 allocations: 0 bytes)
  22.846 ns (0 allocations: 0 bytes)

Comparing this against how local functions currently capture each other:

julia> unstable_iseven, unstable_isodd = let
           iseven(n::Int) = n == 0 ? true : isodd(n-1)
           isodd(n::Int) = n == 0 ? false : iseven(n-1)
           iseven, isodd
       end
       @btime $unstable_iseven(20);
  185.325 ns (0 allocations: 0 bytes)

My type-stable implementation above gets me thinking… If we just had a way to express circular type parameters, then we could make them type-stable… Right?

1 Like

Essentially replacing Core.Box with a typed reference, which is what many want to be able to happen but the lowerer can’t know what the compiler can. This looks feasible to lower to for yet-to-instantiate inputs assigned to unreassigned variables, though if it were easy to edit the lowerer, we’d already have the low-hanging fruit like inferring pre-closures reassignments or reassignments with right-hand expressions only using outer scope variables and literals. If performance is the goal then it’d be better to do simpler global functions and avoid the Ref allocation; you can make a whole lot of closures in a loop.

Now that you point out that functions’ types are global and accessible, my nonworking example could’ve replaced isodd with Isodd(), and it doesn’t generate an unwarranted supertype F, though we probably should be subtyping Function for consistency. This could also be applied to the recursive fib example for consistency, but I have concerns about instantiation causing runtime overhead, which isn’t needed when the input callable is already there. I also took the liberty of checking if mutual recursion also plays well with capturing other variables:

struct Iseven{T,S}<:Function a::T; b::S end; struct Isodd{T,S}<:Function a::T; b::S end;
(is_even::Iseven)(n::Int) = n == is_even.a ? true : Isodd(is_even.a, is_even.b)(n - 1)
(is_odd::Isodd)(n::Int) = n == is_odd.b ? false : Iseven(is_odd.a, is_odd.b)(n - 1)

let a = 0, b = 0
   is_even = Iseven(a, b)
   is_odd = Isodd(a, b)
   is_even(8), is_odd(8)
end

Neat. But this and the other approaches make it apparent that mutually recursive methods require the lowerer (or macro) to work over the entire local scope, like to figure out the later is_odd is a locally defined function to be lifted unlike the other variables captured by is_even.

Like if we could write Fib{Int, Base.RefValue{#=head=#}}? I’m not certain but I think this should be inferrable. I dislike the idea but the uninstantiable struct X next::X end already exists so maybe this would just be another “user beware” feature.

It’s true that a lot of low hanging fruit could be picked by now if editing the lowerer was easier, but I wouldn’t be so certain. Sometimes people (not you) rather spend their time telling others that things can’t be done instead of trying to actually do them, and if they just spent that time doing them they’d be done; sometimes we need to show people what’s possible.

Performance is always a goal, this is Julia :wink:

Agree for singletons. For mutually-recursive closures, however, I wanted to see if it’s possible to achieve the same performance. Turns out no, but we can get close. I haven’t yet imagined a way for mutually recursive closures to reach the performance of global singletons, but achieving type stability is a good start.

Is the juice worth the squeeze, considering how niche this is? I leave this question open. It’s only on my radar because of this comment.

It does. This is how it determines, for example, whether or not to box a local when capturing it.

What the lowerer cannot do is know the return types of functions (for now). For example, it doesn’t know that /(1,2) has a return type of Float64, so when boxing a capture a=1/2 it doesn’t know to put it in a Box{Float64} and instead must use a Box{Any}. (supposing Core.Box is type-parameterizable.)

So in the case of mutually recursive captures, it looks possible. How difficult it is and whether it’s worth it is the question.

Yes exactly. I’ve run into this problem before (I couldn’t achieve type stability because of a flexibly-parameterized object that needed to self-reference). I gave up and can’t remember the details though; maybe I could’ve solved it some other way.

On the one hand, “mutually-recursive named functions in local scope which are maximally performant” isn’t really an urgent addition to the language.

But it surfaces what looks like a basic flaw in how Julia implements local scope, one which shows up in a number of places: the bad performance for closures which capture values which might be shared, the strange/wrong consequences of conditional method definitions, and the mandatory boxing of self-calls to a function, for which mutual-recursion is just a special case insofar as it doesn’t admit certain workarounds available for self-recursion.

To be clear, I don’t think there’s anything wrong with the (intended) semantics of Julia scope, just the implementation of them.

I’m still working on figuring out how that implementation works, so take anything I say in that spirit, since it could just be my ignorance talking. But I wonder if the answer might start with looking into Lua upvalues, as an example of a local scope implementation which doesn’t suffer these issues. This would need to be adapted, given that functions are a composite type which may have any number of methods, but I suspect the approach is amenable to accommodating that.

It’s not uncommon that the clearest illustration of such an issue is somewhat contrived, a description which does apply to local mutually-recursive functions. That doesn’t mean that a fix for the issue won’t have positive effects on normal code, in particular, removing the anonymous function penalty would be of great benefit.