Function definition inside let-block impacting performance?

As Mason showed, macros can perform something like lowering (not exactly; we certainly don’t write so many GOTOs). If a macro over a local scope can be proved to apply all the time, then it’s something to put on the back burner for when the lowerer can be improved.

Type inference is the job of the compiler, and by the language’s design, the compiler only knows types from a call signature. The lowerer at the method definition does not have access to a call signature, so it can never infer types like the compiler does later.

Not true: if the lowerer can see that a was only assigned once prior and never again, then it’ll lift the closure’s type to include a type parameter a::T. When the compiler comes in, it’ll infer Float64 from the literal 0.5 and instantiate the closure with T == Float64. No heap-allocated box needed.

I’d say it is, but if we take a step back, we’re essentially lifting the method into the global scope and giving it one type for efficiency. So if we don’t really need to capture any variables from a local scope, as in the fib example, we could just do all of this in the global scope. If we just don’t want to expose it, then we could use obscure variable names and, in an upcoming version, simply not declare it public.

This part is just a genuine difficulty of inferring captured variables that get reassigned, and the general impossibility of narrowing down the type. Other languages deal with this by declaring the type of the captured variable, even generically; this also works in Julia, though the value is still heap-allocated for memory safety. You can take the allocation out of the compiler’s hands by implementing a shared value as a mutable value rather than a reassigned variable, and you can pass such a value into many scopes. The only possible inference improvements are for very limited yet useful treatment of captured variables, like inferring all reassignments before closures or inferring lines in yet-to-be-called closures that only involve the captured variables. The other usage limitations are there because of mechanisms for good performance, and it’s more likely they’ll get better warnings and errors; there’s a reason that languages with more dynamic features are slow.
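A sketch of that mutable-value workaround, using Base’s Ref as the shared value (make_pair is a name invented for illustration):

function make_pair()
    a = Ref(0.5)                # a concretely typed mutable holder, allocated once
    f = () -> (a[] *= 2; a[])   # a itself is never reassigned, so no Core.Box
    g = () -> (a[] /= 2; a[])
    f, g
end

f, g = make_pair()
f()  # 1.0
g()  # 0.5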

Doesn’t seem true: an optimization of Lua upvalues (which refer to the variable, not an instance) can leverage their immutability (not being reassigned), which happens in Julia already. If the variables are reassigned, then the upvalue must be allocated, which is Core.Box in Julia. I don’t know whether any implementations of Lua or its derivatives, particularly LuaJIT, are any better at inferring the undeclared type of a captured and reassigned variable.


These issues are loosely related, but I don’t see them stemming from the same problem or having the same solution.

  1. Closure poor performance with shared captures:
    Boxing (heap allocation) is necessary for values that could get reassigned during the closure’s lifetime, since the semantic is that a captured variable that changes value must reflect this everywhere that it can be seen. Most of the performance loss we see is actually due to type instability from not type-parameterizing the box, which happens because lowering doesn’t have access to function return types.
    Solutions: type-annotate boxed captures or use the let block trick (sketched just after this list), and either rewrite lowering in Julia so it accesses return types or have lowering emit Julia code which does so.
  2. Conditional function definition weirdness:
    Two versions of local foo() defined in two branches of a conditional could easily become instances of two structs (e.g. var"#foo#3" and var"#foo#4") declared at global scope, and the local variable foo would simply be assigned to one or the other instance. However, Julia’s multiple dispatch paradigm affords us the ability to declare additional methods of foo, and if those additional methods were declared in yet other conditionals … you see the issue.
    Solution: resolve the local const issue, and then treat local functions as local constants (and allow conditionally-added methods only if they are added within the same branch as the initial function declaration).
  3. Mandatory boxing of local function self-calls:
    As I’ve shown, having the function capture itself at all is totally unnecessary if the function’s local identifier is never reassigned (i.e. almost always). It happens as a consequence of the rule that when a local function captures a local variable, if that variable has any syntactical assignment after the function’s instantiation, then it should be boxed (this is a good rule). However, even though the first assignment of the function’s local identifier technically happens after the object’s instantiation, it’s not assigned to any value that isn’t known to the function.
    Solution: carve out an exemption for the first assignment of the identifier which represents the local function itself.
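For reference, a minimal sketch of the let block trick from item 1 (slow and fast are names invented for illustration; this is the standard workaround from the manual’s performance tips):

function slow()
    r = rand()
    r *= 2           # r is reassigned, so the closure captures it in a Core.Box
    () -> r + 1      # type-unstable: r is read out of the box as Any
end

function fast()
    r = rand()
    r *= 2
    let r = r        # rebind: the inner r is assigned exactly once,
        () -> r + 1  # so it is captured as a plain typed field
    end
end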

Sure, but in this context I was referring specifically to a boxed capture.

In order for a box to be type-stable, lowering must type-parameterize the box—therefore it needs to know all possible types that a could have during the closure’s lifetime, and therefore it needs the return types of all function calls which a is assigned to.

For example:

julia> f, g = let a = 0.5
           _f() = (a = a * 2; a)
           _g() = (a = a / 2; a)
           _f, _g
       end
(var"#_f#3"(Core.Box(0.5)), var"#_g#4"(Core.Box(0.5)))

Lowering can see all three syntactical assignments to a, although it has no idea how often the latter two may occur. If it knew that the return types of *(::Float64, ::Int) and /(::Float64, ::Int) are Float64 so the type of a is always Float64, it would be possible for lowering to type-parameterize a’s Box with Float64.

If lowering were rewritten in Julia, it would have access to this information through calls to Core.Compiler.return_type and typejoin, but since it’s still in Lisp my understanding is that it can’t. (I’ve previously shared ideas for it to emit Julia that will perform this function; maybe I should write a macro implementing those ideas instead of trying to convince people.)
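For illustration, the queries lowering would need are already answerable at run time (a sketch only; Core.Compiler.return_type is an internal API and not guaranteed stable):

julia> Core.Compiler.return_type(*, Tuple{Float64, Int})
Float64

julia> Core.Compiler.return_type(/, Tuple{Float64, Int})
Float64

julia> typejoin(Float64, Float64)  # the single parameter the Box could take
Float64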

Self-recursive closures, where their identifier is never reassigned, are as efficient as singletons since they don’t need to capture themselves; they can simply refer to themselves. It’s only for mutually recursive functions, which must otherwise capture each other, where it’s very attractive to check if they’re singleton.

True, but (sidenote) it can be improved by type-parameterizing Core.Box. Currently, declaring the type gets lowered to type-assertions whenever the boxed value is read or written, but I have shown that this is less efficient than type-parameterizing the box itself.
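A minimal sketch of the difference (TypedBox is a hypothetical stand-in for a type-parameterized Core.Box):

mutable struct TypedBox{T}
    contents::T
end

read_asserted(b::Core.Box) = b.contents::Float64  # assert on every access; the load is still an Any slot
read_typed(b::TypedBox)    = b.contents           # field is concretely typed; inferred, no assert needed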


Maybe we should take stock of the ideas we’ve discussed so far, as we’ve begun talking past each other a bit due to me floating several ideas simultaneously.

Idea 1: Locally Defined Singletons

If a locally defined function has no captures, then lowering can tell that it is singleton. (It doesn’t need to capture itself if its local identifier is never reassigned.) All references to its local identifier can be replaced with references to its global singleton.

Example:

# user code:
let;  fib(n) = n ≤ 1 ? n : fib(n-1)+fib(n-2)  end

# currently lowers to approximately:
struct var"#fib#3"  fib::Core.Box  end
(fib::var"#fib#3")(n) = n ≤ 1 ? n : fib.fib.contents(n-1)+fib.fib.contents(n-2)

# would instead lower to approximately:
struct var"#fib#3" end
(::var"#fib#3")(n) = n ≤ 1 ? n : var"#fib#3"()(n-1)+var"#fib#3"()(n-2)

Note that this idea helps for self-recursive singletons and mutually-recursive singletons, but not for self-recursive closures nor mutually-recursive closures.

Idea 2: Self-recursive closures

If a locally defined function references itself, and if its identifier is never reassigned, we can lower its function definition to refer to itself, and we can lower its struct definition not to capture itself.

Example:

# user code:
let;  fib(n) = n ≤ 1 ? n : fib(n-1)+fib(n-2)  end

# would instead lower to approximately:
struct var"#fib#3" end
(fib::var"#fib#3")(n) = n ≤ 1 ? n : fib(n-1)+fib(n-2)

Note that this idea helps for self-recursive singletons and self-recursive closures, but not for mutually-recursive singletons nor mutually-recursive closures.

Idea 3: Mutually-recursive closures

If two (or more) locally defined closures are mutually recursive, then they must capture each other in a type-stable manner.

I showed how this could be type-stable and reasonably performant, for example:

# user code:
let
    iseven(n) = n == 0 ? true  : isodd(n-1)
    isodd(n)  = n == 0 ? false : iseven(n-1)
end

# would lower to approximately:
struct var"#iseven#4"{T}  isodd::Box{T}  end
struct var"#isodd#5"      iseven::var"#iseven#4"{var"#isodd#5"}  end
(iseven::var"#iseven#4")(n) = n == 0 ? true  : iseven.isodd.contents(n-1)
(isodd::var"#isodd#5")(n)   = n == 0 ? false : isodd.iseven(n-1)

However, now that I think about it again, it only works for functions that could have been singleton—functions that capture nothing but each other. For example, if isodd had type parameters because it captured additional variables, it wouldn’t be possible for iseven’s type parameter to describe it.

So it looks like we run into a hard limit: it is currently impossible to make performant mutually recursive closures. I have opened an issue to probe if it’s possible to make circularly-parameterized types to address this.
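To see the regress concretely (a sketch with hypothetical lowered names, in comments since none of these definitions can be written today):

# If both closures captured an extra variable, each one's type parameter would
# have to spell out the other's full type:
#
#   var"#iseven#4"{X, T}  with  T == var"#isodd#5"{Y, var"#iseven#4"{X, T}}
#
# Expanding T substitutes the whole right-hand side into itself, so the
# parameter list never terminates; no finite concrete type satisfies it.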

With more creative ordering, we can make closures capture each other alongside other values:

Example:

struct IsEven{X,IO}  
    x::X
    isodd::IO
end
mutable struct IsOdd{X,Y}
    const y::Y
    iseven::IsEven{X,IsOdd{X,Y}} 
    IsOdd{X,Y}(y) where {X,Y} = new{X,Y}(y)  # iseven field left undefined until assigned below
end
(iseven::IsEven)(n::Int) = n == iseven.x ? true  : iseven.isodd(n-1)
(isodd ::IsOdd )(n::Int) = n == isodd.y  ? false : isodd.iseven(n-1)

x, y = 0, 0
X, Y = typeof.((x, y))
isodd = IsOdd{X,Y}(0)
iseven = IsEven(0, isodd)
isodd.iseven = iseven

performance:

julia> @btime $iseven(20) 
  59.063 ns (0 allocations: 0 bytes)
true

This is … different … because the closures aren’t actually capturing each other, instead they are continually reinstantiating each other from their own internal state. If it is decided that all the mutually-recursive closures must have the same fields, then I guess this is okay.

Compared to having the mutually-recursive closures capture each other, having them re-instantiate each other is faster:

julia> @btime $is_even(20)  # Benny's idea
  20.441 ns (0 allocations: 0 bytes)
true

For the sake of memory, perhaps it’s best for mutually-recursive closures to make a single struct which captures all mutually-recursive closures’ fields, and to have all the functions capture that one struct.

struct IsevenIsoddStorage{A,B}  a::A; b::B  end
struct Iseven{S}<:Function;  store::S  end
struct Isodd{S}<:Function;   store::S  end
(is_even::Iseven)(n::Int) = n == is_even.store.a ? true :  Isodd(is_even.store)(n - 1)
(is_odd::Isodd)(n::Int)   = n == is_odd.store.b  ? false : Iseven(is_odd.store)(n - 1)

store = IsevenIsoddStorage(0,0)
is_even = Iseven(store)
is_odd = Isodd(store)
is_even(8), is_odd(8)

Note that this idea would help for mutually-recursive closures and mutually-recursive singletons.

1 Like

Issue opened for idea #2:

Do not capture recursive local function’s self-identifier (to avoid box) · Issue #53295 · JuliaLang/julia (github.com)

IMO it would be more confusing to make 2 different functions with different types share a name, which isn’t possible in the global scope, so this should just be disallowed with a syntax error about conditionally defined methods. Local const wouldn’t help because adding methods is not a reassignment.

In general, that is impossible; or rather, a must have type Any. The closure is only a callable, and it can be involved in infinitely many call signatures with different inferred types, within which a can be assigned something that depends on the call signature. The f, g = let a = 0.5 example is a useful special case where the assignments to a do not involve the closures’ own argument variables in the right-hand expressions, so it’s possible to infer the type of a without involving the closures’ infinite call signatures. By examining the whole scope, this should be detectable.

Yes, declaring a type only restores type inference of code that uses the box’s content via typeasserts of the content’s type. However, the box itself is untyped so the access itself is inefficient.

I think the ideas here can also be applied to closures that call other closures. Mutual recursion is just the motivating special case.

Capturing is what the closure’s scope does to an outer scope’s local variables. That’s language level, not implementation level. Instances from the outer scope are contained because they need to be able to persist after the outer scope finishes; if something can be stored in the global scope to replicate these instances, we don’t need to contain them to implement captured variables.

Now that you’ve demonstrated that allocating a box or a mutable struct introduces GC pressure and dereferencing overhead, I should point out the downsides of the “each closure contains all variables captured by callee closures” approach. The more obvious one is that the closures’ instances contain redundant information and thus take up more space; the separate storage type won’t help this unless it’s heap-allocated and referenced. This of course stops mattering if you only let one closure escape the outer scope. The less obvious disadvantage is that the lowerer/macro needs to identify caller-callee trees and cycles (recursions) among the set of closures to determine the sets of captured variables for the closures’ types; I’m calling this a disadvantage for now because I can’t tell whether it’s hard or easy. The “delayed typed containment” approach doesn’t need to do this: it just makes a mutable field whenever a captured variable hasn’t been assigned yet and contains it after it is. It may be preferable to simplify the lowering by letting assigned-once-to-closure variables be contained normally, as in the “delayed typed containment” approach, which, applied to my earlier example, gives:

struct Iseven{T,S}<:Function;  a::T; b::S  end
struct Isodd{R}<:Function;  is_even::R  end
(is_even::Iseven)(n::Int) = n == is_even.a ? true : Isodd(is_even)(n - 1)
(is_odd::Isodd)(n::Int) = n == is_odd.is_even.b ? false : is_odd.is_even(n - 1)

let a = 0, b = 0
   is_even = Iseven(a, b)
   is_odd = Isodd(is_even)
   is_even(8), is_odd(8)
end

As this example shows, we are also assuming variable assignments occur in a known order within the scope. I have concerns about whether this is a fair assumption. For instance, the example above would fail if a hadn’t been assigned yet (possibly conditionally), and the types would change if the definitions of is_even and is_odd were swapped (and could that be done conditionally?). That isn’t a problem for the global scope because variable and scope persistence means function instances don’t need to store anything before compilation. It’s possible we may need to accept this as a limitation of the local scope and make do with closures that are lifted to the global scope.

This already happens whenever you conditionally assign an anonymous function to a single identifier:

if cond;  f=x->x  else  f=x->x^2  end  

the idea here is just to extend this to local named functions, using semantics similar to those we use for local constants, if we can find agreeable rules for that.

The difference when you conditionally define a named function at global scope to be one thing or another is that there aren’t two types under the hood; only one version is compiled. But this is under the hood anyway; most users don’t notice.

The idea here is to treat local named functions as local consts with regard to how their initial declarations should interact with conditionals, and add some minor additional rules to handle how methods can be added (i.e. only allow methods to be added within the same branch where the function is declared).

For the general case, I agree. For example, a must certainly be in a Box{Any} here:

f, g = let a = 0.5
    _f(x) = (a = a * x; a)
    _g(x) = (a = a / x; a)
    _f, _g
end

This is discussed in the first “difficulty” of idea 3 here. We can annotate the types of either x or a to fix this, which of course limits the usefulness of lowering to use type inference here.

However, there are many “special cases” where closures either mutate a capture independent of their arguments, or capture something which is mutated in their parent scope—if lowering had access to return types it could correctly parameterize the box. Here’s a tricky example of this.

Ah, good point. Even if Iseven and Isodd access a shared storage element, if I construct an array of Isevens and an array of Isodds this information will be copied twice.

If I’m not mistaken, this suffers from the same issue.

Nice, this trick looks a lot like a y-combinator:

using BenchmarkTools  # needed for @btime

fix(f) = x -> f(fix(f))(x) # The y-combinator

let
    fiborec(fib) = n -> n ≤ 1 ? n : fib(n-1)+fib(n-2)
    @btime $(fix(fiborec))(20)
end

which seems to also run faster than the direct definition.


Correct, a y-combinator runs faster than the closure capturing itself through a type-unstable box. However it is slower than simply having the functor directly reference itself.

julia> using BenchmarkTools
       fib1(n) = n≤1 ? n : fib1(n-1)+fib1(n-2)                 # global defn
       fib2 = let;  fib(n) = n≤1 ? n : fib(n-1)+fib(n-2)  end  # local defn
       struct var"#fib#4" <: Function end
       (fib::var"#fib#4")(n) = n≤1 ? n : fib(n-1)+fib(n-2)
       fib3 = var"#fib#4"()                                    # direct self-referencing
       fix(f) = x -> f(fix(f))(x)
       fib4 = fix(fib -> n -> n ≤ 1 ? n : fib(n-1)+fib(n-2))   # y-combinator

       @btime fib1(10)
       @btime $fib2(10)
       @btime $fib3(10)
       @btime $fib4(10)
       fib1(10) == fib2(10) == fib3(10) == fib4(10)
  240.831 ns (0 allocations: 0 bytes)
  4.029 μs (0 allocations: 0 bytes)
  208.288 ns (0 allocations: 0 bytes)
  439.899 ns (0 allocations: 0 bytes)
true

But that’s the crucial difference of not incorporating the name into the function syntax: those are unambiguously two different functions in any scope. The confusing part for named functions is that it would be two possible methods of one function in the global scope but two different functions in the local scope. You can’t really sweep this “under the hood”. Not only will it affect type inference (say you are repeatedly returning either closure to store in a vector, expecting them all to share one function type because you gave them the same name); typeof also exposes this to the user at the language level. Many things about closures are implementation details, but one function name corresponding to one type is language semantics.

Yes, the only way to save space is distant allocation and storing references to it. The mentioned “simplification” wouldn’t save space either, but it’d make the type have fewer type parameters.

I’ll edit this comment too, but it occurs to me that this can’t be the case: when a local function has multiple methods that each capture different variables, its implemented type must contain instances for all of the variables, so the lowerer must derive the type from the scope as a whole, not from each method definition. The current algorithm is indeed simpler than what I’m suggesting, though, where functions are stored as their captured variables and, if going the “parameter-simplified” route, some fields are collapsed into functions stored as themselves (Isodd doesn’t store b directly because Iseven already does for it).


Ah, I get you now. Yes, returning a locally-declared named function which could actually have multiple types would indeed be confusing.

Maybe best to stick to the current policy, which simply disallows the local named function from being redefined, irrespective of whatever semantic is chosen for local consts.

It would seem that some of these issues could be resolved with a syntax for declaring a function of a single method. That would be amenable to definition on conditional branches: it would throw an error unless exactly one of those branches was taken. The compiler could still specialize on inputs, but it would know that the function itself corresponds to a single prototype with a single name. That should be enough to prevent having to box the call.

The method dispatch is Julia’s best feature, but that doesn’t mean it’s always desirable. It’s an affordance that we have to pay for whether we’re using it or not.

What Lua does correctly, every time, is determine whether or not an upvalue is shared once a closure escapes a local context. Reassignments within that local scope don’t matter, because the locals are on the stack. But in the common case where a returned closure is the only entity which remains able to mutate a variable, there’s no reason for that variable to stay boxed: it belongs to the closure, and nothing else can touch it.

I don’t think that’s what Julia is doing, but I might be wrong about that. I understand the Lua mechanism in detail and the Julia one poorly.

We already have four function declaration syntaxes:

# anonymous: 
x->x
function(x) x end
# named: 
f(x)=x
function f(x) x end

Either of the first two should suffice.

As I’ve previously expressed, a recursive local function boxing itself is unnecessary and unrelated to it having multiple methods, and this problem can be solved with no change in semantic behavior. (Note that multiple method declarations already get lowered to a single assignment anyway.)
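This can be checked directly (a sketch; the exact printed lowered form varies across Julia versions):

# Both definitions add methods to one closure type, and the local identifier f
# is assigned exactly once in the lowered code:
Meta.@lower let
    f(x::Int)     = 1
    f(x::Float64) = 2
    f
end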

Even if the closure is the only thing mutating and observing its captured variable, whatever data structure holds it must be mutable. In other words, a box is needed.

What Julia gets wrong is not inferring and parameterizing the box type.

A Box is two indirections. Only one is necessary.
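A sketch of the one-indirection version, capturing a concretely typed mutable cell instead of a Core.Box (Cell and counter are names invented for illustration):

mutable struct Cell{T}
    value::T
end

function counter()
    c = Cell(0)           # one heap allocation, concretely typed
    () -> (c.value += 1)  # the closure holds c directly: one dereference per access,
end                       # versus closure -> Box -> contents::Any for Core.Box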

Those two do not declare a function which can only have a single method, because nothing in Julia does that.

julia> x = (y::Number) -> y
#3 (generic function with 1 method)

julia> (::typeof(x))(y,z) = y + z

julia> x
#3 (generic function with 2 methods)

The difference between a function which must have only one method, and a function which just happens to have one method, is that the compiler can infer a great deal more about the former than the latter. It can be coded as a jump to a targeted region in memory, the return type of the function will never change, and so on. For example, there is no reason to box any reference to such a function.

This doesn’t seem correct from what I’ve read about the reference implementation. As you might expect, boolean, nil, and number are stored directly and copied around because of their fixed sizes; the other builtin types (you can’t define custom types) are heap-allocated and handled by the GC, and variables just reference them (references have fixed size). Local variables do indeed live on the block’s stack and die with the stack (unlike many languages, the only default-local variables are function arguments; you must declare local to avoid global variables, by design), but that doesn’t mean the underlying data does. Upvalues (specifically the concept introduced by the reference implementation of Lua 5.0, see section “5 Functions and Closures”) are also a language-level entity that doesn’t directly correspond to the data; the figures make it clear that the closure and upvalue don’t live on the stack, and when the stack unwinds, neither does the captured variable. This is similar to boxing local variables in Julia and carries the same performance pitfalls, like GC pressure and dereferencing overheads. This is a general necessity: when you semantically reassign that captured variable (possibly to an instance of any type), it’s efficient to change the data in one place and have many closures reference it. The only reason you can see a performance degradation in Julia is that the lowerer can optimize assigned-once captured variables (as far as it can tell, anyway) without boxing, just storing the instance in each closure.

This is also incorrect. Dispatch, inference, and compilation occur over the callable (the function) and the arguments, and it doesn’t matter how many methods the callable’s type has. One method can be compiled for infinite call signatures with infinitely many possible return types, and any of the call signatures can be type-unstable. The only thing a unimethod could do in Julia is error when you attempt to add another method signature, and it won’t address any issue mentioned here. As uniment points out, we already have anonymous functions we can assign to variables the usual way, and adding methods to those is too difficult to be accidental. Named function syntax is just special-cased to add methods to an existing function type, and IMO exceptions should not be introduced.


I hadn’t looked at Lua before, but I like the idea of their tables.

Do you know if any serious consideration has gone into making [a=1, b=2] lower into an ordered dictionary? This way, just as (1, 2) is a tuple and (a=1, b=2) is a named tuple, [1, 2] is a vector and [a=1, b=2] would be an ordered dictionary (analogous to Lua tables {a=1, b=2} and JavaScript objects {a:1, b:2}).
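For comparison, the closest present-day spelling (a sketch, assuming the OrderedCollections package):

julia> using OrderedCollections

julia> od = OrderedDict(:a => 1, :b => 2)  # what [a=1, b=2] might lower to
OrderedDict{Symbol, Int64} with 2 entries:
  :a => 1
  :b => 2

julia> first(od)  # insertion order is preserved
:a => 1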

On github somewhere there’s a thread that discusses [;a=1, b=2] for Dictionaries.Dictionary.

Lua tables don’t seem to be semantically ordered, and tables can be indexed with anything, so it seems more like a Dict{Any, Any}.

Understood. However, ordered dictionaries would have the same relationship to vectors as namedtuples have to tuples, which is good for making the user interface for this syntax sugar consistent. Also, ordered dictionaries seem to be generally a better fit for Julia’s target audience (basically the same as Python’s target audience).

One suggestion that has been made is that ordered dictionaries can be indexed in order by a special ordinal index type (e.g. my_dict[6th]); any other non-token index will be treated as a hashkey.
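That ordinal token could even be prototyped today via numeric juxtaposition (a sketch; Ordinal, OrdinalUnit, and th are hypothetical names):

using OrderedCollections

struct Ordinal
    n::Int
end
struct OrdinalUnit end
const th = OrdinalUnit()
Base.:*(n::Integer, ::OrdinalUnit) = Ordinal(n)  # numeric juxtaposition: 6th parses as 6 * th

# ordinal indexing walks insertion order; any other index stays a hashkey:
Base.getindex(d::OrderedDict, i::Ordinal) = last(first(Iterators.drop(d, i.n - 1)))

d = OrderedDict(:a => 1, :b => 2)
d[2th]  # 2, the second value in insertion order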

I’m not finding it, so I’ll spitball some thoughts here:

To initialize:

[;]              # OrderedDict{Any,Any}()
[a=1, b=2]       # OrderedDict{Symbol, Int}([:a,:b], [1,2])
[; a=1, b=2]     # alternative syntax
['a'=1, 'b'=2]   # OrderedDict{Char,Int}(['a','b'], [1,2])
[a=1, 'b'=2]     # OrderedDict{Any,Int}([:a, 'b'], [1,2])
# note that the l.h.s. can be any literal

# multi-dimensional: (???)
[
    (1,1) = 1    (1,2) = 0
    (2,1) = 0    (2,2) = 2
]

# generator:
[; [Symbol("key",i)] = i^2  for i=1:5]

# non-literal l.h.s.:
[[x]=1, [y]=2]   # access variables x and y as keys
[[:a]=1, [:b]=2] # another alternative for [a=1, b=2]

To access:

d = [a=1, 0=2]   # OrderedDict{Any,Int}([:a,0], [1,2]) 
d[:a] == d.a == d[1th] == 1

dt = Tokenizer(d)  # generates tokens from keys
d[dt(:a)] == d.a

d[0] == d[dt(0)] == d[2th] == 2

let t = dt(some_key, 0) # 2nd argument sets default value if key not found?
    d[t] += 1
end

d.:0 = 3 / d.:0

(since getproperty and setproperty! on dictionaries are otherwise pretty useless, I see no reason not to act like a PropDict. This also builds consistency with NamedTuple.)

This is clearly not true. If a_fn(a::UInt8, b::UInt8)::UInt = a + b is declared to have precisely one method, then the only valid arguments are primitives, there is one valid return type, and this information is reliable: it can never change. A call to a_fn doesn’t need boxing. What you’re saying (at best) is that such a function might not have those properties. But a function to which any number of methods can always be added will not have those properties; it cannot have those properties, so the compiler must not assume it will, even if the user’s intention was that those properties should hold.