RFC: Some Ideas to Tackle #15276 - performance of captured variables in closures

uniment · February 27, 2023, 10:09am

Naïvely written Julia should be performant. It shouldn’t take memorizing a 35-page manual to reliably beat Matlab. Unless more performance footguns are removed, it’s impossible for a package ecosystem to be performant. For a language whose raison d’etre is performance, it is an existential imperative to do better. (We also need a performance tips cheatsheet that fits on a single page.)

Let’s talk about the infamous performance of captured variables in closures · Issue #15276. This comment decisively settled the debate: the final solution would be to rewrite the lowering stage in Julia, so that lowering could leverage Julia’s type inference system.

However, according to this comment,

There are no plans to port lowering to Julia.

Okay. So we just have to live with performance occasionally getting KO’d, and rely on @code_warntype and JET.jl to avoid yuge performance hits?

Maybe not. (?) Here are four half-baked ideas that have been rattling around my noggin, and I request your assistance to chime in and help work out the feasibility and details:

Idea 1: Don’t box variables that aren’t reassigned in/after closure def’n

Variables are often boxed over-zealously when they don’t need to be. It’s because the rule is too simple: if a capture is ever syntactically reassigned, or if it’s assigned within a conditional, then it’s boxed. But this misses the point of boxing: to cover the case of a variable that could be reassigned during the closure’s lifetime, so that changes are visible at all scopes that capture it (otherwise you’d just use a normal, unboxed, immutable capture). Most of the time, (re)assignment occurs before the closure definition (which is obviously before its lifetime), and assignment after the closure definition simply doesn’t occur but we box it anyway.

Consider the example from Performance Tips:

function abmult(r::Int)
    if r < 0
        r = -r
    end
    f = x -> x * r
    return f
end

In this example, r gets boxed in an untyped mutable Core.Box so accessing it is type-unstable. (Unpopular opinion: It shouldn’t be so easy to inadvertently trash your performance!) According to the manual, because the parser doesn’t have access to type information (emphasis mine),

the parser does not know that r has a fixed type (Int ). nor that r does not change value once the inner function is created (so that the box is unneeded).

The parser does know that the box is unneeded, because r is never syntactically reassigned after f is declared! (there is never an Expr(:(=), :r, ...) in any branch henceforth.) There is simply no way for r’s value to change during f’s lifetime, so the box is a waste and we’re pissing away clock cycles for nothing. This example from the manual is a perfect case study in why the current boxing policy is too aggressive.

It’s a very common pattern—to assign a variable several times near the beginning of a function during some sort of “initialization,” and then capture it but never change it again—many languages have trained us into this pattern, and sometimes it takes a lot of effort to avoid, so it’s a huge performance trap for newbies (and even regulars) to Julia. If lowering just got a little smarter about this, I reckon it’d eliminate about 80% of non-global Core.Boxes.

Idea 2: Type-Parameterize Core.Box

Of course, we can’t avoid boxes forever. But as demonstrated here, once it has been determined that a variable should be boxed, type-parameterizing the box will allow significant performance gains (beyond simply typeasserting each access of an untyped container, which is what currently happens when an identifier is type-annotated).

A type-parameterizable Core.Box could look something like this (adapted from boot.jl line 378):

mutable struct Box{C}
    contents::C
    Box{C}(c) where C = new{C}(c)
    Box(@nospecialize(c)) = new{Any}(c)
    Box{C}() where C = new{C}()
    Box() = new{Any}()
end

Notice that the default type is Any, so it’s backwards-compatible.

Once Box is type-parameterizable, then for any boxed variable with a type annotation (e.g. a::Int), the lowering stage can parameterize the box accordingly.

Maybe we can avoid type-annotation more though:

Idea 3: Type-Inferring Boxes

It’ll be a while until lowering is ever re-written in Julia, but maybe we can just have lowering emit Julia code that will do the job? It could work something like this:

On every expression where a local boxed variable is assigned, keep a record of what functions are called, on what symbols and literals. Construct an expression that will calculate the resulting type when the outer function is called, and insert that into the box’s type parameter.

For example, if it has been determined that a is going to be boxed, then when this assignment is encountered (where x is an argument or global const, and thus is type-stabilized):

a = x + 1

construct an expression like this:

atype1 = :(Core.Compiler.return_type(Tuple{typeof(+), typeof(x), typeof(1)}))

and when seeing an expression like this

a = a - 1.0

construct an expression like this:

atype2 = :(return_type(Tuple{typeof(-), $atype1, typeof(1.0)}))

Construct an expression like this every time a syntactical assignment is encountered.

(if reassignment occurs inside a closure or loop, ideally make an expression that calls a recursive function—the basic gist is to recurse until the typejoined type converges.)

Assemble those expressions into a single typejoin call, and insert that into the box’s type parameter when declaring the boxed variable. Something like this for example:

atypes = [atype1, atype2, atype3]
emit(:( a = Box{typejoin($(atypes...))}() ))

So, for example, the following function declaration:

function foo(x)
    if x > 0; a = x else a = -x end
    () -> (a = a - 1; a)
end

could be lowered to (something resembling):

struct var"#13#14"{A}<:Function
    a::Core.Box{A}
end
(f::var"#13#14")() = (f.a.contents = f.a.contents - 1; f.a.contents)
function foo(x)
    if x > 0; a = x else a = -x end
    a = Box{typejoin(typeof(a), return_type(Tuple{typeof(-), typeof(a), typeof(1)}))}(a)
    var"#13#14"(a)
end

(pretend I’ve written a recursive type inferencing function though.)

Notice that boxing is delayed until right before the closure is declared, so as to minimize the likelihood that the box type will be abstract at the moment it’s captured.

This concept dumps a bunch of type inferring code into the function body, but if we’re lucky it should all constant-fold for zero runtime cost.

Idea 4: Avoid Boxing for Closures which Immediately Execute

If a closure is returned or if it’s passed as an argument to another function, then its lifetime is unknown so it’s appropriate to box its captures. But for closures which execute immediately and aren’t stored, there isn’t any use in boxing. There are times when we can know syntactically that this will occur.

A first example of this is comprehensions. Array comprehensions offer a syntactical assurance that the closure will immediately execute and that it will not be stored. As a result, there is simply no need to box its captures—even if they’re modified before and after the comprehension has comprehended.

A second—albeit less common—example is for functions which are introduced just to provide an inference barrier. Occasionally it’s useful to write a function and immediately execute it, simply to have the compiler create a type-stable body of code within a sea of instability, like this example. In these cases, too, it can be determined from the syntax that the need for boxes is obviated since the closure will be executed immediately and only once, if the closure definition is the first argument of a :call expression.

Inner functions can also be declared and called in a loop, although I suspect that’s even less common.

Thoughts please! Would these ideas work? What would it take? What are the gotchas? Am I making any glaring and egregious errors?

Here are some difficulties I can imagine:

Idea 1:

If a closure captures a variable that is only declared in a conditional branch that doesn’t execute: currently it throws an UndefRefError from attempting to access an uninitialized box, but it’ll now throw an UndefVarError instead.
It might take some thought to consider all the cases where it can be determined a variable is never reassigned after the closure declaration. For example, in this case the closure is declared in the same branch as the variable it encloses; the logic ought to be able to recognize that the branches are mutually exclusive, and therefore not create a box.

Idea 2:

This should be straightforward, and having type-annotations dictate box type-parameterizations should be an easy and immediate win.

Idea 3:

If the closure assigns a value to its capture which depends on an argument without type annotation, then I don’t see a way for its type to be known to the closure’s parent scope; a Box{Any} will be needed.
Similar if the closure assigns a value to its capture which depends on an untyped global.
If the closure assigns a value to its capture which depends on a function, and the function definition changes so that it outputs a new type during the closure’s lifetime, what should happen? Currently this is allowed, because the capture is always boxed and type-unstable. But I think it’s worth breaking; I don’t think accommodating this behavior is worth the type-instability. If changing the return type of a function that influences a closure’s capture types would be permitted during the closure’s lifetime, the user should specify that explicitly with a Ref{Any}.
If the closure assigns a value to its capture which depends on its previous value, then the type-inferencing code needs to be recursive.
If the closure assigns a value to its capture a which depends on another capture b, and then assigns a value to b that depends on a, and if there’s another closure which captures the same variables and performs its own assignments to them, then what?? It’s giving me a headache trying to figure out where to start.
It feels a bit like I’m trying to do too much: Julia already has a bunch of parsing and inference machinery. Can we lean on it more here?

Idea 4:

FastClosures.jl does almost the right thing, although there’s more to say about the proper way to handle closures that make assignments to their captures (it definitely doesn’t do the right thing with operators like +=).

cc @Zach_Christensen

mrufsvold · February 27, 2023, 2:27pm

On topic: Re: Closure assignment using the output of a function should be broken.

I don’t know how I feel about removing options or requiring the use of Ref. I know there have been a few decisions to remove expressivity/ergonomics for the sake of removing performance foot guns, but in general, making the user understand Refs and such doesn’t feel right.

That said, I think this is definitely a case where VSCode should be highlighting performance concerns in your code and providing an explanation of what’s going on. I’m not too shabby at Julia and I still can’t hold all the performance tips in my head. It would be extremely helpful if the tooling unloaded some of that burden.

Off topic:

Here I was about to have a productive week, and now, I’m just going to be reading this thread the whole time.

CameronBieganek · February 27, 2023, 3:05pm

According to the JuliaSyntax.jl README, one of the goals of JuliaSyntax is

Grow to encompass the rest of the compiler frontend: macro expansion, desugaring and other lowering steps.

Perhaps @c42f can confirm.

Zach_Christensen · February 27, 2023, 5:33pm

I’ve been digging around Julia codegen stuff lately. Boxing may be a bit more complicated if we want a consistent approach to memory management in Julia. A lot of that stuff currently seems to be implemented in the easiest way to make user facing stuff work. So asserting new rules for one part of the system in the absence of standardizing other parts may be premature

uniment · February 27, 2023, 9:10pm

To clarify, if we defend the behavior as it is now, this can never be type-stable:

julia> genf(a) = () -> (a += 1)
       f = genf(0)
       f(), f(), f()
(1, 2, 3)

to achieve type-stability, we must manually type-stabilize a with a Ref:

julia> geng(a) = let a=Ref(a); () -> (a[] += 1) end
       g = geng(0)
       g(), g(), g()
(1, 2, 3)

(~~as shown here, simple type-annotation is an inferior solution.~~ Edit: per this proposal, type-annotation will be equivalent)

If we defend the current behavior, what we gain for it is the ability to accommodate a function’s return type changing during the life of a closure whose capture type depends on it:

julia> q() = 0
       genh(a) = () -> (a += q() + 1)
       h = gen(0)
       h(), h(), h()
(1, 2, 3)

julia> q() = 0.0
       h(), h(), h()
(4.0, 5.0, 6.0)

The only reason that this latter code functions, is because the box h.a is type-unstable. If we stabilize it based on knowledge of q’s return type at the time h is instantiated, then if q’s return type changes, h will throw an error.

In such a future where a is stabilized, if you wish for h to accommodate arbitrary changes in q to include changes of its return type, you would have to write this:

genx(a) = let a=Ref{Any}(a); () -> (a[] += q() + 1) end

In other words, currently type-instability is the default, and you have to add a Ref for type-stability. I suggest making type-stability the default instead—considering how much more useful type-stability is than the ability to accommodate arbitrary changes in functions’ return types, this seems like a good trade.

Agree. All roads lead back to a “Julia Standard Linter™.” Even still, sometimes you really have to rip up your code to get type stability.

Yes, this has become apparent without looking at the code

Boxing may be a bit more complicated if we want a consistent approach to memory management in Julia. … asserting new rules for one part of the system in the absence of standardizing other parts may be premature

Can you elaborate your concerns? Is simply type-parameterizing Core.Box harmful somehow?

TheCedarPrince · February 27, 2023, 9:31pm

Just an observer here but I again thank @uniment for your really well put together RFC and am eager for the discussion again here. Your RFCs, even if not fully accepted, always generate interesting discussions that are a treat!

mrufsvold · February 27, 2023, 10:10pm

Seconded! When people ask why I love Julia, I say something like “Eh, I guess it’s fast or something, but you gotta see this community. We’re debating a syntax for parallel chaining ”

Zach_Christensen · February 28, 2023, 1:20am

I have no idea if it would be harmful somehow, because I don’t completely understand the how Core.Box relates to internal representation of boxing (Memory layout of Julia Objects · The Julia Language). It’s hard for me to know what the implications of this are at any level because you’re approaching poorly documented territory. Even that which is documented is somewhat out of date (Object Allocation docs out of date? · Issue #48784 · JuliaLang/julia · GitHub).

The ability to directly handle and represent memory may be Julia’s weakest point.

uniment · February 28, 2023, 3:32am

When I look at the definition for Core.Box, it’s pure Julia with no ccall or :jl_ shenanigans. So, I’m inclined to believe the prevalence of “box” in discussions elsewhere is merely a result of overloading the word, rather than a reference to this or any related structure.

Seems like the easiest thing to do would be to edit it and see what happens, but shamefully I’ve never made these sorts of core language edits and don’t know how

Zach_Christensen · March 1, 2023, 4:23am

But I think they are intended to be the same thing in many instances so it once again becomes a question of doing the immediately easy thing or putting in the work to converge on a common type where appropriate.

I should also clarify that JIT compilation makes memory in Julia much less straightforward to standardize than in a language like Rust. Users often shouldn’t be manually getting involved in something like Box in Julia. If you can sort through some of the obscurity, you might be able to make some interesting proposals related to memory representation, given the posts I’ve seen from you.

Sukera · March 1, 2023, 6:47am

The term Box comes from Boxing (computer science) - Wikipedia. It is not a julia specific term. That wikipedia article is a good starting point for learning about this stuff, but it is not why & how a Box ultimately exists and works in julia as of today.

This doesn’t matter. Before the if, r has one type (Int). After the if, r may have another type. f needs access to BOTH types and barring knowing that there will only be one (so you don’t know that r will occupy only one memory size), you need to box the variable, because r could potentially be a variable of type Union{Int, MyComplexAbstractType}.

I’ve said as much in our discussion on slack when you asked about this. I’ll reiterate for posterity: this is a case where the box could potentially be refined away at a later stage in the compiler, but it would definitely require some work in the lowering stage to keep the information around that a given struct & variable comes from a lowered closure. It would mean closures are more special, not less!

Just use Ref, which already has that behavior. As far as I understand, this is not done right now because again this requires changing lowering in much the same way as mentioned above and people willing to dive into the Scheme of the parsing / lowering stage are a rare commodity (Not to mention that work needing to be redone entirely IF JuliaSyntax truly does grow to do lowering at some later stage as well, so the benefit of doing this now seems dubious).

Recursion in a compiler is usually a Bad Idea - at the very least, you’d want to bound the recursion, so as not to run into pathological cases that are accidentally exponential in runtime.
Regardless, this is something you need either way, and is part of why this is a tricky problem and can’t just be implemented in an afternoon. You really don’t want to run into cases where the compiler is extremely slow during compilation, which is something people have complained about much, MUCH more vocally than whether the final code is slow due to captured variables (which is very easily fixable/has a workaround: capture less).

Determining this in general comes down to knowing at compile time which branch can be taken under which circumstances, and figuring out whether those conditions are mutually exclusive. In the extreme case, this is either unknowable or requires a SAT solver to determine. For example:

f() = rand(Bool)
g() = rand(Bool)

function foo()
    if f() || g()
        # ...
    elseif f() && g()
        # ...
   end
end

Looking at just the source code, you can be in any (or no) branch at any given execution step. Even with type inference, this doesn’t get better. Yes, your idea might work for the simplest cases of if/else (guaranteeing exclusivity), but that again is treading on mightily thin ice, because if you miss a case, you’ll get very difficult to reason about behavior.

No, this doesn’t matter. A type annotation in the parent scope only serves to make the parent scope type stable after use:

julia> function foo(x)
           a::Int = 1
           () -> a += x
       end
foo (generic function with 1 method)

julia> @code_warntype foo(1)()
MethodInstance for (::var"#7#8"{Int64})()
  from (::var"#7#8")() @ Main REPL[13]:3
Arguments
  #self#::var"#7#8"{Int64}
Locals
  a::Union{}
Body::Int64
1 ─ %1  = Core.getfield(#self#, :a)::Core.Box
│   %2  = Core.isdefined(%1, :contents)::Bool
└──       goto #3 if not %2
2 ─       goto #4
3 ─       Core.NewvarNode(:(a))
└──       a
4 ┄ %7  = Core.getfield(%1, :contents)::Any
│   %8  = Core.typeassert(%7, Main.Int)::Int64
│   %9  = Core.getfield(#self#, :x)::Int64
│   %10 = (%8 + %9)::Int64
│   %11 = Core.getfield(#self#, :a)::Core.Box
│   %12 = Base.convert(Main.Int, %10)::Int64
│   %13 = Core.typeassert(%12, Main.Int)::Int64
│         Core.setfield!(%11, :contents, %13)
└──       return %10

IF you type parametrize the closure to allow the type of a to be inferred, it’ll be inferred just as the type of the argument x is inferred.

The old closure function is invalidated and you get a new closure definition at the next call because changing a function is only visible to running functions once execution returns to the top scope. Our eval (which is what defining a function does) is not visible immediately - we have world age for a reason.

What you’re describing is the way this is now, and not the reason for the current behavior. The reason for the current behavior is that julia is a dynamically typed language, and type instability (barring additional smarts during lowering) is just the name of the game here as a result of being dynamically typed.

This again sounds like you want to have some dynamically typed expressions be disallowed in julia, which sounds more like a statically typed programming language. While I don’t disagree that a static environment prevents such shenanigans, it goes against the core of the language and thus adding a requirement such as requiring a type annotation is extremely unlikely to be implemented.

Type inference is already recursive in those cases, up to a limit, at which point we fall back to Any. It’s good to keep in mind that due to julia being a dynamically typed language, type inference is an optimization that is allowed to fail. If it were not allowed to fail, we’d have a type checker that throws compile errors (and thus a statically typed language) instead.

No - even very simple cases (as you’ve shown) run into “Inference doesn’t have enough information due to the lowering stage inserting things that make it harder to infer (like Box) while at the same time missing some information that lowering could have provided to make it possible to infer in the first place”. In order to “lean on” the existing type inference more, you need to change the lowering stage, which brings us right back around to Scheme and how few people there are willing to dive into this specific thing.

uniment · March 2, 2023, 1:35am

This doesn’t matter.

you need to box the variable

I’ll reiterate for posterity: this is false. At the moment that f is instantiated, the type of r is determinable, and its value doesn’t change afterward so a mutable structure isn’t needed.

Reminder of `abmult` code

function abmult(r::Int)
    if r < 0
        r = -r
    end
    f = x -> x * r
    return f
end

To illustrate, the abmult example currently lowers to something approximating this:

struct var"#13#14" <: Function
    r::Core.Box 
end
(f::var"#13#14")(x) = x * f.r.contents
function abmult(var"r@_2"::Int)
    r = Core.Box(var"r@_2")
    if r.contents < 0
        r.contents = -r.contents 
    end
    f = var"#13#14"(r)
    return f
end

However, because r is never syntactically reassigned after f is declared, it could instead lower to something like this:

struct var"#13#14"{R} <: Function
    r::R
end
(f::var"#13#14")(x) = x * f.r
function abmult(r::Int)
    if r < 0
        r = -r 
    end
    f = var"#13#14"(r)
    return f
end

The fact that this code works and infers correctly (due to Julia’s IR SSA form), and it has all the same official functionality, should make my point evident: identifier r can be associated with multiple values (or even types) during initialization, but that does not necessitate a Core.Box because r is never reassigned after f is instantiated. The purpose of capturing a value in a Core.Box is to allow assignments to be visible to all scopes that capture it, and there are no assignments after f is instantiated, so it’s unnecessary. I’m not sure how I can make this simpler.

Perhaps you missed the opinion I expressed at the beginning, as well as Micah’s reflection:

Naïvely written Julia should be performant.

in general, making the user understand Refs and such doesn’t feel right.

We shouldn’t settle for type instability when we don’t need to. If Julia’s tagline is “Sure, you can make performant code, but good luck with that!” then people should prefer to stick with Python, where they know they can get performant code. (Any skilled Python programmer can make performant Python—by hiring a C programmer. Those are liquid and fungible assets freely tradable on efficient open markets with excellent price discovery, tight spreads, deep order books, and competitive market makers.)

Again, Idea 2 is simply to type-parameterize Core.Box, which doesn’t require any type inference. Benefits can be realized immediately in cases where a variable is type-asserted—in those, lowering already generates code to typeassert each access of the variable. The change in lowering would simply be to recognize that a boxed variable is being type-asserted, and to Core.apply_type the asserted type into the Core.Box parameter when declaring it.

Yeah, for Idea 3 there are many details of ways to skin this cat that I didn’t want to get into yet. But even without bounds, typejoin tends to converge very rapidly by design (typically in one or two iterations). Bounds could be needed for user-defined methods of typejoin though; what I had in mind was three iterations (something near the limit of union splitting) with a fallback of Any.

To clarify, I’m only asking for what can be determined from syntax. I thought that was clear in the post, but I guess I have to be more explicit. Some branches are separated syntactically, and it’s obvious that they’re mutually exclusive.

I think we’re talking past each other on this one. What I was referring to there was code like this:

function foo()
    a = 1
    x -> (a += x)
end

Per Idea 3, a would be inferred as Int and placed in a Core.Box{Int}, except for the fact that x’s type is unknowable to foo. As a result, it seems most appropriate to use a Box{Any} instead.

(this seems like a misapplication of the word “again.”)

You can say that, I suppose. But this restricts a use case I find extremely unlikely, in exchange for type stability in a pretty large class of code structures. For the vast majority of users, Julia would feel just as dynamic—but simply run faster.

Perhaps I lack vision, but I have trouble imagining a world in which I would wish to instantiate a closure which makes assignments to its capture, and then update some method (which, as you said, increments the world age) that the closure relies on to return a different type—causing the closure’s capture to change type—and have the closure still be valid. Can you offer any examples where this is legitimately useful? I’m happy to back off on this part of the request, if it can be shown that it would cripple a feature that has legitimate and compelling utility.

In Idea 3, it’s already accepted that the lowering stage would be modified—but in the most minimal manner possible: to construct expressions which will ask Julia to calculate the proper type. My question about leaning more on Julia’s type inference system is: can we have the Scheme emit code that is more naïve and requests Julia’s type inference machinery to do more than what I have imagined? For example, I’m imagining the Scheme performing some fairly sophisticated code transforms to build expressions that calculate the resulting type of each assignment; is there some Julia function I can just copy-paste QuoteNode-wrapped :(=) expressions into, or something like that?

As a final note, in a theoretical future where lowering is rewritten in Julia, whatever type-inferring Julia is emitted by the Scheme is likely to carry over anyway—so the effort wouldn’t all be wasted.

Zach_Christensen · March 2, 2023, 4:04am

My confusion isn’t the term itself. It’s why this boxed variable is different than all the others we have. I thought the Julia type system boxed every independent variable along with it’s metadata. How does Box help us in any unique way. Why not just use Some{Any}?

uniment · March 2, 2023, 6:15am

To satisfy the desired semantics of closures being able to capture and modify variables that exist in their parent scope—and for those changes to be visible to all scopes which enclose those variables, including closures that outlive their lexical scope—those variables must be stored in a structure that is mutable, which Some is not.

Sukera · March 2, 2023, 11:42am

It absolutely is needed. Variables in local scope don’t “change type” as the scope progresses, they have one type in the whole scope:

julia> g(x) = float(x)
g (generic function with 1 method)

julia> f(x::Int) = x = g(x)
f (generic function with 1 method)

julia> @code_warntype f(1)
MethodInstance for f(::Int64)
  from f(x::Int64) @ Main REPL[2]:1
Arguments
  #self#::Core.Const(f)
  x@_2::Int64
Locals
  x@_3::Union{Float64, Int64}
Body::Float64
1 ─      (x@_3 = x@_2)
│   %2 = Main.g(x@_3::Int64)::Float64
│        (x@_3 = %2)
└──      return %2

Due to the local variable x shadowing the argument x (and inheriting its type!), the type of x inside of f is a Union. This is already inferrable purely from the syntax alone and what type inference does is refine the eventual return type of the function, because there is only one path that x can take. This is not the case in your abmult example, hence lowering has to be conservative here and somehow signal this ambiguity to the closure.

Again, this is solvable by making lowering more integrated with type inference, I’m not disputing that. But your argument that a Box, from the POV of lowering, is not necessary here is not correct.

Yes, that is correct, and I’m not disputing that. What I am saying is that this requires a lot of extra work in the Scheme code handling the lowering, for a minor benefit of convenience, that just noone qualified has taken on yet because the people who do know the Scheme parser & lowering stage are currently occupied with other matters (like core compiler improvements that have a larger effect on better inferrable code).

I find this sort of hyperbole not very useful in this discussion. Noone claimed that getting good performance in julia is hard, and I don’t think you’re implying that capturing variables in a closure is CRUCIAL to getting good performance in any language.

It does require type inference, because now you have to lug around that type parameter and infer its type when its used. This is what I was referring to. Worse, now you need to somehow get type inference in the closure to affect the type in the function the closure originated from, and keep that inferred type consistent - the two functions become much closer tangled than they are right now. This is not a problem you can “just do X” away.

And the example I gave is perfectly determinable from syntax, the conditions are just chosen such that generalizing the pattern leads to solving SAT

Yes, of course we can! That’s the entire point that I’ve been trying to communicate all along. EXCEPT that modifying the Scheme parser is neither particularly productive, nor maintainable, and hence such efforts should (at best) be reserved for such a time where the lowering stage is not in Scheme anymore and redesigned with a more flexible architecture. Or you find someone willing to implement it in Scheme, I guess, the source is here (you’ll notice that parsing & lowering is closely intertwined, which is part of the issue and why JuliaSyntax.jl exists in the first place).

Ok, I think this gets at the core misunderstanding here - Scheme is NOT performing any type related calculations. None at all. Zero. Zilch. Nada. That’s the job of type inference, which runs after lowering is complete & done. This is the point I want to really hammer home here - lowering has no concept of types. Any redesigning of lowering incorporating more type inference smarts means either communicating better information down to type inference (there is no communication back after all) OR redesigning how type inference interacts with the frontend as well. This is why this hard and not as simple as just emitting better lowered code.

You can look at Meta.@dump, Meta.@lower and the @macroexpanded versions of that, but you’ll very quickly find that they take the string/raw expression and pass it to Scheme via a ccall.

The reason that a variable that’s captured in a closure and ends up in a Box that way is different from a variable that ends up being boxed due to a type instability is because of where that boxing occurs. The former occurs before any types of anything are known, because the semantics of the syntax and what we want it to mean dictate the Box to be there, irrespective of the type, while the latter is due to type inference later on knowing that actually, the syntax and its semantics alone are not enough to figure out the need for a box, it’s the semantics of the written code (aka what we mean a function to do, not just the semantics of the syntax describing that code!) that ultimately mean that a variable ending up as Union{Foo,Bar} needs to be boxed (though this box is different from the one inserted by the parser, because type inference does know about types).

Mind you, the heuristic used by lowering for inserting that Box is of course not perfect, and it’s unlikely to ever be… I don’t have a proof for it, but I’m quite certain that you can construct code where you’re forced to insert a Box, due to julia-the-language not being properly dependently typed. Most likely this will happen with code working directly with and on types themselves.

In theory pretty much all uses of where the current lowering ends up putting a Box ought to be replaceable by a better-inferrable (though not necessarily type stable in general) counterpart, it’s just that this is a major piece of work of questionable overall benefit (compared to say general compiler optimizations).

If we actually did that and ran an interpreter on that, our code would be no faster than python. While julia is a dynamic language and that mode of execution is pretty much always an escape hatch (it’s more-or-less what happens when a variable is inferred as Any), actually doing that is just not performant.

Zach_Christensen · March 2, 2023, 3:25pm

If you want mutability then use RefValue. My point is more about why we have something that isn’t documented but could be entirely replaced by something else that is.

My understanding is that not all boxing throws out all type information, so something like Int64 is a julia specific type that wraps around a primitive C signed long type.

Sukera · March 2, 2023, 3:52pm

That understanding is incorrect - a julia Int64 is exactly an 8 byte long bag of consecutive bits. It is most definitely NOT a wrapper around a C signed long (and most certainly not just a label for one!). For one thing, overflow in Int64 is defined to have wraparound behavior, while the same thing for a C signed long is undefined behavior.

In fact, it’s even defined as a primitive type:

github.com

JuliaLang/julia/blob/master/base/boot.jl#L230


      
              applicable, invoke,
              # constants
              nothing, Main
          
          const getproperty = getfield # TODO: use `getglobal` for modules instead
          const setproperty! = setfield!
          
          abstract type Number end
          abstract type Real     <: Number end
          abstract type AbstractFloat <: Real end
          abstract type Integer  <: Real end
          abstract type Signed   <: Integer end
          abstract type Unsigned <: Integer end
          
          primitive type Float16 <: AbstractFloat 16 end
          primitive type Float32 <: AbstractFloat 32 end
          primitive type Float64 <: AbstractFloat 64 end
          
          primitive type BFloat16 <: AbstractFloat 16 end
          
          #primitive type Bool <: Integer 8 end

It’s commented out because it’s defined in the (very small) C core of julia, because otherwise bootstrapping a compiler is quite a bit harder. You’d need a persistent artifact to carry around for the initial compilation - in our case we do that by building our compiler (with codegen and all) from various C/C++ sources, which then sets up a system image to use for compilation (the various hooks in which use that just-compiled .so for compilation).

Thinking of julia types as “wrappers around C types” is a mindset that can very easily mislead, because it’s just not true.

DNF · March 2, 2023, 6:29pm

(Perhaps I’m being whooshed, and you were just joking.)

I’m not getting into the technical argument, but you must live in a very different reality than I do. No python programmer I know could ‘hire a C programmer’ to speed up their code. A high-level manager might be able to do that (one who’s not a programmer anyway), but it’s a lot of work to do, and a huge initial expense, especially for a small job.

The vast majority of python devs have no budget to spend freely at all, and that’s not even considering open source, where there is often zero money.

j-fu · March 2, 2023, 9:25pm

I think the C-programmer remark was just sarcastic irony which I kind of like but is sometimes hard to discern.

I really support these two lines:

The point here is that Julia has the chance to beat Matlab & python proper with performance and C/C++ with accessibility to a broader public.

As my anecdotical evidence shows, we are not there yet. For two reasons: there is space for improvement - as this RFC shows - and we don’t explain enough (or not well enough) . It is not easy, but feels possible (I am trying as well). So I kind of like the performance cheat sheet idea as a stopgap measure.

uniment · March 2, 2023, 9:32pm

I wanted to make a point in a humorous way.

I see lots of threads wondering, why is my {Matlab, Rust, NumPy} code faster than my Julia? I also see lots of threads wondering why isn’t Julia more popular? These questions are related; at the end of the day, it comes down to time, and the ways in which human efforts are to some degree fungible; efficiencies are already had by specialization and trade, which is ultimately what Julia competes against.

Julia allows us to push the limits of what can be done performantly—which is wonderful!—but we have to do more for ourselves, which means exposing ourselves to a large surface area for making obscure errors and falling into performance traps. A reasonable person (i.e., product manager) would rightfully ask: “Why spend time learning a new language and its pitfalls, when we can exploit what we know and leverage time-tested libraries already written by specialists in C? Why deal with the illiquidity of a thin market of Julia programmers, when I can hire Python coders by the dozen and contract out C work?” If a manager takes the career risk of moving their team to Julia, it needs to be an obvious slam dunk—but currently, the set of problems for which it is an obvious slam dunk is rather restrictive. I want to make this set less restrictive, by increasing the set of naïve code written by non-specialists which will be performant.

It should be obvious that I personally want Julia to succeed, I personally want to use it everywhere, and I’ve personally figured out how to avoid some of these performance traps. However, dodging traps currently presents a nontrivial mental load, and with a team that [inevitably] has some less-competent people I would want the barriers to performance to be lower.

If we want Julia to be more popular and useful, then I stipulate that the difficulty of achieving its raison d’être of performance should be lower wherever feasible. As I’ve laid out, in this case it is. So, undergrads who lack the perspective of being employed, managing a team, running a business, or being an investor—so can’t appreciate the value of time and probabilistic reasoning, and are inclined to pedantry and argumentation—making long-winded and provably incorrect assertions and then dismissing the goal’s worthiness is comical and counterproductive.

Topic		Replies	Views
Can someone explain closures to me Internals & Design performance , type-stability , closure	13	3019	November 2, 2023
Performance closures with changing value of closed over variables Performance closure	1	384	April 20, 2021
Language restriction to solve 15276? Internals & Design	48	3840	May 11, 2018
`Core.Box` around captured variable General Usage closure , corebox	2	355	December 6, 2022
Help understand boxing Performance	18	908	March 22, 2024

RFC: Some Ideas to Tackle #15276 - performance of captured variables in closures

Related topics