RFC: Some Ideas to Tackle #15276 - performance of captured variables in closures

as this RFC shows - and we don’t explain enough (or not well enough) . It is not easy, but feels possible (I am trying as well). So I kind of like the performance cheat sheet idea as a stopgap measure.

Not to take away from the idea of a good cheatsheet or fixes to #15276, but one more option: make it easier to detect and diagnose performance problems, and (hopefully) thereby make it easier to learn how to write performant Julia code. I am hoping that a big step is on the way with Implement annotated source-view & make default by timholy · Pull Request #356 · JuliaDebug/Cthulhu.jl · GitHub; it’s too complicated for first-time programmers, but I’ve successfully taught a beta version to a room full of intermediate programmers (who were newcomers to Julia).

20 Likes

I mean, if it’s so trivially fixable as you claim, surely a small patch to the scheme source I linked is easy enough? It would certainly be welcome, and with a quick run of PkgEval even verifiable for correctness & safeguarding against regressions.

If it’s indeed not as trivial (or fixing as many cases as you claim it will), as I tried to explain in the comments above, then that makes your assertions look (to me at least) like a longer version of a “julia won’t succeed unless X” post (in spite of the fact that fixing the closure capturing would be a very good thing indeed - all I’ve claimed is that it’s not as simple as you lay it out to be, after all).

I take issue with this needlessly hyperbolic statement. You’re making a lot of sweeping conclusions about a persons background, context and expertise and use that to dismiss their contribution entirely, just because their perspective is different from yours.

Make no mistake - I want julia to succeed. I’m (in part) betting my career on it, as are a lot of other early adopters. That doesn’t mean though that what you deem absolutely crucial to the success of julia is what I (or others) deem crucial, or that the specific issue pointed out is easy to fix in a general manner without introducing more and more special cases that are hard for newcomers to understand. You asked for comments - your post is even titled “RFC” - and that’s what I gave. If you don’t like them, rebuke the actual points instead of resorting to asserting them false and falling back to ad hominem. It doesn’t strengthen your point.

3 Likes

Protip: dismissing people’s concerns is not the optimal strategy to land a job :wink:

Ad-hominem is not my fallback. But it’s hard to have a good sense of how time and resources should be optimally allocated without life experience. [censored] I can justify it in a theoretically meaningful way, but it would take many words that are better said in a different thread.

I never claimed it was trivial; I said it was feasible. The difference is meaningful.

Maybe I could have been more specific: Request for Correct Comments, as I place a large bounty on correctness. The value of incorrectness is neutral to negative—I’m quite tolerant of incorrectness if the rate of convergence on correctness through iteration is rapid, but I grow tired and impatient of stubborn incorrectness.

The value of lengthy diatribes of strongly-asserted, superficially-correct-sounding yet incorrect claims based on fundamental misunderstandings, wherein the fallback response is repetition rather than rethinking, combined with argument for argument’s sake and seemingly discouragement for discouragement’s sake, is strongly negative. [censored]

I’m definitely interested in knowing what reasons there are for my ideas to be bad—my interest is in maximizing rate of convergence on truth—but receiving incorrect reasons, combined with arguments waged against misperceptions of my ideas, and having to laboriously falsify all of them is a substantial time drag and a costly tax on the signal-to-noise ratio of the conversation.

Instead of rebutting the entire post as I did last time, I’ll rebut them one-at-a-time, iterating each until we can converge on agreement, in sequence. We can discuss more coherently this way.


Let’s start from the beginning with Idea 1. Once we’ve ironed this one out, we can move on to the others.

With regard to whether abmult needs a Core.Box:

Strong assertion! If there were any uncertainty, you’d have used hedging words right? Instead we have emphasis. What if such an assertion proves wrong?

You describe how an identifier can be inferred to have a Union of types when there is a single code path. Then, we have this:

From what I gather, your argument seems to be that assignment in a conditional will thwart the ability to type-infer a Union. However, that’s readily shown false. In this example, x is inferred to be of Union{Float64, Int64}:

julia> foo(x::Int) = begin
           if x < 0; x=float(x) end
           return x
       end
foo (generic function with 1 method)

julia> @code_warntype(foo(0))
MethodInstance for foo(::Int64)
  from foo(x::Int64) @ Main REPL[5]:1
Arguments
  #self#::Core.Const(foo)
  x@_2::Int64
Locals
  x@_3::Union{Float64, Int64}
Body::Union{Float64, Int64}
1 ─      (x@_3 = x@_2)
│   %2 = (x@_3::Int64 < 0)::Bool
└──      goto #3 if not %2
2 ─      (x@_3 = Main.float(x@_3::Int64))
3 ┄      return x@_3

Further, it also seems to follow from your argument that any closure that captures the result of a conditional cannot be type-stable. This, too, is readily shown false (even when the conditional branches result in different types!). Example:

julia> function bar(x::Int)
           y = rand(Bool) ? float(x) : x
           () -> y
       end
bar (generic function with 1 method)

julia> a, b = bar(1), bar(1)
       a.y, b.y
(1, 1.0)

julia> @code_warntype a()
MethodInstance for (::var"#5#6"{Int64})()
  from (::var"#5#6")() @ Main REPL[11]:3
Arguments
  #self#::var"#5#6"{Int64}
Body::Int64
1 ─ %1 = Core.getfield(#self#, :y)::Int64
└──      return %1


julia> @code_warntype b()
MethodInstance for (::var"#5#6"{Float64})()
  from (::var"#5#6")() @ Main REPL[11]:3
Arguments
  #self#::var"#5#6"{Float64}
Body::Float64
1 ─ %1 = Core.getfield(#self#, :y)::Float64
└──      return %1

Notice that even though the capture’s type varies depending on which branch was taken, and even though bar is itself type-unstable(!), the capture is not boxed. Accessing the captures of a and b is a type-stable operation.

Try to reason from first principles. Under what circumstances must a capture be held in a mutable container in order to satisfy its semantics? Under what circumstances must its container be type-unstable to satisfy its semantics? If you attempt to answer these questions thinking mechanistically from first principles, I think you will find clarity.

In the case of the abmult sample code: Can it be known from syntax that r’s value will not change after the closure f is instantiated? Yes. Can r’s type be concretely known at the time of the closure’s instantiation? Yes. (the closure’s constructor serves as a function barrier.) Can it be known from syntax that its concrete type will not change after f is instantiated? If its value can’t change, then its type can’t change. Put these together, and you have immutability and type-stability: no box needed whatsoever.

Unless I’m butchering your argument somehow, I think we can consider this critique debunked. Shall we proceed to the next one, or do you dispute this conclusion?

@uniment , at the risk of incurring objections that I am appealing to authority, I think it might be worth keeping in mind that @Sukera has contributed hundreds of commits consistently for the past 4-5 years all across the Julia ecosystem, many to the language itself or to very “crunchy” compiler-related packages.

While I am sure all the readers here value your comments and ideas, I definitely would imagine that @Sukera understands the compiler internals in significant detail and I would encourage you to approach your disagreements from a place of curiosity, as likely there is one of these details you have not considered yet.

There is very possibly just a communication barrier in explaining the issues with the proposal here without doing a true deep dive into the language implementation.

14 Likes

Could you explain to me why the type inference processing is beholden to the lowering stage inserting a box?

In the case where r goes out of scope immediately after being put into the closure, why can’t we just look at it’s type at that point and say, “Cool, this closure’s r property is an Int.”? This is an honest question. The “can’t we just” means I know it can’t be that simple.

If we concede that the lowering stage is an immutable fact of life, what levers could be pulled to make it easier to have type stability in closures? Or is it your position that this is just how it is?

The problem is that type inference doesn’t know that the box is a closure.

1 Like

While I appreciate the sentiment, my code-based contributions very likely don’t count in the “hundreds of commits” :slight_smile: All of this, as it’s open source, is a team effort, and credit for those contributions should go to those who wrote the code - I’m just trying to do my part by sharing what I’ve learned about how julia works, what sort of stuff is likely to work & be fast, and what sort of changes need more or less work based on my gut feeling & understanding of how the stack works (which definitely has been wrong in the past! For example, I remember a particularly insightful conversation in Slack about why arrays of mutable objects can’t store their elements inline to save a pointer dereference, and the reason for that is because the mutable object may be referenced by another variable other than the array - i.e., the lifetimes of the array and the object may not match, and they need to for all objects in that array if you want to be able to store them inline. It’s why Rust has the borrow checker. Many thanks to @vchuravy and @jameson for the patient & insightful explanation!).

Ah, the perils of the old adage - so much to do, so little time… Doesn’t really help that it’s always changing too :joy: (for the better, of course!)

I’m not dismissing your concern - I’ve stated multiple times, in this thread and elsewhere, that yes this issue is real and yes this is (likely) fixable, but also that I don’t think this particular issue requires immediate fixing, because fixing it is non-trivial and any change here can have big ripple effects, in particular in regards to compile time/TTFX. At a time where everyone is doing their best to reduce that, recognizing that a solution like Box{T} increases that, due to more inference being inherently required because of that parametrization, should really not even be required to mention explicitly.

You thinking otherwise is perfectly fine, but construing my concern about compile times & the ripple effects this can have (as well as the issues with the particular proposal) into giving unsolicited career advice is, to me at least, nothing more than a slap in the face. Especially when it’s delivered as part of an ad hominem dismissal of a persons identity.

Lots of assumptions about me in there. As far as I know, you know pretty much nothing about my age & background, so I don’t know where you’re getting that from. I don’t really want to get into that here, but my issue with this kind of dismissal is that it implies that there’s no room for someone (anyone!) younger & with (perhaps) less experience to learn those things. “Life experience” is just too nebulous a thing to be used as a counterargument to a factual disagreement.

No, that is not my argument. My argument is that assignment in a conditional may thwart the ability to infer a single concrete type for the assigned-to variable, which is correct if the assignment ends up changing the type of that variable. Since variables only ever have one type in a single scope (they don’t change types), this results in a widening of the type of the variable to a Union, falling back to Any if the Union is too disparate, in order to save some time when inferring types later on, by deferring further type inference to the next use at a function barrier. If we didn’t fall back to Any, we’d very easily get a combinatorial explosion of type inference when those Union variables interact with other Union variables, which sounds like a road to very long compile times indeed.

The crucial issue is that the lowering stage has no information whatsoever about what type inference ends up being able to refine later on, handing what it got off with a sort of “it’s your problem now” attitude. It’s this that would need to be changed, and what I consider to be the hard part of fixing #15276, even just in the manner you described.

No, that is not my claim and it doesn’t follow from my argument.

Your example investigates whether the resulting closure is type stable, but what I’m talking about is whether bar is type stable, which it is not:

julia> @code_warntype bar(1)
MethodInstance for bar(::Int64)
  from bar(x::Int64) @ Main REPL[1]:1
Arguments
  #self#::Core.Const(bar)
  x::Int64
Locals
  #3::var"#3#4"
  y::Union{Float64, Int64}
  @_5::Union{Float64, Int64}
Body::var"#3#4"
1 ─       Core.NewvarNode(:(#3))
│         Core.NewvarNode(:(y))
│   %3  = Main.rand(Main.Bool)::Bool
└──       goto #3 if not %3
2 ─       (@_5 = Main.float(x))
└──       goto #4
3 ─       (@_5 = x)
4 ┄       (y = @_5)
│   %9  = Main.:(var"#3#4")::Core.Const(var"#3#4")
│   %10 = Core.typeof(y)::Union{Type{Float64}, Type{Int64}}
│   %11 = Core.apply_type(%9, %10)::Type{var"#3#4"{_A}} where _A
│         (#3 = %new(%11, y))
└──       return #3

And you’ll also notice that the closure is already parametrized and that there already is a typeof(y) check to make further use of that closure type stable. None of that changes the fact that assigning to y does not happen conditionally - the assignment ALWAYS happens. The variable is ALWAYS type unstable.

That typeof has already been inserted through a game of whack-a-mole and special casing of the lowering stage and while this game can be played pretty much indefinitely, the fact that nothing much happened in that regard in recent times likely means that we’ve hit diminishing returns on this strategy of pattern matching. You may disagree with that, and that’s perfectly fine.

Only on the top level, where you’re guaranteed to know the type of the variable:

julia> function foo(x)
           f = bar(x)
           f()
       end
foo (generic function with 1 method)

julia> @code_warntype foo(1)
MethodInstance for foo(::Int64)
  from foo(x) @ Main REPL[3]:1
Arguments
  #self#::Core.Const(foo)
  x::Int64
Locals
  f::var"#3#4"
Body::Any
1 ─      (f = Main.bar(x))
│   %2 = (f)()::Any
└──      return %2

Due to the type instability in bar, the type parameter of the closure cannot be inferred and further use of that closure is thus trivially type unstable as well. You can introduce a function barrier when using f to make further inner use of f type stable (due to julia specializing on the concrete type of a variable when doing a function call), but you cannot make foo itself type stable here.

You’re right, I’ve reasoned here from first principles, but I don’t think I’ve found the clarity you’re looking for. Let’s not get into a philosophical debate about the nature of our universe though.

Ok, you’ve won - I give it to you, your award of being technically correct. I have seen the error of my ways, mea culpa, and will try to do better in the future. Yes, in this one specific, artificial example:

you are right and I am wrong. This could be specialized with an appropriate convert and a parametrization of the closure, just as with the example from above.

However. This doesn’t generalize to more complex situations where r is used after the construction of f, leading to this being yet-another-case of whack-a-mole (which is fine, if that’s the game you want to play). It’s this lack of generalization and the presentation of this being the solution to all things #15276 that I think is where you’re going wrong. Real code is just a lot more complex than the simplified examples we’re talking about here, but if it’s easy to come up with simple examples that don’t work with the proposed solution, I shudder at the thought of what will happen with code in the Real World :tm: (though it’s likely benign and won’t do something more nefarious than not matching the pattern, leading to a Box yet again…)

The (extremely simplified) pipeline the compiler uses to go from source code to executable native code looks like this:

  • Parse the source code into tokens
  • Lower the tokens into a Julia intermediate representation (or IR for short)
  • Infer types on that intermediate representation
  • Use those inferred types to compile a native piece of code & run that code, possibly calling back into any of the above stages through various means (dynamic dispatch, eval…)

As @Oscar_Smith mentions, the basic issue is that there are cases where, as far as type inference is concerned, two syntactically different pieces of code may lead to effectively the same Julia IR once they pass the second stage above. You don’t need to look far for an example of that - @uniment gave exactly such an example above:

These two are both valid high-level julia source code and type inference doesn’t know that, once the first is lowered, it looks exactly the same (modulo some variable names) as when the second is lowered. You can’t just have type inference erase a Box, because that would mean changing the semantics of the second piece of code. That’s definitely not what you want in a compiler. Thus, making type inference smarter to handle this automatically means giving it more information from lowering, and writing optimizations specifically for that information. This is what I meant above with this approach resulting in making closures more special. Maybe that’s ok for you, I don’t know, but I personally really like that julia has (compared to other languages) very few things that cannot be built by a user and instead need to be built in. The fact that the vast majority of code written makes exclusive use of generic function is an immensely impressive feat, and I’d personally prefer to keep it that way.

On the other hand, if you don’t want to make closures more special and instead want to make more use of the existing type inference infrastructure, you’re still going to need to change how those two pieces of code lower, for example by adding a type parameter to the closure that’s passing along the concrete type of the capture variable. Just Box{T} is not going to be enough, since then you’ll want to parametrize the closure too, otherwise you’ve just moved the point of where the type instability is resolved around. As explained above though, this results in a game of whack-a-mole of pattern matching, to hopefully exhaust at least the common patterns and doesn’t generalize cleanly, introducing weird failure modes and (somewhat) arcane rules that are likely to be confusing for newcomers.

We certainly can! But then the next “small/simple” example pops up. And the next. And the next. Where do we stop with special casing? At what point do we decide “no, this is enough, it just fails here, we need a more general solution”? Just the fact that #15276 has gone on and on for such a long time should be ample notice that we’ve already hit that point, so any proposal towards solving that, IMO, should be thought of in that context.

Certainly not :slight_smile: Everything is changeable, but the difficulty & efficacy of doing so changes drastically, depending on what you want to tackle (and sometimes you’ll have to wait for a breaking release to even begin discussing some stuff). It’s an unfortunate fact though that some areas (like parsing & lowering) have an inherent barrier to entry, due to being written in Scheme, which makes it much more difficult to add & fix things very early on in the pipeline… That’s not to say that using Scheme was a bad choice at the time - I’m a big proponent of using the tools you know and are efficient in to achieve your goals - but just that the success of julia means that it’s no longer quite as fit for the purpose as it used to be. Such is the nature of software development; requirements & environments change and from time to time, they need to be reassessed, rethought, and potentially - rewritten :person_shrugging: (I couldn’t resist the alliteration, apologies :slight_smile: )


Lastly, I want to apologize. Reading my responses back, I was certainly writing from a place of frustration, due to my argument not being heard, and (just like @uniment) not wanting to repeat myself in the face of stubborn incorrectness (as @uniment succinctly puts it). In part, that certainly was due to me assuming more context from a reader than I ought to assume, based on previous discussions about this particular topic. I guess I’ll take that as a cue for me to write more patiently.

8 Likes

Thank you, this is what I was looking for. We’re making progress.

This was just Idea 1, the scope of which was stated in the OP as:

so I’m not sure where the idea that this is presented as a solution to all things #15276 comes from.

Correct, the title of Idea 1 is:

It generalizes to cases where r is accessed after the closure is initialized, but not where r is assigned after. For cases where r is assigned after the closure is constructed, you need a mutable Core.Box—this is what ideas 2 and 3 are for.

I’m happy we’ve made some progress, so I’ll leave it there for now. We can discuss Idea 2 next time.

Apologies, I should not have used hyperbole here. Regardless, I think you’re VASTLY overestimating the number of cases where this is applicable, which is the point I’m trying to make.

The contents of the Int64 is exactly 8 bytes and the content of Int32 is exactly 4 bytes. The sizeof does not include the boxing/type information surrounding anything. In the case of Box we don’t store the actually data but a pointer to the boxed data, so it’s sizeof(void*) pointing to a Julia type. So Box forces storage of the literal box type in memory somewhere because we strip away type info. Unless we can cache type information elsewhere and point directly to the raw bits data on access, this is what happens in some variation.

You can see this for the number of dimensions in Array{T, N}. N has to be unboxed because it’s part of the type, instead of part of the stored data. This is one reason bits type structs are so efficient. Such cases don’t actually require storing the boxed type within each field (rather as part of the Julia type), but the isolated Julia representation requires being wrapped in a Julia structure when passed around and you can see some of this as part of the transformation in ccall.

Behind the scenes it gets a bit complicated when certain things need to be brought back into play. I’ve found this to be particularly tricky when figuring out the GC for the first time (as a part of this PR). Union types are a bit of middle ground between completely separating storage of boxing info and storing a discriminator to select which member of a union is pertinent. Core.SimpleVector just stores a pointer to the fully boxed value no matter what you give it.

My point is that we do a lot of stuff behind the scenes to avoid this sort of stuff requiring explicit storage with every value, but there are times where we intermediate representations of code make this sort of thing much less important or ambiguous. I think focussing efforts on increasingly transparent and well documented representation of memory is a better use of time right now.

Yes, but that is not the same as saying that Int64 itself is a wrapper type, which it just isn’t. You’re conflating the representation in the compiler with the representation (& semantics) at runtime. In the abstract interpretation & during compilation you need to keep that information around, but you don’t store that information, unless absolutely necessary, in the final compiled artifact (this is called type erasure). We only pass type information as part of the compiled artifact when we have a type instability and thus may need to access that type information at runtime, to feed either into dispatch or back into the compiler.

So a variable that the compiler only knows as a Union{Int, Float64} has additional data attached to it to distinguish which case we’re in at runtime, but that does not mean that the wrapped Int object itself stores that information (it’s the Union that does) - the object we get when using that variable either is an Int or it isn’t - there’s no going back from the pure, passed on Int back to the Union (i.e. you’ll never get a Union out of a typeof(x)). The Int is only ever an 8 byte bag of bits.

I’d say it’s very much stored as part of the Array object, not the type. Though admittedly arrays are a bit more special, as they keep that information inside of themselves, on top of being (potentially!) being kept track of in a boxed version when a type instability occurs.

Yes, that I can very much agree with.

1 Like

I’m more or less satisfied with FastClosures.jl as a workaround, except for one little problem: const global variables are unnecessarily inserted into the let statement, even though they cause no performance problems. Here’s an example

julia> using FastClosures, MacroTools

julia> const A = 1
1

julia> MacroTools.prettify(@macroexpand @closure x -> f(x) + A)
:(let A = A
      x->f(x) + A
  end)

(I haven’t defined the function f in the code as I’m just testing the macro expansion.)

1 Like

FastClosures.jl can’t solve that issue because it’s operating with even less information than our lowering pass is. It doesn’t, and can’t, know that A is a const. Likewise, you could easily have f be a non-const and this would also screw up.

Note also that FastClosures.jl undos the lexical scoping, so there are entire classes of uses for closures where they’re just not applicable.

2 Likes

Apologies if someone brought this up already (I didn’t read most of this thread as it’s incredibly repetitive), but I just want to point out that there is active work happening on Opaque Closures in the compiler which in some (but not all) circumstances can solve these sorts of performance problems.

Consider the following two closure generators:

using Base.Experimental: @opaque

function abmult(r::Int)
    if r < 0
        r = -r
    end
    x -> x * r
end;

function abmult_opaque(r::Int)
    if r < 0
        r = -r
    end
    @opaque x -> x * r
end;

let r = Ref(1), x = Ref(2)
    @btime abmult($r[])($x[])
    @btime abmult_opaque($r[])($x[])
end;
26.285 ns (1 allocation: 16 bytes)
2.159 ns (0 allocations: 0 bytes)

This is because rather than being some callable struct storing data, an opaque closure is really just a slice of the program that the compiler is able to stitch back together as if there was never a closure in the first place.

However, this comes (at least currently) with a major cost, which is that opaque closures will incur a penalty if they cross out of the compilation context they were created in:

let r = 1, x = Ref(2)
    f1 = abmult(r)
    f2 = abmult_opaque(r)
    @btime $f1($x[])
    @btime $f2($x[])
end;
12.742 ns (0 allocations: 0 bytes)
38.992 ns (1 allocation: 48 bytes)

I have no idea if these things will ever be a general replacement for closures in julia, but I thought it was interesting and worth bringing up.

2 Likes

To summarize, the result of the previous discussion:

was this:

We hadn’t brought up opaque closures yet. :wink: I’m still learning about opaque closures; the documentation on them seems rather sparse, with vague claims of compiler optimization. Is their sole purpose to avoid boxed captures?

Gloating about a very limited rhetorical victory seems a little unnecessary. I think @Sukera’s point is just that sure, in that specific example you could avoid the box by putting in a specific path that detects that specific limited class of problems, but you’ll end up playing whack-a-mole and it won’t address the general problem, which really requires type inference to do properly and is a big task to fix.

You can read about their motivation here: https://github.com/JuliaLang/julia/pull/37849

7 Likes

Thanks for your response. It very much fits with my growing understanding of this area. Sorry if I bogged the conversation down with semantics since it seems we’re circling the same sentiment here.

1 Like

Sorry, my intent was to not to gloat but to summarize. A lot of unnecessary and tangential noise had been inserted into the discussion, which is likely why you hadn’t read most of it.

That’s their heretofore unfalsified point, yes. In the future, I intend to argue that this is an uncalled-for diminution of the size of the set of problems Idea 1 solves. But I will leave that for later.

Indeed, that is what Ideas 2 and 3 are about, which we will discuss further in the future.

It’s worth noting that the whack-a-mole strategy has been quite successful in reducing the impact of this issue thus far.

4 Likes

The key is that in general, proving that a re-assignment does not occur during the “lifetime” of a closure is tantamount to solving the halting problem. Idea 1 is exactly the sort of work that has been pursued by Jeff and others throughout the long history of #15276. There’s no limit to the amount of compute you dump into identifying this… and it’s very easy to walk yourself into abhorrently bad O(n^2) (or worse) compiler performance when just defining very large functions.

10 Likes