Documenting when the closure "bug" (#15276) binds, and how to avoid it for introductory users

I agree with you entirely on your comments. We tell people that they only need to wrap it in a function if it “feels” slow… globals are often perfectly reasonable for simple scripts. Of course, we want to get to the point where people will decompose functions, but it is harder to get there than you would expect for certain types of users. There is no reason that Julia cannot be used by people who (asymptotically!) only intend to write simple scripts that rely on fancy algorithms in external packages.

Of course! And the algorithms are in outside packages, which I want to tell people is the key thing to use for performance. If they can use closures without fear of losing orders of magnitude in performance to define the functions that they pass to carefully crafted algorithms written by the experts, then I think my question is answered.
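
To make that use case concrete, here is a minimal sketch of a closure handed to an algorithm. `bisect` is a hypothetical stand-in for a routine from an external package, and `solve_for` is an illustrative name; neither comes from a real library. The captured variable `target` is assigned exactly once, which is the pattern that should not run into the issue.

```julia
# Hypothetical stand-in for an expert-written algorithm from a package.
function bisect(f, lo, hi; tol = 1e-10)
    flo = f(lo)
    while hi - lo > tol
        mid = (lo + hi) / 2
        fmid = f(mid)
        if sign(fmid) == sign(flo)
            lo, flo = mid, fmid   # root lies in the upper half
        else
            hi = mid              # root lies in the lower half
        end
    end
    return (lo + hi) / 2
end

# User-level code: the anonymous function captures `target`,
# which is assigned once (as the function argument) and never reassigned.
function solve_for(target)
    return bisect(x -> x^2 - target, 0.0, 2.0)
end

solve_for(2.0)   # approximates sqrt(2)
```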

I find it uncomfortable that the only way one can get to know about this bug is by digging deep enough in one’s otherwise flawless code, spending hours doubting one’s understanding of basic language concepts and how type stability works, then eventually asking a question on a forum, only to get referred to the closure bug and the workarounds. Documenting the summary of the 15276 issue can hopefully minimize the frustration.

Saying that it is uncommon or that one doesn’t have to think about it much is fine, but somewhere at the back of your head you know there is something called a closure bug and that it may be the reason for your suspiciously slow code, and you know the workarounds. Getting people to this point from just reading the docs seems like a fair thing to ask.

It is documented, and linked to from the various parts of the documentation that deal with topics where this is sometimes an issue.

To be fair, the same can be said about making a small mistake and having inference bail out. There is a cost to being dynamic (and having very general fallback interfaces like AbstractArray) in that code keeps producing correct results but, due to a mistake, could be slower than optimal.

Seriously, people, what do you want? This is an acknowledged performance bug, it’s been significantly improved already, more improvements have been promised in the future, and it’s thoroughly documented. We’re terribly sorry that compilers have unpredictable performance cliffs sometimes. Nor is that limited to dynamic languages—every compiler has these kinds of issues. If you want perfectly predictable performance, use a language that’s slow all the time.

Speaking for myself: I just needed to know if this is: (1) an odd and unpredictable corner case that doesn’t bite in most circumstances, and is not worth worrying about for organizing code; (2) an issue that comes up reasonably often, but can be largely avoided by not using closures in an easily enumerated set of cases; or (3) sufficiently common and tough to predict that it is best to tell intro users to avoid closures in the short term. I don’t think that “teach them to profile type stability” is the right answer.

From what you are saying, the answer is (1). There is no easy way to enumerate the cases, and they are infrequent enough that it isn’t worth steering new users away from closures. If you thought that everyone knew the answer was (1), then you were mistaken… I think that many people, myself included, assumed that it was closer to (3).

Who said/thinks it’s closer to (3)? Tons and tons of Julia code uses closures. It’s a very common method. Almost no code hits this bug. There are about 13 Discourse posts (with examples repeated on the PR) over the 2 1/2 years that the bug has existed. Yes, it sucks when you hit a bug. No, you’re way overestimating the chance of hitting it.

DiffEq has probably around 100,000 (? a lot of tableaus… 500,000? I don’t know. I checked a long time ago) lines of code and we profile enough to know when we hit things like this. We hit it twice. The reason:

So sure, if you’re going to be teaching people about performance optimization for parallelization and multithreading, give it a mention along with let blocks, locking, @code_warntype, etc., since all of that is relevant to a performance discussion. But mentioning it without context is FUD: almost none of your students will hit it, so I am not sure why they should care.

Yes, this used to be more common — now it’s much more rare. In fact, many of the examples in the GitHub issue are fixed.

Here’s a minor tip that I find to be good style and will help keep this gremlin away: avoid re-assigning to the same binding where possible. For example, the following function will hit the bug:

function f(flag)
    x = 1
    if flag
        x = 2
    end
    return [x*i for i in 1:10]
end

If you instead write:

function f(flag)
    x = flag ? 2 : 1
    return [x*i for i in 1:10]
end

you’ll avoid this issue and I find it to be much easier to read… and it’s easier to avoid type instabilities this way, too. Importantly: my reason for preferring the latter isn’t because of this performance issue! And of course I still often re-use the same binding when it makes sense (e.g., in loops and things like x += 1 and what not).

I will second all those who said no, one should not avoid closures or teach people to avoid them because of this issue. The reason I would offer is that when this performance issue hits, it’s not really the closures per se that are the problem, but the variables. It can basically always be solved by putting a type declaration on a captured variable, which doesn’t require any code restructuring. That’s the key point: avoiding closures can require a significant rewrite, but that’s not necessary, hence there’s no need to teach people different code-structuring habits.
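
As a rough sketch of what that looks like, here is the reassignment example from earlier in the thread with a type declaration added (`f_declared` is a hypothetical name). The control flow is unchanged; only the first assignment gains `::Int`. My understanding from the performance tips is that the declaration lets the compiler know the type of the value the closure reads, even though the variable is still reassigned.

```julia
function f_declared(flag)
    x::Int = 1          # the declaration applies to x for the whole function body
    if flag
        x = 2           # still reassigned, but always converted to and stored as Int
    end
    return [x * i for i in 1:10]
end

f_declared(true)
```

This assumes the variable really does keep one type; if a later assignment produced, say, a `Float64` with a fractional part, the implicit conversion to `Int` would throw an error.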

@mbauman’s last post has it right: the issue occurs for variables that are used in a “complex way”, which basically boils down to having both of (1) being assigned in multiple places, (2) being used in closures.
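
A small sketch to check this on your own code, assuming nothing beyond the standard library. The function names are hypothetical. Both functions capture `a` in an anonymous function, but only the second assigns it in more than one place; in the `@code_warntype` output for the second, one would expect to see `Core.Box` and non-concrete types, though the exact printout depends on the Julia version.

```julia
using InteractiveUtils  # stdlib; makes @code_warntype available outside the REPL

# Captured, but assigned in exactly one place.
function assigned_once(flag)
    a = flag ? 2 : 1
    return map(i -> a * i, 1:3)
end

# Captured and assigned in two places: both conditions are met.
function assigned_twice(flag)
    a = 1
    if flag
        a = 2
    end
    return map(i -> a * i, 1:3)
end

# Compare the two printouts by eye.
@code_warntype assigned_once(true)
@code_warntype assigned_twice(true)
```

Both functions return the same values; the difference is only in what the compiler can infer.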

On a different level, I think it’s quite unfair to call out a performance issue in the context of introductory teaching. Python is systematically slower than other languages and yet everybody considers it a great language to teach. Meanwhile, hordes of undergrads have been taught in matlab that loops and recursion are evil and slow. Except now matlab has a pretty good JIT, except when it bails out for whatever reason, and does anybody understand when that happens? (Personal anecdote: I was in a matlab class myself once, and we got to the part where the instructor was telling us about how loops are slow and must be avoided. I raised my hand and asked him to actually time it, which revealed that the loop was the same speed as the built-in function. The look on his face? His world was crushed. Let this be a cautionary tale.)

Could you point me to these? The closures page in the latest docs doesn’t seem to have it.

The performance issue associated with captured variables is mentioned in the latest docs in the sections on do-blocks, generators, and scope of variables. The main discussion of the performance issue is in the performance-tips chapter.

Just wanted to remind people of the let block workaround and the existence of FastClosures.jl.
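
For anyone who has not seen it, a minimal sketch of the let block workaround, applied to the reassignment example from earlier in the thread (`f_let` is a hypothetical name). The `let x = x` introduces a fresh binding that is assigned exactly once, and the comprehension captures that binding instead of the reassigned outer `x`.

```julia
function f_let(flag)
    x = 1
    if flag
        x = 2
    end
    # New local x, assigned once from the outer x; the closure captures this one.
    return let x = x
        [x * i for i in 1:10]
    end
end

f_let(true)
```

As far as I understand it, the `@closure` macro in FastClosures.jl automates this same wrapping, so the let block does not have to be written by hand.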

Thank you so much, this is exactly what I was hoping for. And the “avoid re-assigning the same binding” is often a good general practice independent of this particular issue.

Beautiful, thanks. This is how FUD is crushed… not everyone is privy to this information. What plebes like me see is: complicated details about the compiler in https://docs.julialang.org/en/latest/manual/performance-tips/#man-performance-captured-1, old github and stackexchange comments, discussions in discourse, comments in slack like “Good ol’ #15276.”, mentions of the “let block workaround”, the mere existence of FastClosures.jl, and apparently “latest docs in the sections on do-blocks, generators, and scope of variables”. Based on all of that, I hope you can appreciate why outsiders who don’t write Julia for a living might want a clarification on best-practices. If @mbauman had been the first to respond to this thread with this post, I don’t think we would have even needed to discuss it further. Speaking of which, can anyone tag that response as the “answer” to the question?

I think this is worth pushing back on: the idea that it is possible to have fundamentally different types of code in lectures/samples and in “production” research code. Sounds great, but it doesn’t really work in many fields because the education stops with permutations of sample code. The vast majority of people in the field will have a total of 0 programming/software courses in their entire education. Given good enough external packages, most economics code would be around the length of a script, and most users would be indifferent to learning more than they really need about the language. Matlab’s sweet spot is exactly in those sorts of fields.

While I would always go for expressiveness and clarity over performance (in both lectures and in my research), if you show people coding patterns that are an order of magnitude slower, they are likely to keep repeating them and variations on those examples will diffuse. Better to try to eliminate bad sources of performance (and by bad, I mean > 2-3X performance hit, not a few micro-optimizations) early, before people start copying around the code. Luckily it sounds like there are no systemic closure issues that fit that bill.

Of course, ensuring that “introductory users shouldn’t need to worry about performance” is exactly the goal. My hope is not to have to talk about performance at all! But keep in mind that to convince people to give up an incredibly productive IDE with a proper debugger in Matlab and essentially no compile time, there has to be something they see in return. Elegant code and the ability to use better algorithms due to generic programming is the reason I like Julia, but the subtlety is lost on new users. An order of magnitude faster than Matlab code at runtime and the ability to use fancy algorithms in packages without worrying about language details can be a convincing argument. The goal should be to help them to that point without having to explain to them what boxed types and @code_warntype are for.

It seems that a lot of divergent goals are tied up in your problem, e.g.,

  1. language evangelism (convincing others to use Julia),
  2. teaching introductory programming in the context of a field (economics),
  3. teaching writing performant code in Julia,
  4. to users who

I am afraid that these are not fully compatible. Perhaps you should think about what your priorities are.

I think the ideal you espouse in this post is a little misguided. Julia is a language that spans from high level ease to low level performance, yes. Does this mean that every user’s code can and should be performant? No.

The advantage of having everything in one language isn’t that everyone can program it perfectly. Software development is hard. Just like you would only expect domain experts to know the right code to use and technical computing experts to implement tough numerical algorithms, getting code to actually be performant is a skill. It takes knowledge and practice. Using Julia the standard way gets you close enough, but to actually get performant code you need to inspect how the code is lowering, redesign low level structures and algorithms for cache friendliness, etc.

Look at the Celeste project as a good example. Not everyone on the Celeste project was an expert software developer. But the project worked well because of single language collaboration. Domain experts implemented the algorithms and Keno went in and modified the code to be in a high performance style (struct of arrays, multithreading, etc.). Traditionally this second step required a language rewrite. Now it just takes an expert to tweak and modify the algorithm with performance in mind. Having this in one language greatly eases this process and allows the collaboration to continue on the code (instead of the process to production code being “one way”).

So no, the students that you’re teaching will not write tip-top performance code. And no, without consciously studying it, they will not get close either. The Discourse likes to think that getting rid of allocations and achieving type stability is all it takes: it’s not that easy. You can sometimes make code much faster by allocating and using dynamicness. You need to really benchmark, profile, and study to get there. But domain experts who are writing “good enough” code are playing a crucial role in this process by getting the general algorithm structure down and tested. In the end, not everyone is going to play all roles well and that’s okay.

It’s also worth pointing out that julia, in all its greatness, is still a pre 1.0, somewhat specialized language, whose ecosystem is not yet as mature as python or Matlab. Teaching it to nonprogrammers and worrying about performance issues feels a bit premature, and this issue is a good example of why.

(also I feel that this bug comes up pretty often when writing multidimensional array manipulations with comprehensions, although I haven’t tried the latest fixes. I think the “profile, warntype and workaround” approach is fair game while this bug is still being worked on - the reassurance that it will get fixed and is not a fundamental language limitation is very helpful. Anyway, if you’re worrying about performance and not profiling your code, you’re doing it wrong.)

I see where you’re coming from. This is why I think it’s important not to have anything in the language that’s “fundamentally slow”, like an entire feature that needs to be avoided. The goal is for reasonable performance (to be distinguished from expert-level performance) to always be just a few small tweaks away.

Keyword arguments used to be a “slow feature”, but fortunately that has been fixed. v0.5 fixed the “slow feature” status of anonymous functions and closures. Now I think we’re basically down to just global variables as the slow thing to be avoided, which I’m pretty comfortable with since nobody considers using lots of globals to be a good idea anyway. try/catch/throw are also arguably “slow” and we should work on that eventually, but the risk of abusing them seems pretty low.
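
For the one remaining item on that list, a short sketch of the usual alternatives to a non-constant global. All names here are hypothetical. The three functions compute the same thing; the difference is whether the compiler can know the type of the value being read.

```julia
scale = 2.0           # non-constant global: its value and type may change at any time

# Reads the non-constant global, so the compiler cannot assume its type.
global_sum(xs) = sum(scale * x for x in xs)

const SCALE = 2.0     # constant global: the type is fixed and known

# Same computation against the constant.
const_sum(xs) = sum(SCALE * x for x in xs)

# Or avoid the global entirely and pass the value as an argument.
arg_sum(xs, s) = sum(s * x for x in xs)

xs = [1.0, 2.0, 3.0]
global_sum(xs), const_sum(xs), arg_sum(xs, 2.0)
```

For a short script the first form is often perfectly acceptable; the other two are the small tweaks to reach for if it turns out to be slow.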

Tangentially related to this, but is it documented anywhere that comprehensions are implemented as closures? I remember being tripped up by that at first, wondering why a comprehension led to boxing, since it didn’t immediately occur to me that this might be related to closures.
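
To illustrate the connection, here is a sketch of the rough correspondence, with hypothetical function names. This is an approximation of what the comprehension does, not the exact lowered code: the body of the comprehension behaves like an anonymous function over the iteration variable, so any outer variable it mentions (here `x`) is a captured variable.

```julia
# The comprehension form.
comprehension_version(x) = [x * i for i in 1:4]

# Roughly the same thing spelled out: a generator wrapping an anonymous
# function that captures x, collected into an array.
generator_version(x) = collect(Base.Generator(i -> x * i, 1:4))

comprehension_version(3) == generator_version(3)
```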

I will drop my two cents now. When I use R, everything I make is a function. I only set constant globals when they work and are intended to be used like that (e.g., api_key). From a performance standpoint, having helper functions inside functions is pretty bad for computational efficiency, so I might consider some cases where I do keep them outside.

In Julia, I love type dispatch, so usually when I am writing code I will use mutable structs and define methods such as update! which handle a good share of the cases in this discussion. I totally understand why the issue occurs, and it seems sensible from the interpreter’s perspective. Sharing scope is a blessing and in some cases can be a curse. I am actually curious how the new global and local definitions affect this issue in Julia 0.7/Julia 1.0. I think type declarations on captured variables could be a solution for the time being, until the language decides on how best to handle the issue (it is not a bug if it is documented and works as intended; it might be a decision or a limitation).

Thank you, this captures my perspective exactly. I am really happy that closures are not on the list of fundamentally slow features. The only thing I would (personally) add to this list is that beginners are best not given code to copy that has lots of parametric user-defined types. The distinction between concrete and abstract containers and parametric types is too complicated for beginners, and if they do it slightly wrong the code will compile (as it should), but performance silently drops dramatically. I think that named tuples can provide most of the features beginners need for collecting parameters, so that teaching about defining your own parametric types and abstract containers can be delayed until they are ready.
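
A sketch of what I have in mind, with made-up parameter names and a made-up `output` function. The named tuple needs no type definition, and its type records the concrete type of every field, so there is nothing for a beginner to get slightly wrong. The struct below it is the kind of mistake I mean: it compiles and gives the same answer, but its fields are abstractly typed.

```julia
# Hypothetical model parameters collected in a named tuple.
params = (alpha = 0.3, beta = 0.95, n = 100)

# Works with anything that has an `alpha` field.
output(k, p) = k^(p.alpha)

# For contrast: a user-defined type with abstract field types.
# It runs and returns correct results, but field access is not type-stable.
struct LooseParams
    alpha::Real
    beta::Real
end

output(4.0, params), output(4.0, LooseParams(0.3, 0.95))
```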

To use Chris’s comment as an example, though I think it applies to all sorts of examples here, I want to make one comment on Julia Discourse culture. Keep in mind that the reason I asked the question was precisely to avoid talking about micro-optimizations to students, but many people assumed the opposite.

In this thread, and many others, when people ask a question there is often a barrage of “that is not the right question to ask!” or “that is rarely necessary, what are you really trying to do?” or “stop griping, this is being worked on” or “if you organize your code correctly you don’t need a debugger” or “premature optimization is the root of all evil”. Even if these responses are frequently correct, sometimes a direct answer to the question as asked, or a request for clarification, is more helpful if it comes first, both for posters and for people reading the thread later.

Lecturing people on asking the wrong question can also be helpful, but if you let there be a direct answer first, it improves the signal-to-noise ratio. I hate that I had to type so much defending why I was asking the question and that people were wrongly assuming my intentions. If people had just let @mbauman answer first, and done the lecturing later, it would have helped.
