Documenting when the closure "bug" (#15276) binds, and how to avoid it for introductory users

Plenty of discussion has occured about approaches to solving performance of captured variables in closures · Issue #15276 · JuliaLang/julia · GitHub and examples of it rearing its ugly head, but I haven’t seen a succinct description of when it occurs and workarounds to fix that code.

My immediate concern is if/when the following pattern of code is caught by #15276, and hence I would be teaching students very bad coding practices…

function wrapper_algorithm()
#Bunch of calculations
x = #...construct a vector
nt = (x=x,) #and some named tuples
f(y) = x + y #function of variable and things in the function.
g(z) = z + nt.x  #function of variable and things in the function.
#use f and g closure
end


The reason this is such a nice pattern is that in lectures, jupyter notebooks, etc. we could show just the inside of the function, where x, nt, etc. are globals but tell people that they should ALWAYS wrap this exact code in a funciton when performance is critical.

#Bunch of calculations
x = #...construct a vector
nt = (x,) #and some named tuples
f(y) = x + y #function of variable and things in the function.
g(z) = z + nt.x  #function of variable and things in the function.
#use f and g closure


It is a very clean way to provide sample code, but it is a bad idea if it leads to systemic performance issues.

More generally: Making the assumption that it will not be fixed in the next 6 months or so, my question is how should we be teaching introductory users to organize code to avoid the bug. If there is a description of when it does/does not occur, I haven’t seen it and really have no sense of when it isn’t an issue. A few points:

• https://docs.julialang.org/en/latest/manual/performance-tips/#man-performance-captured-1 does a good job of explaining why this is a tough problem for compilers, but doesn’t tell me when it is safe or unsafe to use closures. The description in that section also seems to focus on functions that return a closure… does this happen when you just use a closure inside of a function?
• I have also heard people say that if you put the offending arrays (or whatever) in a structure (named tuple as well?) that inference works again…
• Should the advice to beginniners right now just be “don’t use closures or comprehensions in performance sensitive code”, for example? If the advice is nuanced, maybe we could collect the details and put it in the performance section of the docs.

For introductory courses, I would consider talking about this bug a distraction. YMMV, but for a single semester course that does not focus on Julia per se, but just uses Julia to introduce some field (eg dynamic programming, Bayesian econometrics, etc), I would ignore much more basic stuff than this.

4 Likes

The existence of a performance bug and bad coding practices are completely orthogonal ideas.

5 Likes

There are very specific cases where this would matter. Typically where you would close over a variable, the compiler fails to optimize it and then in the same scope do a heavy calculation involving that variable. For a beginners course, I suggest just forget about this issue and teach how to write nice legible code.

If the course was on “optimization of julia code” this would be an interesting topic to bring up though. But nothing more as a "sometimes this happens, profile to see if it matters and use @code_warntype to see if you have Boxes there.

4 Likes

100% agreed. From my teaching experience the only case in practice when closure bug can actually hurt a newbie is when they are making a heavy use of comprehensions (which I would call nice legible code) as it is non-obvious that a closure is created this way.
Other cases when you encounter closure bug (like the one given in the JuliaManual) will most probably not happen anyway, as a newbie will not write such a code.

1 Like

Really? If this performance bug is going to be around for a while, I would encourage coding practices to avoid it if I was teaching someone to use Julia for research. In economics codes there are usually going to be a lot of closures and optimization or root-finding calls. I agree with the original post that I am often unsure when this bug is going to occur. I may opt for the 0.4 style of coding in the future and manually construct my own callable structs, just to make sure there is no performance trap.

For me, the introductory users are PhD students who might otherwise learn Fortran, so it is helpful to assure them that they will achieve comparable performance without having to debug issues like this.

3 Likes

It appears that the number of cases where this bug appears is decreasing - https://github.com/JuliaLang/julia/pull/28170

1 Like

This, unfortunately, is common with bugs; not all of them are predictable. In general, I think it is sensible to

1. write clean code that is easy to understand,
2. if speed is a concern (let’s face it — sometimes it isn’t), profile and optimize bottlenecks, potentially at the expense of 1,
3. in case some language features or bugs prevent optimal performance, work around them.

There is a trade-off between 1 and 2–3, and since it is in general hard to predict performance bottlenecks without a lot of experience, coding around potential issues preemptively is probably premature optimization (and thus \sqrt{\text{all evil}}).

Also, as @dawbarton noted, the compiler improves almost every day. Teaching how to code around a particular bug is probably not the best use of time, one could teach something more fundamental and useful instead.

2 Likes

Perhaps this previous #dev thread would be helpful for folks that want to dig in deeper as to why and how this occurs:

In short, it is a predictable syntactic transform (albeit with complicated rules). You don’t need any types or previous definitions (beyond macros) to see if it’ll occur. Just check Meta.@lower for Boxes. I’d love to find some time to spend with linting — this is highly lintable and could be flagged by an editor.

IMO: Closures can be a challenging topic to teach. Focus on the behavior first. This performance snag is just going to further confuse things unless you’re very careful.

1 Like

That’s not what we are talking about. We’re specifically not talking about research. We’re talking about teaching. In this class he’s probably not even going to go into what “code lowering”, “LLVM passes”, etc. mean. It doesn’t even make sense to talk about “type-stability”. You’re just teaching the algorithms and their use… so why is a very specific code lowering issue which causes boxing in closures suddenly a crucial topic to teach in an economics course? No.

But I mean, if you are teaching a course on software development with a significant portion spent on performance or how compilers work, sure this is a great topic to spend time on.

For research, sure… if you’re discussing performance. Most research isn’t talking about software performance though.

1 Like

That should be the case, but is not in Julia as it stands. The issue is that there can be multiple orders of magnitude difference in speed with boxed vs. unboxed code, and if it is easy to make mistakes and tough to detect the issues, it is best to stay away from patterns which lead to boxing. This is why I am opposed to telling people to use elaborate parameterized structures for parameters, as it is too easy to make mistakes with abstract containers.

Many people in economics are going to take sample code and repeat the coding patterns nearly exactly for their scripts. And I am using the word script intentionally. What they expect is that when they write their script, and use all of the fancy libraries I tell them about, that the code should be fast.

I find code with closures to be extremely readable and natural, and would love to base a great deal of code based on them. But not if there is a 50/50 chance the code will end up boxed, and I can’t tell people simple rules for how to avoid it.

I don’t want to talk about this bug at all. I want to give people patterns for code that are likely to avoid it. Basically: I want to avoid teaching people about optimization and performance tweaking by giving them patterns in the code to copy where those things are likely to be avoided.

I still have no idea when the issue occurs… Can anyone tell me? Does it generally only happens when you return a closure from another function, then that is an easy programming pattern to avoid (and not something common for simple user code).

1 Like

Closures are very natural for example scripting code and you don’t even need to tell them what it is… they find the code intuitive in matlab, for example. Lets say I am teaching people about interpolation, then in the lecture notes I give them this code

using Interpolations, Plots, Roots
alpha = 2
f(x) = x^alpha
x_grid = 0.0:0.1:1.0
f_int = LinearInterpolation(x_grid, f.(x_grid))
display_grid = 0.0:0.01:1.0
plot(display_grid , f.(display_grid ))
plot!(display_grid , f_int .(display_grid ))
b= 2
g(x,y) = 1 - x^b + y
find_zero(x-> g(x, 2.0), 0.0)


Of course, in Jupyter this is using globals, but that is fine for teaching. In fact, the only performance thing I want to beat into them is that you never run performance sensitive code without wrapping it in a function. That is pervasive enough in Julia that we can expect anyone to follow it. So then they follow my rule when they copy the pattern and do the following

using Interpolations, Plots
function my_function()
alpha = 2
f(x) = x^alpha
x_grid = 0.0:0.1:1.0
f_int = LinearInterpolation(x_grid, f.(x_grid))
display_grid = 0.0:0.01:1.0
plot(display_grid , f.(display_grid ))
plot!(display_grid , f_int .(display_grid ))
b= 2
g(x,y) = 1 - x^b + y
find_zero(x-> g(x, 2.0), 0.0)
end
my_function()


Note that f and g are closures now, and the anonymous x-> g(x, 2.0) is also a closure. This couldn’t be more intuitive and nobody needs to learn about closures, boxing, premature optimization, code lowering, etc.

UNLESS: #15276 could apply to the above? In which case I have taught people something that could drop performance by orders of magnitude (i.e. I don’t care about a factor of 2-3X) which they will likely copy all over the place.

By the very nature of tricky optimizations, it’s hard to give really good rules of thumb for when an optimization will or won’t work. Fortunately, the problem is likely to go away in the near future, so fretting too much about it is a bit undue—it will get fixed by a combination of dominance analysis and escape analysis, both of which we’re well poised to do much better with the new optimizer. Have some patience.

In general, independent of this bug, the advice to give to students is, if their code is too slow, then:

1. Profile to see where the time is going
2. Use @code_warntype on whatever’s slow
3. Fix it if it’s obvious, otherwise ask for help.

I don’t think it’s reasonable to expect that one’s mental model can 100% accurately predict what is or isn’t going to be fast in any language, although Julia is already considerably easier than most high-level languages (unless you count the prediction that everything is going to be slow in some languages—hey, at least it’s predictable). If they have an actual performance problem that hits this bug then they’ll learn about it when it happens and they’ll have learned something much more important as well: how to profile code and figure out what’s making it slow. They’ll need that skill long after this issue has been resolved.

8 Likes

FWIW, as someone who has almost exclusively written julia code for the last three years and done quite a lot of performance sensitive things (Travis CI - Test and Deploy Your Code with Confidence, GitHub - KristofferC/NearestNeighbors.jl: High performance nearest neighbor data structures and algorithms for Julia., https://github.com/KristofferC/JuAFEM.jl) I do not think about this issue when I code.

When I know a part is performance sensitive or it sticks out in profiling I will @code_warntype it, and then eventual Boxes are screaming at you anyway they are usually fixed by adding a let block at somewhere.

7 Likes

It occurs when the surface syntax is too complicated for the parser to prove that a captured binding is not modified. What makes something “too complicated” is a moving target as more and more smarts are put into the parser.

That’s why the let x=x workaround works — it simplifies the scope of the re-bound x such that Julia can prove that it doesn’t need to worry about the behavior of x, allowing it to treat it as a constant.

Note that your my_function above is Box-free.

1 Like

OK, this is a little different than my understanding reading the more technical descriptions of how to solve the problem in other threads - which involved changes in the language to signal type-stability. So it sounds like this is an issue that will become more and more rare over the next year? It is mostly about filling in compiler optimization cases?

For many types of users, I agree in principle - and the last thing I want to do is tell people to worry about performance prematurely. The issue with Julia is that most students have no prior on how fast the code should really be (and not even a prior on orders of magnitudes) and @code_warntype is far too complicated to teach most users to interpret. I can’t really read it myself except in toy examples.

This I disagree with from a practical standpoint for many users. Most users coming from matlab scripting are never going to profile, will be perplexed that @code_warntype is necessary or how they could possibly read its output, and will have a hard enough time learning how to use @btime properly. They will copy simple programming patterns you give them and accept the performance that those patterns have with little tweaking. If we can avoid teaching them patterns that make inferrence difficult for a given version of Julia, then it makes it more likely they will be happy with the performance they get. Besides the “shoot in the foot” issues like using globals or structures with abstract container types, I want them to know that for the best performance they should use pacakges written by people who know about that stuff.

But if the advice is that “closures and comprehensions performance is not systematically bad, and the cases where it has issues are sufficiently rare that you don’t need to avoid any particular programming patterns using them” then that may be good.

Great to hear. I think that is a good test for the sort of closure code that users might have.

The core devs have always said that this was an optimization problem and that it will get fixed in time. The only caveat is that the time will not be immediately because it’s a non-trivial optimization to implement. It has already gotten better and will continue to do so. It was other people who proposed changing the language to try to limit the problem—in a way that didn’t, as it turned out in that thread, actually solve the performance issue in a satisfactory manner.

2 Likes

If they are at that level, then they likely have bigger performance issues to care about. The algorithm matters more than these details.

Yes, that’s the advice. In fact, I would say that this bug shows up sufficiently rare enough that most people don’t even need to know about it existence. The only way I know how to easily trigger it is by using multithreading with eachindex over non-Array AbstractArray types…

Perhaps part of this is stemming from your advice to blindly copy global-scope code into function bodies. Yes, that is how you’ll get the best repeated performance, and avoiding global scope is often very good advice — especially when optimizing. If, however, you’re only doing it once and there aren’t any loops, then I’d guess it’ll be a wash or not worth the extra effort. It may even be a little slower. For example, my_function takes 10 seconds to define and run the first time, whereas just doing it at global scope is 12 seconds. Subsequent runs are 5ms for the function and 50ms in global scope. Working at the global scope is totally fine for things like my_function where you’re just calling out to other functions like plotting and such.

The best advice you can give is to compose functions into orthogonal and independent pieces that don’t rely upon global state. That has the side-effect of simplifying the function bodies, making boxes for any closures even less likely.

2 Likes