Tips to cope with the scoping rules of top-level `for` loops, etc.

The issue of the scoping rules of top-level for loops (and similar constructs) resurfaces from time to time, and it seems that this will keep happening for a while, so I thought it might be useful to share some thoughts and positive experiences about it in the context of explaining Julia to new users. Hopefully these tips will help others keep this particular feature of Julia (even as it is now, without the “first-hit” rule) from becoming a barrier to evangelizing Julia.

Tip #1: Wrap everything in functions

I know, I know; this has been suggested and contested over and over again. It’s the most effective way to avoid issues with the global scope, and also good programming practice; but it doesn’t go well with quick and dirty experiments, trying out code snippets interactively, and the kind of things that are typically done when learning a new programming language…

… Or it didn’t go well with those things until a couple of months ago. The new debugger works like a charm, and I can tell from experience that some newcomers love to play with it, especially in the nice interface that Juno provides: put breakpoints here and there, stop now and then continue, peek into the functions you are calling… That’s even more fun than the good old copying and pasting of code into the REPL. The only frequent complaint is having to write `Juno.@enter` to start debugging 🙂 (I acknowledge that the people I work with mostly come from a Matlab background, so perhaps this is a biased experience.)

Thanks to this, wrapping your code in functions becomes more attractive even for the initial experiments, since that is the most convenient way to debug it.
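As a minimal sketch of this tip (the function name `running_total` is made up for illustration), here is the kind of accumulation loop discussed under Tip #2, wrapped in a function. Inside the function `x` is an ordinary local variable, so no `global` annotation is needed, and the function is exactly the kind of unit a debugger can step into.

```julia
# Hypothetical helper: an accumulation loop that would need `global` at the
# top level, wrapped in a function so that `x` is local to the function.
function running_total(n)
    x = 1
    for i in 1:n
        x += i      # no `global` needed: `x` belongs to `running_total`
    end
    return x
end

println(running_total(10))   # prints 56
```

With Juno, something like `Juno.@enter running_total(10)` would then open a debugging session inside it.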

Tip #2: Introduce for loops with mutable objects

This is the other obvious way to avoid hitting the global scope issue. The typical problem is with:

x = 1
for i = 1:10 
    x += i
end

This doesn’t work at the top level (in Julia 1.0 it throws `UndefVarError: x not defined`), but of course this does:

x = ones(10)
for i = 1:10
    x[i] += i
end
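Not part of the original tip, but the same idea covers a scalar accumulator: a `Ref` is a mutable container, so the loop can update its contents with `x[]` without ever rebinding the global `x`. A sketch, assuming a plain top-level session:

```julia
# `x` is bound once to a Ref; the loop only mutates the Ref's contents,
# so the global binding `x` itself is never reassigned inside the loop.
x = Ref(1)
for i = 1:10
    x[] += i
end
println(x[])   # prints 56
```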

In your first steps with Julia, you may want to try the exciting experience of writing explicit loops and watching them run as fast as light (again, a biased opinion from Matlab users, who usually had to struggle with bizarre vectorized code to avoid the slowness of loops in that language). You may want to do this even before writing functions, so Tip #1 does not help here.

But loops are often meant to fill arrays or similar objects that you can mutate without having to replace them, which works around the global vs. local scope issue. This is a convenient way of “hiding” the problem, at least when you control the examples you use to show your peers or students how Julia works.

Tip #3: “loops are like functions”

You may be lucky and hide the issue for some time with the two previous tips, but your unfortunate user will eventually hit it, so you should prepare him or her before the problem is encountered. Thorough explanations of scopes and the like are out of the question: your colleague or student doesn’t want to learn about that; they only want code that works and a shallow understanding that takes no more than a couple of minutes to acquire.

But the two previous tips may also come in handy for preparing the simple explanation that beginners need. You can show how much faster code is when it is encapsulated in (well-written) functions, thanks to just-in-time compilation; measuring the slowdown introduced by the debug mode may also help them understand it.
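A rough way to show this to a newcomer, using only Base: time the same sum once as a top-level loop over a global and once inside a function. The names `N`, `sumsq`, `t_global` and `t_func` are illustrative, and the absolute timings depend on the machine and Julia version, so treat the printed numbers as indicative only.

```julia
const N = 10^6

# Loop inside a function: `s` is a local whose type the compiler can infer.
function sumsq(n)
    s = 0
    for i in 1:n
        s += i^2
    end
    return s
end

# Top-level loop: `s` is an untyped global, so it must be declared `global`
# inside the loop, and the compiler cannot assume its type on each access.
s = 0
t_global = @elapsed for i in 1:N
    global s += i^2
end

sumsq(1)                     # first call triggers compilation; keep it out of the timing
t_func = @elapsed sumsq(N)

println("top-level loop: ", t_global, " s")
println("function:       ", t_func, " s")
```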

And if the user (especially one accustomed to the slow loops of other languages) wonders about the speed of such loops in Julia, you can explain that this is also thanks to the just-in-time compilation of the code that lives inside loops.

Now, I know that this statement is not accurate, but for the sake of simplicity you can say that Julia treats the code inside loops as it does the code inside functions, and therefore the relationship between variables inside and outside the loop is like the one seen in functions.

It is not necessary to go deep into those relationships. Matlab, Python and other languages force you to qualify variables as global if they are defined outside a function but used inside it, and people learn this without any pain, although their understanding of scopes may be only partial or inaccurate. So the only extra thing to learn in Julia is that this has to be done not only in functions but also in loops, unless the loop itself is inside a function, as seen before.
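In code, that “extra thing” looks roughly like this sketch (the function name `total_inside_function` is hypothetical): at the top level, a loop that assigns to an outer variable needs the same `global` annotation a Python function needs for a global it assigns to, whereas the identical loop inside a function needs nothing.

```julia
# Top level: the loop assigns to the global `total`, so it must say so.
total = 0
for i in 1:10
    global total += i
end
println(total)   # prints 55

# Inside a function, the loop updates the function's local `total`;
# no annotation is needed.
function total_inside_function()
    total = 0
    for i in 1:10
        total += i
    end
    return total
end
println(total_inside_function())   # prints 55
```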

In summary, my suggestion is that instead of inviting people to develop a new mental model to deal with the scoping rules of Julia, we use their current mental model to help them learn the practices that should be followed in this language, using some motivating features that make the small changes attractive for users.


Tip 4:

Recommend that new users play with Jupyter notebooks first (with soft global scope) until they gain some experience.

Tip 5:

Don’t hide the global scope issue by only using mutables in examples. Tell people straight away that the global-scope-loop behavior is a quirk that also bites experienced users all the time. People can adapt to quirks and maybe eventually appreciate why they make sense, but get annoyed very quickly if they feel gaslit (“the global loop-scoping is totally normal and intuitive, it is you who are crazy for expecting something else”).


I would not recommend this, because I think that working with the soft global scope is the wrong kind of experience, and would lead to problems later on precisely for the people who need help understanding scope.

FWIW, my advice boils down to a single rule: code anything nontrivial in functions.

This has many implicit advantages: better compiler optimizations, more idiomatic code, not having to worry about the whole global scope problem, easier unit testing, and possibly others I forgot.

There is no advantage in not doing this, so this is a habit that new users should internalize from day 1.

Julia does have global variables, but since they come with a host of other issues, they are better reserved for the cases where they are really needed, such as

  1. maintaining global state,
  2. storing values in interactive use.

Using a global variable in a for loop or a similar construct is something the language technically allows, but it has very few use cases. Trying to sugar-coat this issue for new users may be well-intentioned, but I think it does them a disservice in the long run.


I agree wholeheartedly on this. I love having the global very explicit when you really want global variables. This is one of the reasons I hate the current solution: you are annotating something as a global that you don’t really want as a global outside of the current script.

The core of this issue is that in interactive/script-style programming, global variables are the wrong mental model for variables inside a loop at the top level. Instead, the mental model of Matlab and the scripting languages is that the variables are local to the script (or Jupyter notebook) itself. Another mental model, from C, would be that there is a big void main() function around the script, and the variables are local to that main and not truly global unless you annotate them as such. This is why people get confused by the current v1.0 behavior and expect top-level code to be equivalent to wrapping the whole script in a function and calling it. They don’t think of the variables as being globals.

I think the best mapping of this mental model onto Julia’s scoping would be to think of a script as if the whole thing were wrapped in a big let block.
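A sketch of this mental model, with illustrative names: the C-style reading puts the script body inside a `main()` function, while the `let` reading wraps the same body in a `let ... end` block. In both, `x` is local, so the loop needs no `global`; with `let`, only the value of the last expression is handed back to global scope.

```julia
# Reading 1 (C-style): the whole script body lives in a `main()` function.
function main()
    x = 1
    for i = 1:10
        x += i        # `x` is local to `main`
    end
    return x
end

# Reading 2: the same body wrapped in a `let` block instead of a named function.
result = let
    x = 1             # local to the `let` block, not a global
    for i = 1:10
        x += i        # updates the let-local `x`
    end
    x                 # value of the block, bound to the global `result`
end

println(main() == result)   # prints true (both are 56)
```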

Of course, the main() model breaks down in implementation. Actually having variables local to a script (or Jupyter notebook) or the REPL doesn’t work in a dynamic and interactive language, which is why it is necessary to make them global as an implementation detail, for practical reasons. I don’t think there is a way to reconcile this without some special cases.

If you have been using “real” programming languages for a long time, and haven’t programmed in a scripting style recently, then this distinction may seem artificial, but it is extremely intuitive. Similarly, if you are largely working on writing packages, this distinction is irrelevant since the only globals you would have are intentional globals. The proof that this is intuitive is that SoftGlobalScope (which effectively emulates what I am talking about) and the old Julia v0.6 behavior were never confusing to introductory users. Leave global for when people intentionally want a global variable.


@stevengj Did I capture the issue in the mental model correctly? I think this is the core of the problem, and the reason the two camps are having trouble reconciling their worldviews.


I am not sure what other scripting languages you are referring to (please be more specific, it could help focus the discussion), but AFAIK Matlab didn’t even have namespaces prior to 2008.

Also, I am not a Matlab expert, but I am not aware of a distinction between what you call “script local” and global.

I am not sure about this. Hard/soft scope was one of the most confusing things about Julia and led to plenty of discussions. Of course, you can say that by the time people understood the issue, they were no longer considered newbies 😉

If you look back at the discussions and questions, the confusion was almost invariably variables that were unexpectedly local, not about variables that were unexpectedly global. Making more variables local has made that problem worse.

(Even for experienced Julia programmers who understand scope, it’s a continual annoyance to add global keywords for interactive code.)

I still think that the least-bad non-breaking solution at this point would be to default to soft scope for interactive contexts. I haven’t seen a single complaint about IJulia doing this for nearly a year now.


Here’s the thing: intentionally or not, you are working with global variables if you hit this problem. If you want a local-scope-y script, you can just throw a let/end around it. And I’d even be curious how a notebook would feel if each cell were itself a let block. It’d totally wreck most folks’ workflows (including my own), but I wonder how it’d feel to explicitly annotate the global effects and whether it’d help prevent some of the more confusing aspects of possible nonlinear cell execution. 🙂

👋 I strongly dislike it, but it’s mostly because it was a unilateral move. As someone who teaches Julia itself (and not other subjects) through Jupyter notebooks, I dislike that the scope behavior students see is not the scope behavior they get at the REPL and elsewhere. I’ve also seen a handful of posts here and on SO asking why something works in Jupyter notebooks but not in the REPL/Juno/script.


Sure, but always complaining about it not working in the REPL. None of those posts were complaining that IJulia should reject their code too—soft scoping is the intuitive behavior.

As for it being unilateral, that’s certainly true. If I’d waited for consensus we’d still be here arguing about it, and meanwhile I need to teach non-programming classes using Julia where explaining the concept of scoping in the middle of a math lecture is impractical. If you’re teaching a programming class, in contrast, most of the code is probably in functions anyway, and you can always turn the soft scoping off. In retrospect I think I made the right choice.

(I don’t think having different scoping rules for interactive code vs. scripts is good, just that it is our least-bad non-breaking option.)


This is good advice for software development, but interactive coding and exploration (where performance is not a concern!) has a place, too.

And if you write all of your code in functions, why do you care about the scoping rules for slow global code? It doesn’t affect you at all. Do we just want to punish people who don’t care about performance and want to play around interactively? Julia should be inconvenient for such users, even if it comes with no benefit for serious software development that doesn’t use global scope much?


Sure, and I get that. It’s kinda like how stopping to explain the concept of `IJulia.SOFTSCOPE[] = false` gets in the way of teaching what I actually want to teach. 😉

That’s unfair — there are benefits to the 1.0 rules. It makes it obvious to both humans and static analyzers which identifiers are global. Heck, the Rebugger depends upon it.


Namespaces are largely orthogonal to globals for this sort of scoping, so we can talk about essentially any version of matlab.

It turns out my mental model of matlab is the actual model. Scripts have their own scope. Here is a description of how it works for matlab, which exactly captures my point: the mental model of users of scripts having their own scope is how it is implemented: How can I use global variables and MATLAB workspaces? - MATLAB Answers - MATLAB Central

Now, this implementation itself has its own drawbacks and is not something I am directly suggesting, but it is extremely intuitive for users. If they ever want to share a variable between scripts or non-closure functions, they know they have to define it as a global. It is a very intentional decision, just like it would be in C or compiled languages. i.e. if you ever see a global in matlab, it should smell.

Yes, that is right! By the time people ran into behavior that was confusing in the old v0.6 regime, they were already hooked on julia and could understand the subtlety of the answer. I never saw a discussion on hard/soft scope in Julia from someone who didn’t know what they were talking about.

The reason we are pushing on this again is to resolve that inconsistency. The question is which behavior fits each student’s prior of how scoping should work.

Let me turn it on its head: show me one example of a language supporting a scripting style (i.e., a top-level file with commands not encased in a main() or function to introduce scope, or code in a Jupyter notebook) which forces you to think of variables in loops at the top level as being global.

Any example where global is used to annotate a variable inside of a loop inside of a top level file will do. You are going to have to dig deep on this one…

No, you have it wrong. The mental model people have is that the jupyter notebook as a whole has a scope, not each cell. notebook = script = a scope. This is how it is in python, R, matlab, etc. as well. This is also why @stevengj had to hack on top of the whole of IJulia, and why it can’t just introduce a let at a cell level.

That is an implementation detail in Julia, not something essential to how scripting works. As I say above, in matlab the distinction between global and script-level scope is explicit.

I just don’t think you have made the mental shift for why the globals are different there. It is worth trying to understand the perspective before you fight against it so forcefully. Julia is forcing us to use a global instead of a script level scope. That is fine, and it may be the best way to implement it, but intuitively they are used in very different ways.

How about complaints from anyone who hasn’t submitted a PR to julia’s repo, as this feature isn’t for them and I am not sure they have a good sense of how people use a scripting language. If the complaint is “I don’t think Julia should support interactive scripting” then that is fine, but it is a different discussion.

I have no skin in the game from the anger created from unilateral moves (a move I, and many others, strongly supported at the time, even if we weren’t able to pull the trigger ourselves) so maybe we can focus on how best to support current and future users.

Please don’t put words in my mouth. I’m not angry at unilateral moves. I’m not forcefully fighting against you. And I’m certainly not against Julia as an interactive scripting language. I know how scripting workflows work, I’ve used Matlab for over a decade, and I’m aware of the prevailing mental models of Jupyter notebooks. I don’t even think I’ve been very strong in my posts here — I simply wanted to point out the other side of the coin.


OK, my apologies for misunderstanding.

For sure. I think a lot of people here, myself included from my old C++ days, appreciate the value in not having an exception for interactive use at the top level. You have all done a good job of pointing out the other side of the coin, and for non-interactive/exploratory use-cases, your position is certainly correct.

It sounds to me, though, that if we took @stevengj’s suggestion to make the softscope an option at the script level (and possibly the REPL, though I care less about that), the side effects would be (1) experts would have more difficulty reasoning about scope in their code at the top level (as opposed to inside of functions, which are not affected); (2) static analyzers operating at the top level of scripts (as opposed to inside of functions, which are not affected) would have more difficulty, or might not even be supportable; and (3) Rebugger in its current form is a casualty.

Have I missed key downsides here? In return, we make the experience of new users (and journeymen like myself who switch between script and function style) much better, and avoid the current state where users run into scope issues that are non-intuitive to them in the first 5 minutes of using the software.

Is there any way to trigger a discussion to move things forward? It currently seems like all progress has stalled otherwise. Is having an option for softscope at the REPL and script level, where the on/off default could be discussed, technically difficult to implement? It seems like the least work. Furthermore, the exact softscope implementation used in IJulia has been used for a year without a single issue that I know of.

That was precisely the original intention of this thread, but taking the scoping rules as they are right now, rather than giving ground for yet another debate about whether they should be changed or not. 😇


Rebugger works fine with IJulia, as I understand it, because the softscope transformation doesn’t touch macro contents.

It looks like there is a chance that the scoping rule for `let` will not be changed (hence not breaking Rebugger): New scope solution - #223 by jeff.bezanson

Speaking of `let`, how about “wrap the `for` loop in a `let` together with its initialization” as another tip?
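A minimal sketch of that tip, with illustrative names: the initialization goes into the `let` header, so the accumulator is local to the block, the loop updates it without `global`, and only the final value is bound to a global name.

```julia
squares_sum = let s = 0      # `s` is created local to the `let` block
    for i in 1:10
        s += i^2             # updates the let-local `s`; no `global` needed
    end
    s                        # the block's value, bound to the global `squares_sum`
end
println(squares_sum)         # prints 385
```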

While I recognize that occasionally people attribute malicious intentions to those who just disagree with them, from you it is somewhat surprising. I hope that the tone of this discussion recovers from this.

No, I don’t want to “punish” anyone. I just think that even from a pedagogical point of view, sugar-coating the issue for newbies harms them in the long run. Scripting is fine if one mostly does assignments, but nontrivial control flow (the most common way of running into the issue, e.g., loops) belongs in functions. Otherwise, this inevitably leads to “why is my code so slow” questions.

I recognize that you would like to use Julia as a scripting language in teaching, where what you call “software development” (which seems to cover any kind of structured approach beyond trivial scripts) is outside the scope of the course. I understand that this would be convenient for you, as you are an expert in Julia. I just don’t think that Julia is well suited for this, or that we should bend it in this direction.


Sorry, I didn’t mean to make you sound malicious.

When you say you don’t want to “sugar-coat the issue” because “nontrivial control flow … belongs in functions,” it sounds like you want working in global scope to be hard as an end in itself. I fundamentally disagree with this viewpoint — we shouldn’t make sacrifices in Julia usability unless it has a clear benefit for production code, which is not the case here (“serious” code in functions is not affected by the scoping rules for global loops).

(I don’t quite understand why you are so passionate about a language rule that, from the sound of it, is irrelevant to the code that you write.)

If we want to make it harder to write slow code in Julia, there are lots of things we could do. For example, we could require a special annotation for type-unstable functions. I don’t think we want to go there.


I think there is a misunderstanding. Doing experiments with created or imported data at the REPL is not ‘using Julia as a scripting language’; it’s actually an interactive way of working. By the way, you’ll find “We want it interactive and we want it compiled.” in “Why We Created Julia”.

I work in the REPL, mostly in Matlab and sometimes in Python, and moving back to an edit-compile-run workflow would be (yes) a step back in time and experience. Along those lines, I also disagree that fixed-length loops (for loops) are nontrivial control flow. In Matlab, all that implicit indexing (find / logical indices) lets you work without them (when your data has a rectangular shape), while in Julia the method of choice seems to be explicit loops.
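A small illustration with made-up data (not from the post): Julia supports the Matlab-style idioms mentioned here as well as explicit loops, and the three versions below compute the same sum of positive entries. The loop is placed in a hypothetical function `sum_positive`, following the advice earlier in the thread, so it needs no `global` annotation.

```julia
data = [3.0, -1.0, 4.0, -1.5, 5.0, -9.0]   # made-up example data

# Matlab-style logical indexing: keep the positive entries and sum them.
s_mask = sum(data[data .> 0])

# `findall` plays the role of Matlab's `find`, returning the indices.
idx = findall(x -> x > 0, data)
s_find = sum(data[idx])

# Explicit loop, wrapped in a function so that `s` is local.
function sum_positive(v)
    s = 0.0
    for x in v
        if x > 0
            s += x
        end
    end
    return s
end
s_loop = sum_positive(data)

println((s_mask, s_find, s_loop))   # prints (12.0, 12.0, 12.0)
```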

If Julia is not suited to serious work at the REPL, then, well, I’ll move on.
