New scope solution

That’s why, in addition to a GUI debugger, I mentioned SoftGlobalScope.jl, which should take care of the needs of Jupyter users.

I was under the impression that the need for SoftGlobalScope.jl was considered problematic, and that the real goal was to have the same semantics everywhere. Special casing is bad, etc.

I do this too, but after the first couple of times I got hit by the scope issue, I learned to just add “global.” Sure, it’s an extra step, but I find it actually makes me more conscious of where my variables are coming from.
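For readers who have not seen it, a minimal sketch of the explicit `global` workaround described above. The variable names simply mirror the accumulation example used elsewhere in this thread; nothing here is specific to any particular Julia release.

```julia
# With the explicit declaration there is no ambiguity about which `t` is meant,
# whether this runs as a script, in the REPL, or in a notebook.
t = 0
for i = 1:100
    global t      # declare that `t` in this loop body is the global one
    t += i        # accumulate into the global
end
println(t)        # prints 5050
```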

I have also come to like the current behavior of loops, for the reasons outlined by @StefanKarpinski here.

Maybe we can crowdfund the GUI debugger feature if that will accelerate its arrival. I’m sure it will have popular support as I can see from this issue that there is a high demand for it.

julia> t = 0
0

julia> for i = 1:100
           t += i
       end

julia> t
5050

This is, if I understand correctly, the case where the change is greatest. (The case where “write-only” t remains local is the current behavior, and the “ambiguous” case throws an error – with an improved message.)

This particular case might be logged with a similar “polite” warning, encouraging the user to declare the variable as global if that is the actual intention. Such a warning would also help the inadvertent user become aware of the possibly different behavior between top level, functions, etc.

That’s what we tried to do in v0.6 and prior. The rule is that an outer variable is inherited if one exists. So in global scope that equates to asking whether a certain global has been defined yet. In practice that made top-level code very state-dependent; you don’t know what a toplevel expression does unless you know which globals have been set elsewhere. So we wanted to move to a rule where you could work out the scope of every variable in a toplevel expression by looking at the expression by itself.

I think a warning is reasonable, although the impact on unit tests (which are frequently written at the global scope) should be considered. The main thing I disagree with is suggesting that they declare the variable as global. It is almost never what someone really wants to do, and it then prevents them from copying the code inside of functions (where they belong). I think the only people who should be using global and local are those who know what they are doing. Telling introductory users to put the code in a block (e.g. a function, let, or whatever) seems better general advice.
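A minimal sketch of the "put it in a function" advice, assuming the same accumulation as in the example above. The function name `sum_to` is made up for illustration.

```julia
# `sum_to` is a hypothetical helper; the point is only that `t` is a local
# of the function, so the loop can update it without any annotation.
function sum_to(n)
    t = 0
    for i = 1:n
        t += i    # `t` is the enclosing function's local variable
    end
    return t
end

println(sum_to(100))   # prints 5050
```

The body can be moved between functions unchanged, which is the portability that a `global` annotation would take away.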

I want to underline this. There are a lot of people here talking about this as if the ≤ 0.6 behavior was perfect and everyone wishes that we could go back to it. That’s certainly a valid point of view–after all, there’s a reason we chose the behavior the way we did originally—but it’s not one that is universally shared at all. There are many issues with the old behavior. There are several different desirable criteria for scoping rules that do not seem to all be satisfiable at the same time:

  1. simple and consistent rules that are easy to explain
  2. code meaning does not depend on mutable global state
  3. top-level behavior is similar to behavior in functions
  4. for loops have their own scope

The ≤ 0.6 behavior satisfied #3 and #4 but failed at #1 and #2. This wasn’t just a theoretical problem—there were lots of issues and complaints about the old behavior. Here are just a few:

Every time we explained why it worked the way it did, the explanation was met with skepticism and people telling us that “Julia’s scoping rules are far too complicated and very hard to teach.” (And often implicitly or explicitly everyone’s favorite existential threat: “This language will fail unless you change the scope rules.”) You could very easily run some code in the REPL, have it work the first time and then run it again and have it fail every subsequent time with the only recourse being to restart your REPL session. So although many people are now talking about the ≤ 0.6 behavior as if everything was perfect, it was not.

The 1.0 behavior on the other hand, is simple, consistent and easy to explain—it satisfies all of the criteria besides #3 beautifully. But apparently, people find it so unintuitive because of the failure to satisfy #3 that it’s a show-stopper for teaching. So yeah, existential threat territory again. (Can you see why we just love it when people make these existential threat kinds of comments?)

One of the other changes that could be made is to mess around with #4. Python does this: loops don’t introduce scope. That seems a bit extreme: everyone seems to like that in Julia you don’t have to worry about loop variable names clobbering things. And it matches comprehensions, where it would be even worse if the “loop variables” leaked out of the comprehension. Perhaps something else could be done where loops introduce scope but it’s a different kind of scope where the loop variables are automatically local but the scope is porous to assignment inside of the loop body. However, I for one like being able to define a local variable in a loop body and not have it litter the rest of my function even though I don’t need or want it later. And this kind of thing would bring us back into the ≤ 0.6 “complex and hard to explain” territory, so that doesn’t seem ideal. Moreover, it’s complex and hard to explain everywhere, not just in global scope. The problems we’re encountering are all about loops in global scope, so there’s something appropriate about the complexity only affecting loops in global scope rather than making local scope pay the price as well.
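To make the "loop variables don't clobber things" point concrete, here is a small sketch inside a function, where the rules are not in dispute. The function name is hypothetical, and the behavior shown is that of Julia 1.x.

```julia
# Hypothetical demo function: loop and comprehension variables get their own scope.
function loop_locals_demo()
    i = 0                          # an outer local that happens to be named `i`
    for i = 1:3                    # the loop variable is a fresh `i`, local to the loop
        scratch = i^2              # `scratch` is local to the loop body
    end
    squares = [i^2 for i = 1:3]    # the comprehension's `i` is local to it as well
    return i, squares              # the outer `i` was never touched
end

println(loop_locals_demo())        # prints (0, [1, 4, 9])
```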

Which brings us to the behavior in Jeff’s new PR as proposed in this thread. It sacrifices a bit of #1, but in my opinion less so than the ≤ 0.6 behavior did: the rule here is that in a top-level for loop, if the first use of an unannotated variable is a read then it’s global, otherwise it’s a local. That’s a pretty simple, easy-to-explain rule. But people apparently just couldn’t wrap their heads around “if a global by that name is already defined, then assignment updates the global, otherwise it creates a new local.” So who knows? It satisfies #2 perfectly: the meaning of code no longer depends on any global state, so if you evaluate the same sequence of expressions in the REPL multiple times it means the same thing every time. It half satisfies #3 as follows… Expressions like

for i = 1:n
    t += i
end

always work the same in functions and the REPL. Other expression like

for i = 1:n
    t = 100
end

either work the same or don’t: in a function, whether t is local to the loop body depends on whether t exists as a local variable outside of the loop; in global scope it doesn’t matter whether a global t exists or not, the t is always local to the loop.
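The function half of that statement can be sketched as follows. Both function names are made up, and the second one uses an explicit `local` to force the loop-local case even though an outer `t` exists.

```julia
# Hypothetical names. In a function, a plain assignment in the loop body
# updates an existing outer local of the same name.
function inherits_outer_local(n)
    t = 0
    for i = 1:n
        t = 100          # assigns the function-level `t`
    end
    return t
end

# An explicit `local` makes `t` local to the loop body instead.
function forced_loop_local(n)
    t = 0
    for i = 1:n
        local t = 100    # a separate `t` that lives only inside the loop body
    end
    return t             # the outer `t` is unchanged
end

println(inherits_outer_local(3))   # prints 100
println(forced_loop_local(3))      # prints 0
```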

So the ≤ 0.6 behavior satisfied 2/4 of the desirable criteria (#3 + #4), the 1.0 behavior satisfies 3/4 of the desirable criteria (#1 + #2 + #4), and the new behavior satisfies a slightly different 3/4 of the desirable criteria (½#1 + #2 + ½#3 + #4). It still seems better to me on the whole than the ≤ 0.6 behavior, and if, as it seems many people do, it’s super important to be able to write for i = 1:n; t += i; end in global scope and have it work, then it’s also better than the 1.0 behavior. But of course, it all depends on how you value the various criteria. Do you think simplicity is the most important thing? Then you probably think the 1.0 behavior is the best. Do you think that behavior in the REPL and function bodies matching as closely as possible is the most important thing? Then you probably think that ≤ 0.6 behavior is the best. Do you think that having statically predictable behavior that doesn’t depend on global state is the most important but also want accumulation in global scope to work? Then you probably think that the new PR behavior is the best. Personally, over time with more and more experience with language design, my appreciation for static predictability has increased markedly. But I also don’t want to answer questions from people being confused by accumulation not working in the REPL for the rest of my life. So the PR behavior is a pretty good choice from my perspective: it’s statically predictable and accumulation works. People can’t assign to globals from loops without using the global keyword :man_shrugging:. Anyone who is going to try to argue that the old ≤ 0.6 behavior is ideal and anything else is just a stop-gap has to justify why criterion #3 is so incredibly much more important than all other considerations.

An approach somewhat orthogonal to the loop scope behavior would be to deal with the REPL use by permitting the user to enter local scopes there, without losing interactivity. That is, being able to start a let statement – or using some other, equivalent mechanism to push a new environment on the stack – and then getting a new prompt (which might indicate the change, perhaps a nesting level). Then if one wanted to test out things in a way that mirrored the behavior inside a function, one could do that by not executing them in the global scope to begin with.

Wrapping stuff in let has been my recommendation when prototyping stuff in a script; I guess doing the same (more or less, at least) interactively could have similar benefits, perhaps.
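A sketch of what wrapping in `let` looks like for the accumulation example. The name `result` is only for illustration.

```julia
# `t` is local to the `let` block, so the loop updates it just as it would
# update a function's local variable; no `global` annotation is involved.
result = let t = 0
    for i = 1:100
        t += i
    end
    t              # the value of the `let` block
end

println(result)    # prints 5050
```

Because the block has the same local-scope rules as a function body, its contents can later be pasted into a function without changes.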

From “scoping issues, part 1” (Issue #423, JuliaLang/julia on GitHub), it appears that the old behavior didn’t fulfill #3 very well either. As you said, there were plenty of reasons to revisit it.

Consistent and simple, yes. But “easy to explain” is subjective, and depends on who you are trying to explain it to. I would say it is sufficiently counter-intuitive that people who don’t have a careful mental model of scoping in programming languages can’t understand the nuances of local and global scopes. They are likely to simply rely on heuristics to avoid it (i.e. only use loops inside functions and in Jupyter, and avoid copying code from Stack Overflow/Discourse that has the word global or local in it).

I think you guys may have found the sweet spot. Also, for the “simple rules to explain it” you should also consider the frequency with which you would actually need to explain it and to whom. With your new proposed rules, common usage is sufficiently intuitive that I would guess that beginners rarely run into the case where they need to realize that the global and within function scope are different. By the point they run into weirdness with it, they should already be relatively advanced and can handle the answer.

Yes, this is also an important point. Even though the ≤ 0.6 rules made the REPL and function bodies more similar, it’s impossible to make them exactly the same. So it’s not like 0.6 was perfect with regards to that anyway, it was just a little harder to find cases where the behaviors deviated.

That’s kind of my general feeling as well. People seem to do this all the time:

t = 0
for i = 1:n 
    t += i
end

So having that “just work” is pretty important. Assigning to a global from inside a loop but not reading from the previous value of the global seems a lot weirder and more rare. By the time someone encounters that, one can just tell them this rule: if the first use of a variable in a top-level scope is a read then the variable is global, otherwise it’s local; in this case you’re only assigning, so it’s local. End of story. Nobody expects that the REPL and function bodies behave exactly the same because they don’t and can’t and no one has ever told them the lie that it’s possible.

The complaints seem to have come mainly from people who were confused by a variable being in local scope when they were expecting it to be visible in global/surrounding scope (à la Python). “Solving” this problem by putting even more variables into local scope has, not surprisingly, led to lots more confusion and complaints.

Local scope is the broccoli of programming. Kids, local scope is good for you!

One probably-not-so-rare case I can think of where this would be an issue is setting some sort of “flag” value. I think simply adding a warning when a variable is shadowed in a top-level for scope would fix this problem straightaway, e.g.:

foundprime = false
start = 4; end = 10
for i in start:end
    if isprime(i)
        foundprime = true
    end
end
# => Warning: local usage of `foundprime` within top-level for loop shadows global variable `foundprime`.
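For comparison, a sketch of the same flag pattern inside a function, where the assignment reaches the flag without any annotation. Both `is_prime_naive` and `found_prime_in` are hypothetical names; the naive primality check is only a stand-in so the snippet runs without extra packages (the post presumably means `isprime` from Primes.jl).

```julia
# Naive trial-division stand-in for `isprime` (an assumption, see above).
is_prime_naive(n) = n > 1 && all(n % d != 0 for d in 2:isqrt(n))

# Hypothetical wrapper: `foundprime` is a function local, so the loop updates it.
function found_prime_in(lo, hi)
    foundprime = false
    for i in lo:hi
        if is_prime_naive(i)
            foundprime = true
        end
    end
    return foundprime
end

println(found_prime_in(4, 10))   # prints true (5 and 7 are prime)
println(found_prime_in(8, 10))   # prints false
```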

Anything that warns might as well be an error since you can’t use it without adding an explicit local annotation.

This code has not been tested, has it?

We could test your “lemma”:

julia> if get(ENV, "debug", "0") != "0"
         macro check(a) :(if !$(esc(a)) throw(ErrorException("ups!")) end) end
       else
         macro check(a) end
       end

       t = 4
       for i in 1:2
         @check ispow2(t)  # I want to check only in debug phase
         t = 16
       end
       t
4

Now we could define the environment variable ENV["debug"]="1" and repeat the same sequence.

And also delicious when prepared well… :laughing:

Doesn’t Python have 1-3? So the same score as 1.0?

But I :heart: :broccoli:

(More seriously, of course, the score would be different if the weight is not uniform.)

What is your point?