Another possible solution to the global scope debacle

Some things I really like about the proposal to make global and local available as macros or syntax keywords applied to for and let (and, while we are at it, maybe even function, though that would make writing bad code easier):

  1. We can make clear statements like:

For loops come in two flavors, local for and global for. If you write for on its own, it defaults to local for.

  2. When/if we make the breaking change (whether in 1.3 or 2.0), we can describe it as:

In Julia X.Y, outside of functions, for loops default to global for,
but inside of functions they default to local for.

  3. People not wanting to explain scope can just introduce it as global for and only talk about using that. Yes, it raises the question “What does the global mean?”, but that is OK: it naturally springboards into a later discussion about scope (which may be just “Don’t worry about it for this class, if you want to know more …”).

  4. During deprecation periods it is a ton nicer to demand that all loops be annotated with local or global than it is to ask the same of all assignments.
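As a rough illustration of what the two flavors would mean, here is a sketch that uses only annotations already present in Julia 1.0. The spellings global for and local for belong to the proposal and are not valid syntax; the variable names total and scratch are made up for this example.

```julia
# Run as top-level code (script or REPL).
total = 0
scratch = 10

# What a hypothetical `global for` would imply: assignments in the body
# target globals. Today this has to be spelled out per assignment.
for i in 1:3
    global total += i
end

# What `local for` (the current default) implies: assignments in the body
# create loop-local variables. Here that is made explicit with `local`.
for i in 1:3
    local scratch = i
end

println((total, scratch))
```

After both loops, total has been updated by the first loop while the global scratch is untouched by the second.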

6 Likes

I see most people talking about for-loops only. But the same obviously applies to while-loops, only that there is no obvious notion of “loop variable”. E.g.

julia> x=2;
julia> while x>0
       x -= 1
       end;
ERROR: UndefVarError: x not defined

This is, imho, even more counterintuitive than for-loops, where it is intuitively obvious to beginners that some scoping happens with the for loop_var = 1:10.

I think more informative error messages are in order for 1.0.2 (where the behavior will definitely not change otherwise).
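For reference, a version of the while loop above that runs under the 1.0 rules, as a minimal sketch: the only change is the explicit global annotation on the assignment.

```julia
x = 2
while x > 0
    global x -= 1   # explicit annotation: update the global x, not a new local
end
println(x)
```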


Now, when changing to default global scope in the REPL, I would fear that it is very easy to inadvertently name-clash:

julia> sin(4)
-0.7568024953079282

julia> for i=1:1
       global sin = 1
       end
ERROR: cannot assign variable Base.sin from module Main

vs. a fresh session in which sin has not been used yet:

julia> for i=1:1
       global sin = 1
       end

julia> sin(4)
ERROR: MethodError: objects of type Int64 are not callable

In short, keeping some kind of scope-isolation for interactive use would be nice.
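One existing way to get isolation of this kind, offered as a sketch and not necessarily the mechanism the post has in mind, is a let block: a name bound there shadows the global only inside the block. The names shadowed and still_a_function are made up for this example.

```julia
shadowed = let sin = 1      # local binding named sin, visible only in this block
    sin + 1
end

still_a_function = sin(0.0) # outside the block, sin is still Base.sin

println((shadowed, still_a_function))
```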


In practice, I see three kinds of top-level loops:

  1. Module code that uses macros to define a lot of functions or otherwise do initialization. Here, the worst-case scenario is leaking / overwriting globals, and the 1.0 rules are good.
  2. Interactive use. Here, the worst case is counter-intuitive errors turning off new users who just want 10 lines of code to run. I think a lot of people are using IJulia anyway?
  3. Scripts that are supposed to be run from the command line. This is the hard one: People who make stuff work in the REPL / IJulia and then paste their commands into a file would be pretty dismayed if it doesn’t run anymore once run as a file.

A simple heuristic for separating (3) and (1) might be to keep 1.0 hard scoping inside modules. Yes, then modules get more complicated, but people who start writing modules that require top-level loops for initialization / function definitions are precisely the people who can take some extra complexity.
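To make kind (1) concrete, here is a small sketch of module code that uses a top-level loop to generate functions. The module name Shapes and the generated function names are invented for this example.

```julia
module Shapes

# Top-level loop that generates one zero-argument function per entry.
# `name` and `sides` are local to the loop, so neither leaks into Shapes.
for (name, sides) in ((:triangle, 3), (:square, 4))
    @eval ($name)() = $sides
end

end # module

println(Shapes.triangle() + Shapes.square())
```

Under the 1.0 rules the loop variables stay local while @eval defines the functions at module level, which is the hygiene the post wants to keep for modules.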

4 Likes

So, to turn the above ramblings into a proposal:

  1. More informative error strings get backported into 1.0. This can be something simple, like ERROR: UndefVarError: x not defined. See "https://docs.julialang.org/scoping".

  2. In Julia 1.1, a new command-line flag and an environment variable such as “TOPLEVEL_HARDSCOPE” are introduced. Likewise, new macros @hardscope begin end and @globscope begin end are added for overriding the flags. Precedence is innermost annotation > outer annotation > command-line flag > environment variable.

  3. In Julia 1.1 and later, code inside of modules obeys the 1.0 scoping rules.

  4. If the command-line flag, the environment variable, or the meta is set, then REPL and file execution outside of modules also obey the 1.0 scoping rules.

  5. Otherwise, REPL and file execution outside of modules use the new scoping rule proposed by Jeff: Top-level for and while loops default to global.
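The precedence rule in point 2 can be sketched as a small function. Everything here is hypothetical: Julia has no such flag, environment variable, or macros, and the function name effective_scope_mode, its arguments, and the :hard / :global symbols are invented for illustration. The sketch only covers code outside modules and assumes that merely setting the environment variable selects hard scope.

```julia
# Hypothetical sketch: none of these names exist in Julia.
# `annotations` lists the enclosing @hardscope/@globscope blocks as
# :hard or :global, outermost first.
function effective_scope_mode(annotations::Vector{Symbol},
                              cli_flag::Union{Symbol,Nothing},
                              env::AbstractDict)
    isempty(annotations) || return last(annotations)   # innermost annotation wins
    cli_flag === nothing || return cli_flag             # then the command-line flag
    haskey(env, "TOPLEVEL_HARDSCOPE") && return :hard   # then the environment variable
    return :global                                      # proposed default outside modules
end

no_env = Dict{String,String}()
with_env = Dict("TOPLEVEL_HARDSCOPE" => "1")

println(effective_scope_mode(Symbol[], nothing, no_env))
println(effective_scope_mode(Symbol[], nothing, with_env))
println(effective_scope_mode([:hard, :global], :hard, with_env))
```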

That way, we can hopefully get the best of both worlds: Modules get their nice hygiene, and full backward compatibility. REPL-users get the more beginner-friendly global-by-default scope for top-level loops, and recorded REPL-sessions execute correctly when stored to a file. People sitting on lots of 1.0 scripts and using 1.1 can set the environment variable to mitigate the breaking change. People doing lots of out-of-module file-includes (bad, bad pattern) can use the new meta to mix scripts expecting 1.0 and 1.1 conventions. Possibly, the scoping-rule change macros get backported to 1.0 (but not exported).

PS. Maybe this is a bit too much complexity to preserve backwards compatibility with really terrible Rube Goldberg scripts. But the general module vs non-module distinction makes a certain amount of sense, given the differing needs, and preserves backwards compatibility for sane code (within modules / functions).

2 Likes

I think this is a really good suggestion that

  1. can be done quickly until (if?) the semantics change,
  2. can be released immediately as v1.0.x or similar without breaking anything whatsoever,
  3. would let user experience and feedback guide further discussion.

5 Likes

It’s a pity that we missed 1.0.1 for this, and I am doubtful that the error string alone would warrant an earlier 1.0.2 release (OTOH, there really are a lot of people who get confused).

So, we would need to write a page in the docs / FAQ that explains the scoping rules in top-level loops, with beginner-friendly examples, and would need to modify the error string (but not the exception type) to contain a link to the relevant doc-page.

If we want to get fancy, the error string could distinguish between top-level scope and other scopes. But I think such a distinction is a bad idea, and slightly more verbose error strings containing possibly unneeded hints are not too bad. I’m unsure what to do about IJulia (maybe the new FAQ page should also mention the differences from IJulia, since there are a lot of notebook users).
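A user-level sketch of the idea, keeping the exception type and only extending the message. This is not how Julia itself would implement it; the helper name describe_undefvar and the hint wording are assumptions made for illustration.

```julia
# Hypothetical helper: run f and turn an UndefVarError into a message with a hint.
function describe_undefvar(f)
    try
        f()
        return "ok"
    catch err
        err isa UndefVarError || rethrow()   # leave other exceptions alone
        return string("UndefVarError: ", err.var, " not defined. ",
                      "See the manual section on scope of variables.")
    end
end

msg = describe_undefvar(() -> some_name_that_does_not_exist)
println(msg)
```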

There isn’t even a PR up for improving the error messages, so talking about missing 1.0.1 and an earlier release of 1.0.2 seems a bit odd.

3 Likes

Also, IMO this alone would be worth making a minor release for.

3 Likes

Fair enough. That is easy to rectify.

1 Like

A different view on my previous scope issue resolution proposal.

First, the necessary background. The major differences between a script, a program, and a software system:

  • A Script is typically designed, written, and maintained by a single person. It is often written in a programming language that doesn’t have a one-to-one mapping from syntax to underlying semantics expressed in computer science terms: the mapping is usually bundled and blurred, to make the language more “natural” and concise. A Script usually has a single context. The Script author often expects a variable with a given name to have the same identity throughout the script. In linguistic terms, it is a mono-context conversational instruction given by a person to a machine.

  • A Program is usually designed at a high level by one person. It can be developed by multiple people. It is typically maintained by multiple people. The people involved almost always have some training in Computer Science. It usually has multiple contexts, delineated by functional, type, and other boundaries. In linguistic terms, it is a multi-context formal written instruction created by a group of people speaking the same language.

  • A Software System is often designed at a high level by multiple people, working at different times, using different conceptual frameworks. The high-level designers, usually called Architects, tend to have advanced training in Computer Science and Software Engineering. A Software System is almost universally developed and maintained by multiple people. It typically has multiple mutually incoherent contexts: semantically identical things can be named differently in different contexts, and semantically different things can be named the same. In linguistic terms, it is a multi-context formal written instruction created by a group of people speaking multiple languages.

Julia 1.0 violates the expectations of Script writers, forcing them to conceptually ascend not only to the level of Program writing, but further to the level of System writing. In linguistic terms, the analog of the infamous Julia 1.0 redefinition of a variable inside a loop would be:

“John developed into a strong young man by age 16”.
“Ever since, John has played as a school varsity quarterback”.

Naturally, it trips up Script writers when they find out that Julia 1.0 treats the word “John” in the two statements above as references to two different people. They expect that this conversational context contains only one person named “John”. Overcoming this innate expectation requires a conceptual leap to System thinking, where it is not unusual for two semantically different things to have the same name. I think that such a conceptual ascent is too much to ask of Script writers.

In natural language, there are markers indicating that the context of conversation has changed. For instance:

“There is another strong boy named John, who has played as a school varsity quarterback”.

In programming languages, the role of such a marker is played by an explicit variable declaration. My proposal is to introduce this to Julia in an additive way conducive to gradual migration, yet in a very, very concise form, to keep Julia friendly to script writers.

3 Likes

@Balance: Your proposal seems like a fairly fundamental shift in language semantics. It reminds me of the discussion here about introducing new language constructs for immutability. (It also raises the idea of using :=). It might be worth reading through that to see what some of the designers of Julia have to say about that.

I am mostly a spectator, but I suspect such a change is unlikely to be considered by the core devs, in part because (IIUC) it would break so much existing code.

2 Likes

It occurs to me that one solution to the issue of “can we make scope simple for scripting without destroying packages like Rebugger?” is to use a function for the body of what’s currently the let body: that is, instead of

let x, y = X, Y
    body
end

we do

let x, y = X, Y
    function _letbody287(x, y) # really a gensym
        body
    end
    _letbody287(x, y)
end

We could even create a package LocalLet and have this be

@llet begin x, y = X, Y
    body
end
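A concrete instance of the rewrite, as a sketch: the bindings are written as let x = X, y = Y so that the snippet parses as ordinary Julia, the placeholder body is replaced by x + y, and letbody stands in for the gensym'd name. The values of X and Y are arbitrary.

```julia
X, Y = 1, 2

result = let x = X, y = Y
    # stands in for the gensym'd _letbody287 above
    letbody(x, y) = x + y
    letbody(x, y)
end

println(result)
```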
6 Likes

I think @StefanKarpinski exhaustively answered the questions about immutability in the thread @haberdashPI referred to:

Julia already distinguishes assignment syntactically from both mutation and equality. There are the following assignment-like syntaxes:

  • x = ... is assignment. The name x is bound to the value that the ... evaluates to. Local bindings never leak out of their scope. It doesn’t matter what x was bound to before this happens, that value is not affected in any way. No object is mutated by this and no other binding besides x is changed.
  • x.f = ... is equivalent to setproperty!(x, :f, ...) which, by default, mutates the object x by changing its field f to the value of the expression ... . If x is visible in another scope or through another binding, this change will be seen everywhere. No bindings are changed by this.
  • x[i] = ... is equivalent to setindex!(x, ..., i) which, for arrays, mutates the array x by changing its i-th slot to refer to the value of the expression ... . If x is visible in another scope or through another binding, this change will be seen everywhere. No bindings are changed by this.

There are equality operators == and === which check for value-based equality and identity-based equality, respectively. There is also already a syntax for creating a constant binding as opposed to a variable binding:

const x = ...
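The distinction between the three syntaxes can be seen in a few lines. This is a minimal sketch meant to be run as top-level code; the type Point and the variable names are made up for the example.

```julia
mutable struct Point
    x::Int
end

a = [1, 2, 3]
b = a            # b and a are two bindings to the same array
b[1] = 10        # setindex!: mutates the shared array, visible through a
b = [0]          # assignment: rebinds b only; a and its array are untouched

p = Point(1)
q = p
q.x = 5          # setproperty!: mutates the shared object, visible through p

println((a, p.x, a === b, p === q))
```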

Indeed, this is very clear in relation to mutability/immutability, and in my experience this was practically sufficient for Julia code I’ve written so far. A good balance between expressiveness and conciseness IMHO. I dig @StefanKarpinski’s stance on this.

However, Julia 1.0 broke my code in more subtle ways than I anticipated. Upon deeper analysis, I realized that the issues appear to stem from Julia’s asymmetry between assignment and mutation.

Let’s get back to @StefanKarpinski’s examples:

  • When I see the equivalent of x.f = … or x[i] = … in pretty much every statically compiled language, including Julia, it tells me that a binding between the name x and a value of a mutable type is expected to already exist in an accessible scope before the code analysis gets to this line. A concrete binding is found by the name resolution algorithm, potentially including a binding in the current scope. If such a binding isn’t found, the compiler reports an error.

  • When I see the equivalent of x = … in a statically compiled language, the semantics could potentially be one of:

A. Symmetrical to the treatment of x.f = … or x[i] = …: a binding between x and some value is already expected to exist in an accessible scope, in which case the statement is interpreted as re-assignment of a new value to that binding. If a binding doesn’t yet exist, the compiler reports an error. Creation of a binding is supposed to be explicit, happens in the current scope, and can be syntactically very similar to the assignment. This is how compilers for most statically compiled languages tuned for large-scale programming operate.

B. Same as above, except that if a binding doesn’t yet exist in an accessible scope, the statement creates a new binding in the current scope. As I understand it, this is how Julia behaved prior to 1.0, at least in regard to the current scope accessible inside a for loop.

C. A binding between x and some value may already exist in the current scope, in which case the statement re-assigns a value to that binding. If it does not yet exist in the current scope, the statement creates it. This is how Julia 1.0 behaves in regard to the current scope accessible inside a for loop.

This discussion thread exists because a significant number of people are perplexed by the default of choice C for x = …. I also understand why defaulting to B wasn’t 100% desirable either.
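For contrast, here is a sketch of the case where an enclosing local scope exists: inside a function body, an assignment in a loop reuses the enclosing local binding, which corresponds to choice B. The function name sum_to is made up for this example.

```julia
function sum_to(n)
    total = 0            # local to the function
    for i in 1:n
        total += i       # reassigns the enclosing local; no annotation needed
    end
    return total
end

println(sum_to(3))
```

The surprise discussed in this thread arises only when the enclosing scope is the global one.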

Doesn’t it leave A as the logical default choice for the next stage of Julia’s evolution? Naturally, any move in this direction ought to be backward-compatible.

2 Likes

For what it’s worth, one of the more confusing things is that name resolution is done per scope and is a syntactic, not a semantic, matter. By this, I mean:

julia> x=1;

julia> function f()
       @show x
       false && (x=1)
       nothing
       end
julia> function g()
       @show x
       #false && (x=1)
       nothing
       end
julia> f()
ERROR: UndefVarError: x not defined
julia> g()
x = 1

In principle, I would prefer a splitting into UndefGlobalVarError (no local binding of that name exists, and at runtime there is no global of that name) and UnInitializedVarError (a binding for this symbol exists in the scope, but has not yet been assigned at time of use – all control flows leading to a use before assignment must give us this error, but this may need to be tracked at runtime due to @goto, if the halting problem for the specific user code is too hard for the compiler).
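The syntactic nature of the resolution in f and g above can also be observed without triggering the error, using the existing @isdefined macro. A small sketch; the names shown, probe, and probe_global are made up for this example.

```julia
shown = 1   # a global

function probe()
    was_defined = @isdefined shown   # false: `shown` is a local of probe, not yet assigned
    false && (shown = 2)             # never runs, but makes `shown` local in all of probe
    return was_defined
end

function probe_global()
    return @isdefined shown          # true: no local `shown` here, so the global is found
end

println((probe(), probe_global()))
```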

1 Like

FWIW, I really like @jeff.bezanson’s proposal, along with introducing local for and global for (and let), which defaults to local for now, with the option of changing the default later (whether in 1.3 or 2.0) after some experience has been gained, as advocated by @Liso and @oxinabox. Apart from being quite clean, this strategy should also alleviate @tim.holy’s concern. Importantly, code will continue to work the same in scripts and at the REPL; changing that seems worse than the original issue to me.

For teaching, any discussion about scope can thus be postponed by just introducing the global for concept at first; perhaps this could be simplified further by introducing for! as syntactic sugar as proposed earlier (though that raises the question of what happens to naked for if/when the default is changed down the road). In addition, perhaps a warning, silenceable by macro or environment variable, could be issued whenever a local variable is introduced that shadows a global, on top of the more helpful error message proposed in #29585. That might take much of the sting out of Example 1 of the original issue. For Example 2, it would technically even be non-breaking (as it currently errors) if total_lines were initialized to the value of the global in addition to the warning, making the example work as intended.

With these measures, we might even find that the flood of Discourse and Stack Overflow questions subsides to a point where changing the default may not even be warranted anymore, or could at least be postponed until 2.0, hence honoring semver.

2 Likes

Thinking a bit more about this, for! i = ... could additionally be made to imply for outer i, and this could be how it’s explained in a teaching context, hence avoiding any reference to scope: just say it means that “variables can get overwritten by the loop”. If/when the default is changed to global for later, for! i will simply continue to imply outer. This might be simple enough for non-CS students to grasp. @stevengj?
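For readers who have not met it, for outer already exists in Julia 1.0, though only for an existing local variable, so the sketch below wraps it in functions. The for! spelling is the proposal's and is not valid syntax; the function names are made up for this example.

```julia
function last_index_with_outer()
    i = 0
    for outer i = 1:3
        # empty body; the loop reuses the enclosing local i
    end
    return i
end

function last_index_without_outer()
    i = 0
    for i = 1:3
        # empty body; the loop variable is a fresh local each iteration
    end
    return i
end

println((last_index_with_outer(), last_index_without_outer()))
```

With outer, the enclosing i is overwritten by the loop; without it, the enclosing i keeps its initial value.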

3 Likes

I am not sure why one would jump through hoops just to avoid mentioning scope (briefly, of course).

Maybe I was lucky to teach some exceptionally bright students, but usually even students without prior programming experience (who were nevertheless interested in programming, which is why they took the course) grokked scope rather quickly. After all, it is a very intuitive concept, and someone taking courses at or above a BA/BSc level in STEM or data-based social sciences must encounter abstractions that are much more difficult. Teaching even basic programming without scope is like teaching linear algebra without bases.

I rarely saw for loops that accumulate in global scope in the wild in Julia, even for v0.6, because it is bad style.

If it is not something we would do because it is not good practice, why would we teach it? Isn’t this a disservice to the students?

PS: Sorry @s-broda, of course not all of this is in reply to your comments. I just find the “teaching” argument not super-convincing.

4 Likes

It’s perfectly good style for interactive exploration, where you usually don’t write functions. Not all coding is about performance or reusability.

15 Likes

As far as I am concerned, the idea is not to avoid introducing scope, but to postpone the discussion to a more appropriate time than literally the first 20 minutes of the first lecture (which may not even be on programming; in my case it’s finance). It’s easy to forget how hard these concepts are for students from non-STEM backgrounds.

In any case, just from the number of people running into this issue, it is clear that there is an issue here, presumably not just for students but also people who may not enjoy the luxury of a carefully designed course. So I think it’s best to keep the thread focused on the merits of the proposed solutions.

1 Like

I am not so sure about this, since we don’t know the base rate (to be fair, this can go both ways, since people could encounter the issue and not show up here or on other forums).

Given the difficulties with the various solutions that are apparent in this thread, I think we should keep “not changing semantics, but giving helpful error messages” on the table as one of the solutions, at least in the short run. Strictly speaking of course you are right, since anything not discussing this particular solution is off-topic in this thread, but I feel that has happened already.

4 Likes

If you read my 2 posts above, my thinking was precisely that if the proposals I summarized are implemented, then we may get around having to change the semantics, by implementing appropriate warnings and errors and introducing useful syntactic sugar, viz., for! i being sugar for global for outer i. All of these can be done now (in 1.0.2). At least we might buy ourselves some time to get more data; the default can always be changed from local for to global for later if still deemed necessary, and in a semver-compliant way.

1 Like