Another possible solution to the global scope debacle

I’m not sure there are any similar patterns that benefit from making more variables local inside let. Given that in general we default to overwriting variables in outer scopes, it can’t be all that important for let to be special here.

Well, doesn’t this count as an example? In particular, Rebugger cannot work in 0.6 or 0.7 (because of the scope deprecation) but it does in 1.0.


How about (as @Liso suggested above) also allowing the global keyword in front of let and for, in the same way as local. For example:

```julia
global let x=1
  y = x
end
```

would use a local scope for x and a global scope for y.
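For comparison, global let is only a proposal, not existing syntax. In current Julia the nearest equivalent is a plain let block in which each assignment meant to escape is annotated with global individually. A minimal sketch, reusing x and y from the example above:

```julia
# Current Julia: let is a hard scope, so an assignment that should reach the
# enclosing global scope needs its own `global` annotation.
let x = 1          # x is local to the let block
    global y = x   # y is assigned in the global scope
end
```

The proposed global let would, in effect, apply that annotation to every plain assignment in the body.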

These rules would always hold:

  • local for is equivalent to a repeated local let
  • global for is equivalent to a repeated global let
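The first rule matches how loops already behave: the Julia manual describes variables introduced in a loop body as freshly allocated on each iteration, as if the body were wrapped in a let block. A small check of that, wrapped in a function so the result does not depend on top-level scope rules (collect_closures is an illustrative name):

```julia
function collect_closures()
    fs = Function[]
    for i in 1:3
        j = 10i                  # j is introduced in the loop body
        push!(fs, () -> (i, j))  # each closure captures this iteration's own i and j
    end
    return [f() for f in fs]
end
```

If i and j were a single binding shared across iterations, every closure would see the last values; instead each one sees its own iteration's values.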

Maybe one could also provide convenience “packages” (e.g. using GlobalFor, using LocalLet) that, once loaded with using, would cause every unannotated for or let in a module (or at the REPL) to become annotated accordingly.

So far, this would be a non-breaking change, so it could be introduced in Julia 1.1.

The next step is then to decide whether the unannotated for expands to local for or global for and the same question for let. These two questions would not have to be answered until 1.2, when people have had a chance to annotate their fors and lets, and thus avoid getting a warning.


I agree. It may be the case that having different behavior in the REPL vs. files is actually justified; the requirements are just very different.

No, because you run the function one statement at a time, in separate inputs. If you have to put the whole function in a let block it’s no different than just calling the function.


I understand that’s a useful pattern (I do use it). But wrapping the code in a let block up to the statement whose result you want to see seems to be a valid pattern too.

@tim.holy also mentioned using let in a debugging session in the comment above.

The combination of being able to edit it and see what happens, and then hit the up arrow and edit it in a different way, means that it is operationally more similar to running the function line-by-line, with the bonus that it doesn’t break your REPL. If the bug you’re investigating is triggered in a project that imports ~500 packages at once, this ability is important. (Less so in cases where the cost of restarting isn’t so high.)
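A minimal sketch of the let-based debugging pattern, assuming a hypothetical function scale_to_unit_sum under investigation: the body is replayed inside a let up to the statement of interest, so its intermediate names stay local and the block can be edited and re-run (up arrow) without touching the global namespace.

```julia
# Hypothetical function whose intermediate values we want to inspect.
function scale_to_unit_sum(v)
    total = sum(v)
    return v ./ total
end

input = [1.0, 3.0]

# Replay the body up to the statement of interest. Because let is a hard
# scope, `total` stays local to the block and never becomes a global.
partial = let v = input
    total = sum(v)
    total            # the value we want to see
end
```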


My background is mostly in architecture and software engineering of mission-critical large-scale software systems, such as satellite/missile control, high energy physics experiments automation, country-scale energy generation and distribution, eCommerce (eBay) etc.

I don’t recall seeing such energetic debates about scoping since the days when Algol-68 was still the thing, although the mid-1990s discussions about C++ scoping come close. This indicates to me that Julia is, in some respects, a revolutionary language.

Revolution resolves a tension. Julia’s creators were not shy about what that tension was: the need to prototype rapidly vs. the need to execute sophisticated numeric algorithms quickly. The ongoing discussion reflects still-unresolved vestiges of that tension.

I think the remaining tension in the Julia 1.0 language design is between the already huge practical range of its real-life use cases and the semantics of the language, which are still optimized for a significantly narrower range of use cases.

Using the same syntax both for declaring a variable together with its initialization and for reassigning its value is optimized for rapid prototyping. This sameness creates a semantic tension when combined with syntactic and semantic elements targeting quick execution and modularity, which are mandatory traits of large software systems.

The traditional way out is syntactically separating the notions of declaration, initialization, and reassignment. Then, a variable simply belongs to one of the statically known scopes where it was either declared, initialized, or reassigned.

Old-school performant large-scale languages, such as C, postulate that a variable’s scope of existence is its declaration scope. Modern languages, such as Swift, require that a variable also be initialized in its declaration scope.

Julia at this point stays with the tradition of prototyping languages, with its most concise variation of the syntax bundling together the declaration, initialization, and reassignment. This is perfectly fine for short “scripts”, yet becomes problematic within nested and modular constructs characteristic of larger code bases.

My personal preference would be introduction of explicit yet very concise Declaration && Initialization syntax, keeping the current Reassignment syntax as is, and gradually deprecating the undifferentiated (Declaration && Initialization) || Reassignment semantics.

I’m not knowledgeable about the Julia parser. Perhaps the experts could come up with a simple one- or two-keystroke syntactic extension that would mark the difference? It could be a marker in front of the variable being declared, separated from it by a space (e.g. "` "), or part of a combined Declare+Assign token (e.g. “:=”).


I found the further discussion a bit late. My thoughts on the subject:
The proposal is an improvement on the current v1 scoping rules, which create problems that need to be solved, and not only because of “lazy teachers”. To recapitulate the problems:
The current rules are unexpected and unintuitive to most people, not only to complete newbies in a course. Even experienced programmers (in other languages) who know about scope and are trying out Julia in the REPL get bitten by it (as I experienced myself, even at a time when I was following the discussion about it). You can see this from the number of people reporting problems (or filing bugs…) related to it. It is likely that even more people run into it but simply give up on Julia instead of asking, which makes it a serious problem for wider Julia adoption.
While I understand that it was done for better consistency, it is less consistent in other ways (behaviour inside vs. outside a function): you cannot execute the same code in a function and in the REPL, which makes it much harder to test/debug code by copy/pasting it from functions into the REPL.
Also, you cannot simply wrap code written in the REPL or in a script into a function.

Although I must admit I do not really like the idea of creating more globals, making everything global outside of functions would solve the problem with less breakage than my preferred alternative (making everything local: each file is its own scope, and globals must always be explicitly declared). I can already get this easily (except in the REPL) by always wrapping everything in functions.

For me personally, fixing the REPL would be enough (I would be using functions everywhere else anyway). But again, too many people simply run the code they wrote/tested in the REPL as a script without wrapping it in a function, and they will be unpleasantly surprised if it does not work as expected, so that case should be covered as well; one might as well do it globally then (people working with multiple files, modules, etc. are more advanced users anyway).

I think the change must be made as soon as possible.
I would not expect this change to actually break much (if any) code: the few people who know and understand the current behaviour should not be running much code outside functions, and the crowd of newbies and people coming from other languages is more likely to be happy that the “bug” is fixed than to complain. Treating it as a (design) bug whose fix may break backward compatibility in a very limited number of cases would make this process faster, easier, and less fussy…


Some things I really like about the proposal to make global and local macros, or syntax keywords, applied to for and let (and heck, while we are at it, maybe even function, though that would make writing bad code easier):

  1. We can make clear statements like:

     for loops come in two flavors, local for and global for; if you write for on its own, it defaults to local for.

  2. When/if we make the breaking change (whether in 1.3 or 2.0), we can describe it as:

     In Julia X.Y, outside of functions for loops default to global for, but inside of functions they default to local for.

  3. People not wanting to explain scope can just introduce it as global for and only talk about using that. Yes, it raises the question “What does the global mean?”, but that is OK: it naturally springboards into a later discussion about scope (which may just be “Don’t worry about it for this class; if you want to know more …”).

  4. During deprecation periods it is a ton nicer to demand that all loops be annotated with local or global than it is to ask for all assignments to be.


I see most people talking about for-loops only. But the same obviously applies to while-loops, only that there is no obvious notion of “loop variable”. E.g.

```julia
julia> x=2;
julia> while x>0
       x -= 1
       end;
ERROR: UndefVarError: x not defined
```
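For reference, the loop does what was intended if the update is annotated with global, or if it is moved into a function, where x is an ordinary local. A minimal sketch (countdown is an illustrative name):

```julia
# Top level: the explicit `global` makes the loop update the existing global x.
x = 2
while x > 0
    global x -= 1
end

# Inside a function, x is a plain local and no annotation is needed.
function countdown(x)
    while x > 0
        x -= 1
    end
    return x
end
```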

This is, imho, even more counterintuitive than for-loops, where it is intuitively obvious to beginners that some scoping happens with the for loop_var = 1:10.

I think more informative error messages are in order for 1.0.2 (where the behavior will definitely not change otherwise).


Now, when changing to a default global scope in the REPL, I fear it would be very easy to clash with an existing name inadvertently:

```julia
julia> sin(4)
-0.7568024953079282

julia> for i=1:1
       global sin = 1
       end
ERROR: cannot assign variable Base.sin from module Main
```

vs

```julia
julia> for i=1:1
       global sin = 1
       end

julia> sin(4)
ERROR: MethodError: objects of type Int64 are not callable
```
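Under the current rules, an explicit local annotation already gives the kind of isolation asked for here: the loop gets its own sin, and Base.sin remains usable afterwards. A minimal sketch:

```julia
for i = 1:1
    local sin = 1   # a new local that shadows Base.sin inside the loop body only
end
# Outside the loop, sin still refers to Base.sin.
```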

In short, keeping some kind of scope-isolation for interactive use would be nice.


In practice, I see three kinds of top-level loops:

  1. Module code that uses macros to define a lot of functions or otherwise do initialization. Here, the worst-case scenario is leaking / overwriting globals, and the 1.0 rules are good.
  2. Interactive use. Here, the worst case is counter-intuitive errors turning off new users who just want 10 lines of code to run. I think a lot of people are using IJulia anyway?
  3. Scripts that are supposed to be run from the command line. This is the hard one: people who get things working in the REPL / IJulia and then paste their commands into a file would be pretty dismayed if the file no longer runs.

A simple heuristic for separating (3) and (1) might be to keep 1.0 hard scoping inside modules. Yes, then modules get more complicated, but people who start writing modules that require top-level loops for initialization / function definitions are precisely the people who can take some extra complexity.


So, to turn the above ramblings into a proposal:

  1. More informative error strings get backported into 1.0. This can be something simple, like ERROR: UndefVarError: x not defined. See "https://docs.julialang.org/scoping".

  2. In Julia 1.1, a new command-line flag and environment variable like “TOPLEVEL_HARDSCOPE” are introduced, together with new meta annotations @hardscope begin … end and @globscope begin … end for overriding them. Precedence: innermost annotation > outer annotation > command-line flag > environment variable.

  3. In Julia 1.1 and later, code inside of modules obeys the 1.0 scoping rules.

  4. If command-line flag or environment variable or meta is set, then REPL and file-execution outside of modules also obeys the 1.0 scoping rules.

  5. Otherwise, REPL and file-execution outside of modules use the new scoping rule proposed by Jeff: Top-level for and while loops default to global.

That way, we can hopefully get the best of both worlds: Modules get their nice hygiene, and full backward compatibility. REPL-users get the more beginner-friendly global-by-default scope for top-level loops, and recorded REPL-sessions execute correctly when stored to a file. People sitting on lots of 1.0 scripts and using 1.1 can set the environment variable to mitigate the breaking change. People doing lots of out-of-module file-includes (bad, bad pattern) can use the new meta to mix scripts expecting 1.0 and 1.1 conventions. Possibly, the scoping-rule change macros get backported to 1.0 (but not exported).

PS. Maybe this is a bit too much complexity just to preserve backward compatibility with really terrible Rube Goldberg scripts. But the general module vs. non-module distinction makes a certain amount of sense with respect to differing needs, and it preserves backward compatibility for sane code (within modules/functions).


I think this is a really good suggestion that

  1. can be done quickly, as a stopgap until (if ever) the semantics change,
  2. can be released immediately as v1.0.x or similar without breaking anything whatsoever, and
  3. lets user experience and feedback guide further discussion.

It’s a pity that we missed 1.0.1 for this, and I am doubtful that the error string alone would warrant an earlier 1.0.2 release (on the other hand, a lot of people really do get confused).

So, we would need to write a page in the docs / FAQ that explains the scoping rules in top-level loops, with beginner-friendly examples, and would need to modify the error string (but not the exception type) to contain a link to the relevant doc-page.

If we want to go fancy, the error string could distinguish between top-level scope and other scopes. But I think such a distinction is a bad idea, and slightly more verbose error strings containing possibly unneeded hints are not too bad. I’m unsure what to do about IJulia (maybe the new FAQ page should also mention the differences in IJulia, since there are a lot of notebook users).

There isn’t even a PR up for improving the error messages so talking about missing 1.0.1 and earlier release of 1.0.2 seems a bit odd.


Also, IMO this alone would be worth making a minor release for.


Fair enough. That is easy to rectify.


A different view on my previous scope issue resolution proposal.

First, the necessary background. The major differences between a script, a program, and a software system:

  • A Script is typically designed, written, and maintained by a single person. It is often written in a programming language that doesn’t have a one-to-one mapping from syntax to underlying semantics expressed in computer science terms: the mapping is usually bundled and blurred, to make the language more “natural” and concise. A Script usually has a single context. A Script author often expects a variable with a given name to have the same identity throughout the Script. In linguistic terms, it is a single-context conversational instruction given by a person to a machine.

  • A Program is usually designed at a high level by one person. It can be developed by multiple people and is typically maintained by multiple people. The people involved almost always have some training in computer science. It usually has multiple contexts, delineated by functional, type, and other boundaries. In linguistic terms, it is a multi-context formal written instruction created by a group of people speaking the same language.

  • A Software System is often designed at a high level by multiple people, working at different times and using different conceptual frameworks. The high-level designers, usually called Architects, tend to have advanced training in computer science and software engineering. A Software System is almost universally developed and maintained by multiple people. It typically has multiple mutually incoherent contexts: semantically identical things can be named differently in different contexts, and semantically different things can be named the same. In linguistic terms, it is a multi-context formal written instruction created by a group of people speaking multiple languages.

Julia 1.0 violates the expectations of Script writers, forcing them to ascend conceptually not only to the level of Program writing but further to the level of System writing. In linguistic terms, an analog of the infamous Julia 1.0 redefinition of a variable inside a loop would be:

“John developed into a strong young man by age 16”.
“Ever since, John has played as a school varsity quarterback”.

Naturally, it trips the Script writers when they find out that Julia 1.0 treats the word “John” in the two statements above as references to two different people. They expect that this conversational context only contains one person named “John”. Overcoming this innate expectation requires a conceptual leap to the System thinking, where it is not unusual to have two semantically different things be called the same name. I think that such conceptual ascendance is too much to ask of Script writers.

In natural language, there are markers indicating that the context of conversation has changed. For instance:

“There is another strong boy named John, who has played as a school varsity quarterback”.

In programming languages, the role of such a marker is played by an explicit variable declaration. My proposal is to introduce this to Julia in an additive way conducive to gradual migration, yet in a very concise form, to keep Julia friendly to Script writers.


@Balance: Your proposal seems like a fairly fundamental shift in language semantics. It reminds me of the discussion here about introducing new language constructs for immutability. (It also raises the idea of using :=). It might be worth reading through that to see what some of the designers of Julia have to say about that.

I am mostly a spectator, but I suspect such a change is unlikely to be considered by the core devs, in part because (IIUC) it would break so much existing code.


It occurs to me that one solution to the issue of “can we make scope simple for scripting without destroying packages like Rebugger?” is to wrap what is currently the body of the let in a function: that is, instead of

```julia
let x = X, y = Y
    body
end
```

we do

```julia
let x = X, y = Y
    function _letbody287(x, y) # really a gensym
        body
    end
    _letbody287(x, y)
end
```

We could even create a package LocalLet and have this be

```julia
@llet begin x, y = X, Y
    body
end
```
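A hand-expanded instance of the transformation, with a concrete body standing in for body and a fixed name standing in for the gensym (both are illustrative assumptions, not the proposed LocalLet package). It shows what the wrapper buys: assignments inside a function body are local regardless of how top-level scope rules are set, so a name that happens to match a global does not overwrite it.

```julia
x_outer = 10   # a global whose name the body below happens to reuse

result = let x = 1, y = 2
    # Stand-in for the gensym'd function a macro like @llet would generate.
    function _letbody(x, y)
        z = x + y       # z is local to the function
        x_outer = z     # function scope is hard: this creates a new local, not the global
        return z
    end
    _letbody(x, y)
end
```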

I think @StefanKarpinski exhaustively answered the questions about immutability in the thread @haberdashPI referred to:

Julia already distinguishes assignment syntactically from both mutation and equality. There are the following assignment-like syntaxes:

  • x = ... is assignment. The name x is bound to the value that the ... evaluates to. Local bindings never leak out of their scope. It doesn’t matter what x was bound to before this happens, that value is not affected in any way. No object is mutated by this and no other binding besides x is changed.
  • x.f = ... is equivalent to setproperty!(x, :f, ...) which, by default, mutates the object x by changing its field f to the value of the expression ... . If x is visible in another scope or through other bindings, this change will be seen everywhere. No bindings are changed by this.
  • x[i] = ... is equivalent to setindex!(x, ..., i) which, for arrays, mutates the array x by changing its i-th slot to refer to the value of the expression ... . If x is visible in another scope or through other bindings, this change will be seen everywhere. No bindings are changed by this.

There are equality operators == and === which check for value-based equality and identity-based equality, respectively. There is also already a syntax for creating a constant binding as opposed to a variable binding:

```julia
const x = ...
```
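To make the distinctions in that list concrete, a small sketch (all names are illustrative) contrasting rebinding a name with mutating the object it refers to, plus value equality versus identity:

```julia
a = [1, 2, 3]
b = a            # binds the name b to the same array object
b[1] = 100       # setindex!(b, 100, 1): mutates the shared array, visible through a
b = [0]          # rebinds b only; a is unaffected

mutable struct Counter
    n::Int
end
c1 = Counter(0)
c2 = c1          # two names, one object
c2.n = 5         # setproperty!(c2, :n, 5): visible through c1 as well

same_value  = [1, 2] == [1, 2]    # value-based equality of two distinct arrays
same_object = [1, 2] === [1, 2]   # identity: they are different objects
```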

Indeed, this is very clear in relation to mutability/immutability, and in my experience this was practically sufficient for Julia code I’ve written so far. A good balance between expressiveness and conciseness IMHO. I dig @StefanKarpinski’s stance on this.

However, Julia 1.0 broke my code in more subtle ways than I anticipated. Upon deeper analysis, I realized that the issues appear to stem from Julia’s asymmetry between assignment and mutation.

Let’s get back to @StefanKarpinski’s examples:

  • When I see the equivalent of x.f = … or x[i] = … in pretty much every statically compiled language, including Julia, it tells me that a binding between the name x and a value of a mutable type is expected to already exist in an accessible scope before code analysis reaches this line. A concrete binding is found by the name resolution algorithm, potentially including a binding in the current scope. If no such binding is found, the compiler reports an error.

  • When I see the equivalent of x = … in a statically compiled language, the semantics could potentially be one of:

A. Symmetrical to the treatment of x.f = … or x[i] = …: a binding between x and some value is already expected to exist in an accessible scope, in which case the statement is interpreted as re-assignment of a new value to that binding. If a binding doesn’t yet exist, compiler reports an error. Creation of a binding is supposed to be explicit, happens in the current scope, and can be very syntactically similar to the assignment. This is how compilers for most statically compiled languages tuned for large-scale programming operate.

B. Same as above, except that if a binding doesn’t yet exist in an accessible scope, the statement creates a new binding in the current scope. As I understand it, this is how Julia behaved prior to 1.0, at least with regard to the current scope accessible inside a for loop.

C. Binding between x and some value may already exist in the current scope, in which case the statement re-assigns a value to that binding. If it does not yet exist in the current scope, then the statement creates it. This is how Julia 1.0 behaves in regard to a current scope accessible inside a for loop.
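For locals inside a function, assignment in a loop already behaves like option B, and the local keyword is an existing way to mark the creation of a new binding explicitly, which is the kind of marker option A would require. A sketch (function names are illustrative):

```julia
function update_outer()
    s = 0
    for i in 1:3
        s += i          # s already exists in the function scope: it is updated
    end
    return s
end

function shadow_with_local()
    s = 0
    for i in 1:3
        local s = i     # explicit `local`: a new binding; the outer s is untouched
    end
    return s
end
```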

This discussion thread exists because a significant number of people are perplexed by defaulting to choice C for x = …. I also understand why defaulting to B wasn’t 100% desirable either.

Doesn’t it leave A as the logical default choice for the next stage of Julia’s evolution? Naturally, any move in this direction ought to be backward-compatible.


For what it’s worth, one of the more confusing things is that name resolution is done per scope and is a syntactic, not a semantic, thing. By this, I mean:

```julia
julia> x=1;

julia> function f()
       @show x
       false && (x=1)
       nothing
       end

julia> function g()
       @show x
       #false && (x=1)
       nothing
       end

julia> f()
ERROR: UndefVarError: x not defined

julia> g()
x = 1
```
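The same effect can be observed without triggering an error by using @isdefined, which reports whether a name is assigned at that point. A minimal sketch (f_defined and g_defined are illustrative names): the unreachable assignment alone makes x a local throughout f_defined.

```julia
x = 1

function f_defined()
    d = @isdefined x    # x is a local here (assigned below) and not yet assigned
    false && (x = 1)    # never runs, but still makes x local for the whole function
    return d
end

function g_defined()
    return @isdefined x # no assignment in this function, so x refers to the global
end
```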

In principle, I would prefer a splitting into UndefGlobalVarError (no local binding of that name exists, and at runtime there is no global of that name) and UnInitializedVarError (a binding for this symbol exists in the scope, but has not yet been assigned at time of use – all control flows leading to a use before assignment must give us this error, but this may need to be tracked at runtime due to @goto, if the halting problem for the specific user code is too hard for the compiler).
