Another possible solution to the global scope debacle

Yes, I’m referring to the behaviour under the proposed solution; not Julia 1.0, which I find more consistent.

You are right.
Sorry for the misunderstanding.

As a new Julia user, I agree that the message should be more descriptive, so that users can understand what the issue is and how to resolve it, rather than a standard error message that leaves them perplexed about what to do.


Maybe just as a reminder (from “Why we created Julia”):

Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

This is something I can subscribe to, and for me “simple to learn” also includes “simple to explain/teach”.


I’m probably not adding much to the signal-to-noise ratio here, but I think this comment is right on the money.

While I like the original proposal (thanks very much to the core devs thinking this through!), the route to get there looks quite painful with the potential to break a lot of code through the deprecation process. While it should be an easy change to fix, I think we’ve seen that a lot of people don’t make use of the available tools to automatically apply fixes (i.e., FemtoCleaner) and so it could well cause more pain than it’s worth.

While I’d very much like to have the original proposal implemented, I suspect a more fruitful/less painful approach would be to implement the error message as mentioned by @piever and also include an @global macro that mirrors the global statement in that it simply makes all assignments in the annotated code block global (unless previously annotated as local). This would be very similar to the @softscope macro but slightly simpler in form.

In this case we might have the following workflow -

julia> x = 1
1

julia> for i = 1:10
       x = x + 1
       end
ERROR: UndefVarError: x not defined; x inside the for loop refers to a local variable, use `global x` or `@global for` to modify global variable x.
Stacktrace:
 [1] top-level scope at .\REPL[3]:2 [inlined]
 [2] top-level scope at .\none:0

julia> @global for i = 1:10
       x = x + 1
       end

julia> x
11
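For comparison, the explicit `global x` annotation that the proposed error message points to already works in Julia 1.0. A minimal sketch follows; the `@global` macro is only a proposal in this thread, so it is not used here:

```julia
x = 1

for i = 1:10
    global x    # refer to the global x instead of creating a loop-local x
    x = x + 1
end

x  # 11
```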

I’d expect IJulia workbooks to continue using SoftGlobalScope (or the original proposal in this thread) but that’s fine since workbooks are a very different beast.


While a more informative error message would be an improvement, it doesn’t fix the following problems:

  • The very first time you write an interactive loop you have to understand the distinction between local and global scopes. This will confuse and turn off a lot of potential users.
    • This is a huge problem for pedagogy in a non-CS context where you just want to use Julia interactively. Imagine trying to teach statistics and having to stop in the middle of the lecture to explain scope.
    • For a large number of new users it will make Julia seem pointlessly picky compared to any other interactive language.
  • It’s a big annoyance even for experienced users, because it makes interactive Julia code harder to write, and makes it harder to paste code in functions to/from the REPL to try things out.

The good news is that, since the problem with global scoping semantics mainly arises in interactive contexts, we can initially improve matters in a non-breaking way by implementing this only in the REPL and other opt-in contexts.


FWIW I much prefer the proposed behavior to the 1.0 behavior. For one, I think most commonly used languages have soft scope for loops, which makes Julia’s 1.0 behavior quite unintuitive. Secondly, I think the soft scope better fits the most natural, most common use case: set up a variable, iterate some operation on it, then do something with the result.


It looks like many people (including me) are happy with the new for scoping rule. But what about let?

Keeping the consistency between for and let means that forgetting a single comma would alter the program in a subtle way, right?

julia> x = 1
       y = 2;

julia> let x = 10,
           y = 20
       end

julia> x, y
(1, 2)

julia> let x = 10  # no comma
           y = 20
       end
20

julia> x, y  # it is (1, 2) in v1.0
(1, 20)

I know this wouldn’t be the only place where a single comma is important. I’m just trying to understand the consequence.

This also breaks the natural expectation that let; ... end is equivalent to (() -> begin ... end)().
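As far as I understand the 1.0 rules, both forms leave an existing global untouched, which is the equivalence meant here. A small sketch, where `y` is just an illustrative global:

```julia
y = 2

let
    y = 20    # under the 1.0 rules this creates a new local y
end

(() -> begin
    y = 20    # likewise local to the anonymous function
end)()

y  # still 2
```

Under the proposed rule, the `let` form would instead overwrite the global `y`, while the anonymous function would not.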

Just to add a small dose of (non-Matlab) prior art… Ruby had to deal with the question of shadowing scopes in lambdas a while back. Their solution was to add an optional declaration that would prevent variable assignment within the lambda scope from overwriting a variable from outside the lambda scope. For example:

noshadow = ->(x) { puts "'a' is #{a}"; a = x }
shadow = ->(x; a) { puts "'a' is #{a}"; a = x }

a = 1 #=> 1
noshadow.(2) #=> 'a' is 1
a #=> 2
shadow.(3) #=> 'a' is
a #=> 2

Note that in the second case, 'a' is (with nothing after it) is displayed because a is nil at the point of the puts statement.

A few nice features of this:

  • if you don’t know about Ruby lambdas introducing their own scope, then everything seems to work as expected with variables from outside the scope being captured by and modifiable from within the lambda
  • you can write “safe” lambdas where you don’t have to worry about unintentionally affecting the calling scope
  • the shadow declaration initializes the variable, so you can reference it before first assignment (without the (...; a) part of the declaration, the puts statement would error on an undefined variable)

One thing that is very different between this Ruby case and what we’re discussing with Julia, however, is that absent the existence of a in the enclosing scope, any a = x assignment within the lambda will only declare a scope-local variable. In other words:

l = -> (x) { a = x }
l.(2)
a #=> undefined local variable (i.e. `a` within the lambda body was lambda-local)
a = 1
l.(2)
a #=> `a` within the lambda body this time referred to the `a` from the enclosing scope

Apologies if this is adding to the noise, but I think Ruby is, if nothing else, a very beginner friendly language and it might be useful to learn from it. That said, it may also be that some of its “beginner friendliness” comes from its (some might call it “egregious”) use of dynamic scoping…and that may not be a bridge we’re willing to cross.


That is an orthogonal concern. It’s just a property of Julia’s let syntax: once the commas stop you are inside the block and no longer introducing new let-bound variables. In general that will mean different behavior unless we make much more radical changes.

I don’t think it’s possible to have both formal properties like this, as well as “ergonomic” syntax optimized for convenience. (As a footnote, while that equivalence holds in Scheme it does not hold in ML-family languages.) Also, the equivalence would hold once you are in local scope, because in local scope all assignments already overwrite outer variables by default.


I’m curious to know why breaking consistency between let and function is preferable over breaking consistency between let and for. If you say that such consistency between let and function did not exist in the first place then I guess that’s the answer. But I also think creating “stronger” scope by let could be considered “ergonomic” as well since the reason why users would write let is to introduce a scope; so why not give them a stronger/safer one?


I think the main thing is to have fewer exceptions. Making functions the lone exception is simpler. Also, functions are special in that they indicate an intent to create a reusable piece of code that therefore needs some extra isolation.

I wouldn’t say the purpose of let is to introduce a new scope. Rather the purpose is to create specific new variable bindings. For example this pattern is very useful:

let e = 2.7
    exp(x) = e^x
end

At the top level, having that define a global function exp is what we usually want, and we don’t get it currently (you have to write global).
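A sketch of the `global` spelling that 1.0 currently requires for this pattern. The name `myexp` is made up here to avoid clashing with `Base.exp`:

```julia
let e = 2.7
    global myexp(x) = e^x    # without `global`, myexp would stay local to the let block
end

myexp(2)  # 2.7^2, computed with the let-bound e
```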

Another reason is debugging by copying code from functions to the REPL (I confess I do that a lot). In 0.6 it fails only for functions with inner functions. In 1.0 it fails on all scoped constructs. It would be nice to at least go back to everything except inner functions working.

To me, it’s not enough just to guess that since somebody wrote let they want more things to be local. There would need to be a specific, useful code pattern that’s more elegant under that assumption. For example the pattern that kicked off this issue is initializing a variable, and then updating it in a loop. I’m not sure there are any similar patterns that benefit from making more variables local inside let. Given that in general we default to overwriting variables in outer scopes, it can’t be all that important for let to be special here.


Thanks, the exp example is actually compelling. The use case I had in mind was something like this at module level:

let # works in v1.0
    for b in 0:10
        exp = Symbol("exp", b)
        @eval $exp(x) = $b^x
        if b > 0
            invexp = Symbol("invexp", b)
            @eval $invexp(x) = $(inv(b))^x
        end
    end
end

It would be bad to have the temporary variables exp and invexp leak out to the module’s top-level scope. Of course, one can use local or let in front of each temporary variable, but I’d say that’s uglier.

Wouldn’t having let; ... end be equivalent to (() -> begin ... end)() actually be better for this? You can just wrap the code with a let block and then it’d work even for edge cases like the code with inner functions, right? (We can even have a keyboard shortcut for this.)

I’m not sure there are any similar patterns that benefit from making more variables local inside let. Given that in general we default to overwriting variables in outer scopes, it can’t be all that important for let to be special here.

Well, doesn’t this count as an example? In particular, Rebugger cannot work in 0.6 or 0.7 (because of the scope deprecation) but it does in 1.0.


How about (as @Liso suggested above) also allowing the global keyword in front of let and for, in the same way as local? For example:

global let x=1
  y = x
end

would use a local scope for x and a global scope for y.

These rules would always hold:

  • local for is equivalent to a repeated local let
  • global for is equivalent to a repeated global let

Maybe one could also provide convenience “packages” (using GlobalFor, using LocalLet, etc.) that, when loaded with using, would cause every unannotated for or let in a module (or at the REPL) to become annotated.

So far, this would be a non-breaking change, so it could be introduced in Julia 1.1.

The next step is then to decide whether the unannotated for expands to local for or global for and the same question for let. These two questions would not have to be answered until 1.2, when people have had a chance to annotate their fors and lets, and thus avoid getting a warning.


I agree. It may be the case that having different behavior in the REPL vs. files is actually justified; the requirements are just very different.

No, because you run the function one statement at a time, in separate inputs. If you have to put the whole function in a let block it’s no different than just calling the function.


I understand that’s a useful pattern (I do use it). But wrapping the code in let up to the statement whose result you want to see seems to be a valid pattern too.

@tim.holy was also mentioning using let in a debugging session in the above comment.

The combination of being able to edit it and see what happens, and then hit the up arrow and edit it in a different way, means that it is operationally more similar to running the function line-by-line, with the bonus that it doesn’t break your REPL. If the bug you’re investigating is triggered in a project that imports ~500 packages at once, this ability is important. (Less so in cases where the cost of restarting isn’t so high.)


My background is mostly in architecture and software engineering of mission-critical large-scale software systems, such as satellite/missile control, high energy physics experiments automation, country-scale energy generation and distribution, eCommerce (eBay) etc.

I don’t recall seeing such energetic debates about scoping since the days when Algol-68 was still the thing, although the mid-1990s discussions about C++ scoping come close. It indicates to me that Julia is, in some respects, a revolutionary language.

Revolution resolves a tension. Julia’s creators were not shy about exactly what that tension was: the need to prototype rapidly versus the need to execute sophisticated numeric algorithms quickly. The ongoing discussion reflects as yet unresolved vestiges of that tension.

I think the remaining tension in the Julia 1.0 language design is between the already huge practical range of its real-life use cases and the semantics of the language, which are still optimized for a significantly narrower range of use cases.

Using the same syntax for declaring and initializing a variable as for reassigning its value is optimized for rapid prototyping. This syntactic equality creates a semantic tension when combined with syntactic and semantic elements targeting quick execution and modularity, both mandatory traits of large software systems.

The traditional way out is syntactically separating the notions of declaration, initialization, and reassignment. Then, a variable simply belongs to one of the statically known scopes where it was either declared, initialized, or reassigned.

Old school performant large-scale languages, such as C, postulate that a variable’s scope of existence is its declaration scope. Modern languages, such as Swift, require that a variable must be also initialized in its declaration scope.

Julia at this point stays with the tradition of prototyping languages, with its most concise variation of the syntax bundling together the declaration, initialization, and reassignment. This is perfectly fine for short “scripts”, yet becomes problematic within nested and modular constructs characteristic of larger code bases.

My personal preference would be the introduction of an explicit yet very concise Declaration && Initialization syntax, keeping the current Reassignment syntax as is, and gradually deprecating the undifferentiated (Declaration && Initialization) || Reassignment semantics.

I’m not knowledgeable about the Julia parser. Perhaps the experts could come up with a simple one-keystroke or two-keystrokes syntactic extension that would mark the difference? Could be in front of a variable being declared, separated from it by a space (e.g. "` "), or a part of the Declare+Assign token (e.g. “:=”).
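This is not the proposed syntax, but as a rough illustration of the distinction: existing Julia already lets you mark a declaration explicitly with `local`, leaving plain `=` to read as reassignment. A minimal sketch, with a made-up function name:

```julia
function running_total(values)
    local total = 0          # explicit declaration plus initialization
    for v in values
        total = total + v    # plain reassignment of the variable declared above
    end
    return total
end

running_total([1, 2, 3])  # 6
```

The `local` here is optional inside a function, which is exactly the bundling described above. A dedicated token would make the declaration mandatory rather than a matter of style.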


I found the further discussion a bit late. My thoughts on the subject:
The proposal is an improvement on the current v1 scoping rules. These create problems that need to be solved, and not only because of “lazy teachers”. To recapitulate the problems:
The current rules are unexpected and unintuitive to most people, not only to complete newbies in a course. Even experienced programmers (in other languages) who know about scope and are testing some stuff in the Julia REPL are bitten by it (as I have experienced myself, and that was even at a time when I was following the discussion about it). You can see this from the number of people reporting problems (or submitting bugs…) related to it. We can be sure that even more people are having problems with it but are just giving up on Julia instead of asking about it, making it a serious problem for wider Julia adoption.
While I understand that it was done for better consistency, it is less consistent in other ways (behaviour inside and outside a function): you cannot execute the same code in a function and in the REPL, making it a lot harder to test/debug stuff by copy/pasting code (from functions) into the REPL.
Also, you cannot simply wrap code written in the REPL or a script into a function.
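To illustrate the inside/outside difference: the loop from earlier in the thread, wrapped in a function (the name `bump` is made up), runs as written, whereas the same loop pasted at the 1.0 top level raises the UndefVarError.

```julia
function bump(x, n)
    for i in 1:n
        x = x + 1    # x is a local of the function, so the loop updates it
    end
    return x
end

bump(1, 10)  # 11
```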

Although I must admit I do not really like the idea of creating more globals, making everything global outside of a function will solve the problem with less breakage than my preferred alternative (making everything local: each file is its own scope, globals must always be defined). I can actually get this already easily (except for the REPL) by always wrapping everything in functions.

For me personally, fixing the REPL would be enough (I would be using functions anywhere else anyway). But again, too many people just running the code they made/tested in the REPL as a simple script without wrapping it in a function will be unpleasantly surprised if it does not work as expected, so that should be covered as well, and you might as well do it globally then (more files, modules, etc. are more advanced users anyway).

I think the change must be made as soon as possible.
I would not expect this change to actually break much/any code: the few people who know/understand the current behaviour should not be running much code outside functions, and the newbie/coming-from-other-languages crowd is more likely to be happy that the “bug” is fixed than talking. Treating it as a (design) bug for which the fix may break backward compatibility in a very limited number of cases would make this process faster, easier and with less fuss …
