Also, IMO this alone would be worth making a minor release for.
Fair enough. That is easy to rectify.
A different take on my previous proposal for resolving the scope issue.
First, the necessary background. The major differences between a script, a program, and a software system:
A Script is typically designed, written, and maintained by a single person. It is often written in a programming language that doesn’t have a one-to-one mapping from syntax to underlying semantics expressed in computer science terms: the mapping is usually bundled and blurred, to make the language more “natural” and concise. A Script usually has a single context. The Script author often expects a variable with a given name to have the same identity throughout the script. In linguistic terms, it is a mono-context conversational instruction given by a person to a machine.
A Program is usually designed at a high level by one person. It can be developed by multiple people and is typically maintained by multiple people. The people involved almost always have some training in Computer Science. It usually has multiple contexts, delineated by functional, type, and other boundaries. In linguistic terms, it is a multi-context formal written instruction created by a group of people speaking the same language.
A Software System is often designed at a high level by multiple people, working at different times, using different conceptual frameworks. The high-level designers, usually called Architects, tend to have advanced training in Computer Science and Software Engineering. A Software System is almost universally developed and maintained by multiple people. It typically has multiple mutually incoherent contexts: semantically identical things can be named differently in different contexts, and semantically different things can be named the same. In linguistic terms, it is a multi-context formal written instruction created by a group of people speaking multiple languages.
Julia 1.0 violates the expectations of Script writers, forcing them to conceptually ascend not only to the level of Program writing, but further to the level of System writing. In linguistic terms, the analog of the infamous Julia 1.0 redefinition of a variable inside a loop would be:
“John developed into a strong young man by age 16”.
“Ever since, John has played as a school varsity quarterback”.
Naturally, it trips up Script writers when they find out that Julia 1.0 treats the word “John” in the two statements above as references to two different people. They expect this conversational context to contain only one person named “John”. Overcoming this innate expectation requires a conceptual leap to System thinking, where it is not unusual for two semantically different things to be called by the same name. I think that such conceptual ascendance is too much to ask of Script writers.
In natural language, there are markers indicating that the context of conversation has changed. For instance:
“There is another strong boy named John, who has played as a school varsity quarterback”.
In programming languages, the role of such a marker is played by an explicit variable declaration. My proposal is to introduce this to Julia in an additive way conducive to gradual migration, yet in a very, very concise form, to keep Julia friendly to Script writers.
@Balance: Your proposal seems like a fairly fundamental shift in language semantics. It reminds me of the discussion here about introducing new language constructs for immutability. (It also raises the idea of using `:=`.) It might be worth reading through that to see what some of the designers of Julia have to say about it.
I am mostly a spectator, but I suspect such a change is unlikely to be considered by the core devs, in part because (IIUC) it would break so much existing code.
It occurs to me that one solution to the issue of “can we make scope simple for scripting without destroying packages like Rebugger?” is to use a function for the body of what’s currently the `let` body: that is, instead of
```julia
let x = X, y = Y
    body
end
```
use
```julia
let x = X, y = Y
    function _letbody287(x, y)  # really a gensym
        body
    end
    _letbody287(x, y)
end
```
We could even create a package `LocalLet` and have this be
```julia
@llet begin
    x, y = X, Y
    body
end
```
Julia already distinguishes assignment syntactically from both mutation and equality. There are the following assignment-like syntaxes:
`x = ...` is assignment. The name `x` is bound to the value that `...` evaluates to. Local bindings never leak out of their scope. It doesn’t matter what `x` was bound to before this happens; that value is not affected in any way. No object is mutated by this, and no binding besides `x` is changed.
`x.f = ...` is equivalent to `setproperty!(x, :f, ...)`, which by default mutates the object `x` by changing its field `f` to the value of the expression `...`. If `x` is visible in another scope or through other bindings, this change will be seen everywhere. No bindings are changed by this.
`x[i] = ...` is equivalent to `setindex!(x, ..., i)`, which for arrays mutates the array `x` by changing its `i`th slot to refer to the value of the expression `...`. If `x` is visible in another scope or through other bindings, this change will be seen everywhere. No bindings are changed by this.
There are equality operators `==` and `===`, which check for value-based equality and identity-based equality, respectively. There is also already a syntax for creating a constant binding as opposed to a variable binding:
const x = ...
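A minimal sketch of the forms described above, assuming a fresh top-level session; the `Holder` type and all variable names are made up for illustration. It contrasts rebinding a name with mutating an object that two names share, and `==` with `===`:
```julia
# `Holder` is a hypothetical mutable type used only for this illustration.
mutable struct Holder
    value::Int
end

a = [1, 2, 3]
b = a                   # `b` is a second binding to the same array object
b[1] = 10               # setindex!: mutates that array, so `a` is now [10, 2, 3]

b = [10, 2, 3]          # assignment: rebinds `b` to a new array; `a` is untouched
same_values = a == b    # true: equal contents
same_object = a === b   # false: two distinct arrays

p = Holder(1)
q = p
q.value = 5             # setproperty!: mutates the shared object, so p.value is now 5

const ORIGIN = Holder(0)  # a constant *binding*; the object it names is still mutable
ORIGIN.value = 1          # allowed, because this mutates rather than rebinds
```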
Indeed, this is very clear in relation to mutability/immutability, and in my experience it has been practically sufficient for the Julia code I’ve written so far. It strikes a good balance between expressiveness and conciseness, IMHO. I dig @StefanKarpinski’s stance on this.
However, Julia 1.0 broke my code in more subtle ways than I anticipated. Upon deeper analysis, I realized that the issues appear to stem from Julia’s asymmetry between assignment and mutation.
Let’s get back to @StefanKarpinski’s examples:
When I see x.f = … or x[i] = …, or their equivalent in pretty much every statically compiled language, including Julia, it tells me that a binding between the name x and a value of a mutable type is expected to already exist in an accessible scope before the code analysis gets to this line. A concrete binding is found by the name resolution algorithm, potentially including a binding in the current scope. If no such binding is found, the compiler reports an error.
When I see x = … or its equivalent in a statically compiled language, the semantics could potentially be one of:
A. Symmetrical to the treatment of x.f = … or x[i] = …: a binding between x and some value is already expected to exist in an accessible scope, in which case the statement is interpreted as reassignment of a new value to that binding. If a binding doesn’t yet exist, the compiler reports an error. Creation of a binding is supposed to be explicit, happens in the current scope, and can be syntactically very similar to assignment. This is how compilers for most statically compiled languages tuned for large-scale programming operate.
B. Same as above, except that if the binding doesn’t yet exist in an accessible scope, the statement creates a new binding in the current scope. As I understand it, this is how Julia behaved prior to 1.0, at least with regard to a current scope accessible inside a for loop.
C. A binding between x and some value may already exist in the current scope, in which case the statement reassigns a value to that binding. If it does not yet exist in the current scope, the statement creates it. This is how Julia 1.0 behaves with regard to a current scope accessible inside a for loop.
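For comparison, here is a small sketch of how Julia already behaves inside a function (the name `rebinding_demo` is hypothetical). A loop-body assignment to a name that is already local to the enclosing function reassigns that binding, roughly like option B, while an explicit `local` declaration acts as the kind of “new binding” marker that option A would require:
```julia
function rebinding_demo()
    x = 0
    for i in 1:3
        x = i          # `x` already exists in the enclosing local scope: reassigned
    end

    y = 0
    for i in 1:3
        local y = i    # explicit declaration: a fresh `y` that lives only in the loop body
    end

    return x, y        # the outer `y` was never touched, so this is (3, 0)
end
```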
This discussion thread exists because a significant number of people are perplexed by the default being choice C for x = …. I also understand why defaulting to B wasn’t 100% desirable either.
Doesn’t it leave A as the logical default choice for the next stage of Julia’s evolution? Naturally, any move in this direction ought to be backward-compatible.
For what it’s worth, one of the more confusing things is that name resolution within a scope is a syntactic, not a semantic, thing. By this, I mean:
```julia
julia> x=1;

julia> function f()
           @show x
           false && (x=1)
           nothing
       end

julia> function g()
           @show x
           #false && (x=1)
           nothing
       end

julia> f()
ERROR: UndefVarError: x not defined

julia> g()
x = 1
```
In principle, I would prefer splitting this into `UndefGlobalVarError` (no local binding of that name exists, and at runtime there is no global of that name) and `UnInitializedVarError` (a binding for this symbol exists in the scope but has not yet been assigned at the time of use; all control flows leading to a use before assignment must give us this error, but this may need to be tracked at runtime due to `@goto` if the halting problem for the specific user code is too hard for the compiler).
FWIW, I really like @jeff.bezanson’s proposal, along with introducing `local for` and `global for` (and `let`), with plain `for` defaulting to `local for` for now and the option of changing the default later (whether in 1.3 or 2.0) after some experience has been gained, as advocated by @Liso and @oxinabox. Apart from being quite clean, this strategy should also alleviate @tim.holy’s concern. Importantly, code will continue to work the same in scripts and at the REPL; changing that seems worse than the original issue to me.
For teaching, any discussion about scope can thus be postponed by just introducing the `global for` concept at first; perhaps this could be simplified further by introducing `for!` as syntactic sugar as proposed earlier (though that raises the question of what happens to a naked `for` if/when the default is changed down the road). In addition, perhaps a warning could be issued whenever a local variable is introduced that shadows a global, alongside a more helpful error message as proposed in #29585, silenceable by macro or environment variable. That might take much of the sting out of Example 1 of the original issue. For Example 2, it would technically even be non-breaking (as it currently errors) if `total_lines` were initialized to the value of the global in addition to the warning, making the example work as intended.
With these measures, we might even find that the flood of Discourse and Stack Overflow questions subsides to a point where changing the default may not even be warranted anymore, or could at least be postponed until 2.0, hence honoring semver.
Thinking a bit more about this, `for! i = ...` could additionally be made to imply `for outer i`, and this could be how it’s explained in a teaching context, hence avoiding any reference to scope: just say it means that “variables can get overwritten by the loop”. If/when the default is changed to `global for` later, `for! i` will simply continue to imply `outer`. This might be simple enough for non-CS students to grasp. @stevengj?
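`for!` is only a proposal, but the `outer` keyword it would imply already exists in Julia 1.0. A minimal sketch of the difference (both function names are hypothetical):
```julia
function without_outer(n)
    i = 0
    for i in 1:n         # the loop variable is a fresh `i` on every iteration
    end
    return i             # the function's own `i` is still 0
end

function with_outer(n)
    i = 0
    for outer i in 1:n   # reuse the existing local `i` as the loop variable
    end
    return i             # holds the last value the loop assigned
end
```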
I am not sure why one would jump through hoops just to avoid mentioning scope (briefly, of course).
Maybe I was lucky to teach some exceptionally bright students, but usually even students without prior programming experience (who were nevertheless interested in programming, which is why they took the course) grokked scope rather quickly. After all, it is a very intuitive concept, and someone taking courses at or above a BA/BSc level in STEM or data-based social sciences must encounter abstractions that are much more difficult. Teaching even basic programming without scope is like teaching linear algebra without bases.
I rarely saw `for` loops that accumulate in global scope in the wild in Julia, even for v0.6, because it is bad style.
If it is not something we would do because it is not good practice, why would we teach it? Isn’t this a disservice to the students?
PS: Sorry @s-broda, of course not all of this is in reply to your comments. I just find the “teaching” argument not super-convincing.
It’s perfectly good style for interactive exploration, where you usually don’t write functions. Not all coding is about performance or reusability.
As far as I am concerned, the idea is not to avoid introducing scope, but to postpone the discussion to a more appropriate time than literally the first 20 minutes of the first lecture (which may not even be on programming; in my case it’s finance). It’s easy to forget how hard these concepts are for students from non-STEM backgrounds.
In any case, just from the number of people running into this issue, it is clear that there is an issue here, presumably not just for students but also people who may not enjoy the luxury of a carefully designed course. So I think it’s best to keep the thread focused on the merits of the proposed solutions.
I am not so sure about this, since we don’t know the base (to be fair, this can go both ways, since people could encounter the issue and not show up here or on other forums).
Given the difficulties with the various solutions that are apparent in this thread, I think we should keep “not changing semantics, but giving helpful error messages” on the table as one of the solutions, at least in the short run. Strictly speaking, of course, you are right, since anything not discussing this particular solution is off-topic in this thread, but I feel that has happened already.
If you read my 2 posts above, my thinking was precisely that if the proposals I summarized are implemented, we may be able to avoid changing the semantics by implementing appropriate warnings and errors and introducing useful syntactic sugar, viz., `for! i` being sugar for `global for outer i`. All of these can be done now (in 1.0.2). At least we might buy ourselves some time to get more data; the default can always be changed from `local for` to `global for` later if still deemed necessary, and in a semver-compliant way.
I have taught absolute beginners (in other languages), and it is indeed not really that difficult to introduce scope to them in a way they understand IF you can do it at the appropriate time (i.e. when introducing functions), and if it is the version of scope as implemented in any other language:
You generally start with operators, variables and loops. Assigning something to a variable and then doing something with it in a loop is an extremely common pattern (I am convinced, even without checking, that the v1 change invalidated code in the majority of pre-v1 tutorials). Having to explain that they have to use `global` to access the variable right in front of the loop (for no apparent reason) is going to be difficult. The easy place to explain scope is functions: you have a separate piece of code that you are calling from another location, so it is logical that the variables are confined to that part. You are also passing the values that the function needs from your current location/scope as parameters. One of the worst things about v1 scope, then, is that at this point you are going to have to explain that what they learned in the beginning (putting `global` to get at the variable in front of the loop) does not apply if you put it in a function.
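To make the teaching problem concrete, here is a small sketch (variable and function names are illustrative) of the same accumulation loop at the top level of a script file under the Julia 1.0 rules, where the global has to be named explicitly, and inside a function, where it does not:
```julia
# Top level of a script file: under Julia 1.0 rules, assigning to `total` inside
# the loop would create a new local, so the global must be named explicitly.
total = 0
for i in 1:3
    global total += i
end

# Inside a function, the same loop needs no `global`: `total` here is the
# function's own local, and the loop body updates it.
function sum_to(n)
    total = 0
    for i in 1:n
        total += i
    end
    return total
end
```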
Some of the other suggestions here are even more complex and difficult to explain than v1. In my experience, if I start needing a lot of exceptions and lengthy explanations, I am usually on the wrong track (KISS principle). The original proposal by @jeff.bezanson is simple and consistent, and introduces very limited incompatibility.
Adding my 2c to the teaching discussion, particularly fresh since I’m just wrapping up teaching an intro computing course for biologists in python.
My experience is that students, especially in an intro course, will take a lot on faith. All the syntax they’re learning is new; all the concepts are new. If you tell them that `for` is spelled `global for`, they’ll do it until you tell them to do otherwise. No one asked, for example, why it’s
For the particularly curious students you can say, “great question - it’s not critical that you understand right now, but Google ‘scope’ if you’re interested in learning more, or stop by office hours.”
I don’t have a strong opinion on the underlying merits of the different ideas. I got bit by this early in the transition, but adapted quickly. Given how great the language design is generally, and how much time the core devs spend thinking and reasoning about decisions (a fact which has become even more glaringly obvious in the last 6 weeks of teaching Python, which I used to think was fine), my inclination would be to trust the reasons for the design as implemented. More informative error messages are always great, of course.
Late to the party here. I just want to emphasize the quote above. Packages are not the only things imported; datasets are too. Big data analysts need to load 10 GB or even 100 GB of data into memory; it would be a nightmare for them to keep restarting the program.
This hasn’t changed and has always behaved like this; `do` syntax is just shorthand for passing an anonymous function to a higher-order function. (Your code also doesn’t make much sense as far as I can tell: which `m` is supposed to persist outside?)
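A small sketch of the point about `do` blocks; the `last_value` helper is hypothetical, and `m` only echoes the name from the question:
```julia
# A `do` block is sugar for passing an anonymous function as the first argument.
squares_do = map(1:3) do x
    x^2
end
squares_lambda = map(x -> x^2, 1:3)   # the same call written with an explicit lambda

# Because the block is a function body, a variable *first* assigned inside it is
# local to that anonymous function. To keep a value afterwards, bind it outside
# first; the closure then updates the captured variable.
function last_value(xs)
    m = nothing
    foreach(xs) do x
        m = x
    end
    return m
end
```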
Sorry, I should have checked what the OP was saying! I will delete this message too, so as not to spoil this topic…
No problem! I wouldn’t worry about spoiling the thread or deleting anything, I think we’ve gotten the feedback we need at this point.