New scope solution

proposal

#1

I have an alternate, non-breaking solution to the scope issue previously discussed on discourse and github. Take a variable, x, which is not explicitly marked as local or global and which is assigned to in a top-level scope construct. The proposed rule for deciding whether x is local or global is:

x is global if it is accessed before being assigned on all paths through the scope body.

In other words, x is local unless there’s no way it could make sense for it to be local. Or, in compiler lingo, x is local unless every assignment to x is dominated by an access.

This solution fixes the various examples that people have complained about, since in all of these, the global variable is used as an accumulator and is read before being modified. For example:

t = 0
for i = 1:10
    t += i
end
# expect `t == 55` here

Since t += i means t = t + i and t is accessed before being updated, the t in the for loop body refers to the global t rather than a local t. Or, more simply put, t is global because otherwise there’s no way this code could not be an error.
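To make the rule concrete, here is a minimal sketch of it in a toy model. The model is an assumption made for illustration and is not how Julia's lowering works: each control-flow path through the scope body is written out by hand as the ordered list of `:read` and `:write` events on one variable. The names `read_precedes_first_write` and `is_global` are hypothetical.

```julia
# Toy model (an assumption): a path is the ordered list of events on one variable.
function read_precedes_first_write(path)
    for ev in path
        ev === :read && return true    # accessed before any assignment
        ev === :write && return false  # assigned before any access
    end
    return true  # never assigned on this path, so there is nothing to dominate
end

# Proposed rule: global only if every assignment is preceded by an access
# on every path through the scope body.
is_global(paths) = all(read_precedes_first_write, paths)

accumulator = [[:read, :write]]      # body is `t += i`, i.e. `t = t + i`
plain_assignment = [[:write]]        # body is `t = i`

println(is_global(accumulator))      # true
println(is_global(plain_assignment)) # false
```

A real implementation would have to derive the paths from the lowered code rather than take them as input; the sketch only shows the decision itself.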

Pros

  • Solves the problematic cases that people have complained about.
  • Non-breaking: any code whose behavior changes would previously have resulted in an error. We consider such a change to be “minor” in the sense discussed in this thread.
  • The default is still that variables in local scopes are local. This avoids accidentally littering global scope with variables that are only used within each iteration of a loop.
  • Top-level behavior roughly approximates behavior inside of functions. It’s not perfect, but it’s much better than what we have now and possibly better than what we had before (≤ 0.6).
  • Statically resolvable: the meaning of code does not depend on what global variables exist.

Con

  • The only downside it seems to have is that it’s a bit subtle and rather DWIMy.

Another possible solution to the global scope debacle
#2

Brilliant!


#3

The most common idiom I can think of where global and local behaviors would differ is:

last_i = 0
for i in 1:10
    last_i = i 
    rand() > .5 && break
end
last_i

That’s not a horrible sacrifice. Are there others?


#4

Another case where an assignment is intended to target a “global” variable is this one:

julia> found = false
false

julia> for x in 1:5
            if x == 5
                found = true
                break
            end
        end

julia> found
false

#5

IIUC, these two examples would behave differently:

julia> found = false
false

julia> for x in 1:5
            if x == 5
                found = true
                break
            end
        end

julia> found
false

julia> found = false
false

julia> for x in 1:5
            if !found && x == 5
                found = true
                break
            end
        end

julia> found
true

If so, it seems like a rather subtle thing to keep track of.


#6

Another possible variation is that if every path is one of the following:

  • read-before-write (including read-only)
  • write-only
  • no use

then the variable is global. The motivation for allowing the “write-only” case is that making the variable local on a write-only path is useless—you might as well delete the code.

Yes; interestingly, it would be fixed by this variation, since one version of the code is write-only while the other version is read-before-write.
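The variation can be sketched in the same spirit. This is a toy model and an assumption made for illustration, not the actual lowering pass: each path through the scope body is listed by hand as the ordered `:read` and `:write` events on one variable, and the names `path_permits_global` and `is_global_variation` are hypothetical.

```julia
# Toy model (an assumption): a path is the ordered list of events on one variable.
function path_permits_global(path)
    isempty(path) && return true            # no use
    first(path) === :read && return true    # read-before-write, or read-only
    return all(ev -> ev === :write, path)   # write-only
end

# Variation: global if every path falls into one of the three categories above.
is_global_variation(paths) = all(path_permits_global, paths)

flag_write_only = [Symbol[], [:write]]        # `found = true` on one branch only
flag_guarded    = [[:read], [:read, :write]]  # guarded by `!found && x == 5`
write_then_read = [Symbol[], [:write, :read]] # an assignment followed by a read

println(is_global_variation(flag_write_only))  # true
println(is_global_variation(flag_guarded))     # true
println(is_global_variation(write_then_read))  # false
```

Under this sketch both versions of the `found` loop resolve to the global, while any path that writes and then reads still makes the variable local.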


#8

K. To put it differently: a write-before-read will signal that a variable is meant to be local. You are implying that you don’t care what the value of the variable was before you hit that block. That does seem to cover the majority of use cases…

(Referring to the second variation)


#9

Yes, I believe that’s equivalent: if there is any read of a variable that is reachable from a write of the variable, then the variable is local; otherwise it is global.

Edit: now I’m not sure it’s equivalent. Have to think about it a bit more.


#10

This solution is an ugly, unprincipled and inelegant kludge. It is madness.

At the same time, this is pure genius.

I love this solution. Please make it so!

A slight problem is that this will make the static analysis of top-level scope part of the language spec. In other words, every improvement of static analysis that is used for this decision will commit us to never regress on that (within 1.0) and probably backport it to all supported 1.x versions.

So it is probably necessary to write an explicit function, with officially specified rules, that performs static analysis for the sake of scoping decisions only; and make sure that compiler improvements do not improve the name_resolution static analysis (otherwise, 1.0.7 code will break on 1.0.6). Do you know already at what point this will be decided? I am thinking that we would need to lower with maybe_global vars, then perform a very careful (i.e. reproducible, not precise) static analysis of all @goto appearing, and then decide on each maybe_global. Is lowering currently eliminating obviously dead branches (e.g. if false)? Sorry if these questions are a bit naive, I am insufficiently knowledgeable about the compile process.


#11

The variation will make for very interesting bugs, e.g:

i = 0
foundat = 0
while true
	i += 1
	if rand() < 0.1
		foundat = i
		#@show foundat
		break
	end
end

where a seemingly harmless @show foundat can change the behavior of the code. This particular example is not exactly useful since foundat and i will have the same value at the end, but it is to demonstrate my point.


#12

I think this is a very clever solution. In fact, it’s so clever that it makes me uncomfortable.

I feel that while this may reduce the number of scoping bugs caused by the behaviour change, it will make the ones remaining much more mysterious and difficult to reason about.


#13

For example? What sort of bugs do you envision?


#14

It’s the bugs that I can’t envision that I’m really afraid of!

I find the example by @mohamed82008 compelling.


#15

What is the bug? The code does not run, so what do you think happens?


#16

t = 0
i = 0 ## <<< 
for i = 1:10
    t += i
end
# expect `t == 55` here
# expect `i == 10` here rather than UndefVarError: i not defined ??? 

Is i now global? (just curious, since global loop variables are not a good idea obviously)

sounds like an awesome solution to me

thanks


#17

The bug comes from the proposal that any write-before-read makes the variable local, while write-only variables are global. The above example shows a trivial case, where an optional read can be added which is not supposed to make any semantic difference, yet with the proposal above, it will end up changing the variable from global to local. Any other case where write-before-read is used and it makes sense to have the variable global will share the same problem.


#18

I think I see what you mean. But isn’t that covered by Stefan’s “write only” path?


#19

But if I add the @show statement, it won’t be write-only anymore, so it will be a local variable.


#20

Ahh, good point.


#21

It’s a genius idea but I think it’s a bit too smart and doesn’t cover all cases, does it? It also makes it hard to explain how scoping works. Is it really worth doing this rather than educating users to write functions?

I would just be happy if the REPL and IJulia had the soft global scope hack, and if it could be turned off by more advanced users.

I know there is no shortage of opinions. Just my 2 cents.