New scope solution

I think rules should be simple and easy, and this one is not! If you really feel that you need this rule, please plan to remove it for Julia 2.0, or earlier if possible.

7 Likes

It is quite a mess in current behavior too. Look at Schroedinger’s cat:

julia> dead = false
       for j in 1:1
           if rand()>0.5 dead = true;end  # cat is unlucky :(
           print("is shroedinger's cat dead? $dead")
       end
is shroedinger's cat dead? true

If we comment out the assignment, the cat is happy:

julia> dead = false
       for j in 1:1
           # if rand()>0.5 dead = true;end
           print("is shroedinger's cat dead? $dead")  # cat is lucky! :) 
       end
is shroedinger's cat dead? false

But when the cat is lucky, the experiment breaks:

julia> dead = false
       for j in 1:1
           if rand()>0.5 dead = true;end   
           print("is shroedinger's cat dead? $dead")  # coder is not lucky :( 
       end
ERROR: UndefVarError: dead not defined

It seems that the assignment (which never happened!) made dead a local variable, and that local is undefined when it is printed.
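For comparison, a minimal sketch of the explicit workaround that already exists in Julia 1.0: declaring the assignment global, so the loop body never introduces a local dead. This uses only the standard global keyword and is not part of any of the proposals discussed in this thread.

```julia
dead = false
for j in 1:1
    if rand() > 0.5
        # `global` applies to the whole loop-body scope: this assignment and
        # the read in `println` both refer to the top-level `dead`
        global dead = true
    end
    println("is Schroedinger's cat dead? $dead")
end
```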

EDIT:
Could this be optimized out in the future?

dead = false
for j in 1:1
  if VERSION<v"1.0" dead = true;end  # I want to test conditional compilation here
  print("is shroedinger's cat dead? $dead")
end
ERROR: UndefVarError: dead not defined
1 Like

I don’t think many people want to leave things as they are.

Your example works as expected both with SoftGlobalScope and inside a function (any local scope). It would also work if everything defaulted to global. So this is most likely going to be fixed, (almost) regardless of which change is made.
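A minimal sketch of the function version (cat_experiment and its threshold argument are names introduced here for illustration). Inside a function, dead is a local of the function, and the loop body is a nested local scope that assigns that same local, so whether or not the assignment runs, the read never hits an undefined variable.

```julia
function cat_experiment(threshold)
    dead = false                  # local to the function
    for j in 1:1
        if rand() > threshold
            dead = true           # updates the function-level local
        end
        println("is Schroedinger's cat dead? $dead")
    end
    return dead
end

cat_experiment(0.5)  # prints true or false, never throws UndefVarError
```

Passing a threshold outside [0, 1) makes the outcome deterministic, which is handy for checking both branches.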

How could that be, if one proposal wants to check the context, and the context is quite questionable, as my tests also show?

What does it really mean that a variable is not used? Or that it is used “write-only”?

Maybe I am wrong; could you explain it further, please?

Error messages: https://discourse.julialang.org/t/improving-error-messages-for-the-scoping-problem/16209.

Remember, the local/global decision for variables does not happen at runtime; it happens at compile time (I think early in lowering?). I think Stefan’s solution is to follow unconditional @goto and to always follow both branches of a conditional (even for a literal if false). So it would fix the following example:

julia> dead=true;
julia> let 
           @show dead
           @goto skip
           dead = false
           @label skip
       end
ERROR: UndefVarError: dead not defined

but not

julia> dead=true;
julia> let 
           @show dead
           if true @goto skip end
           dead = false
           @label skip
       end
ERROR: UndefVarError: dead not defined

The rule would be: follow all paths (without evaluating known conditionals). If there exists a write-before-read path, then the variable defaults to local. Otherwise, it defaults to global.

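As a side note: the while condition is evaluated in the outer scope, not in the loop's inner scope. That is probably confusing for some people as well. Below, the condition i>0 keeps reading the global i (which stays 1), while i = n in the body creates a new local i on each iteration: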

julia> m=4; n=2; i=1; while i>0
       i = n
       @show i, n
       global m -= 1
       global n -= 1
       @show m,n
       m>0 || break
       end; @show m, n, i
(i, n) = (2, 2)
(m, n) = (3, 1)
(i, n) = (1, 1)
(m, n) = (2, 0)
(i, n) = (0, 0)
(m, n) = (1, -1)
(i, n) = (-1, -1)
(m, n) = (0, -2)
(m, n, i) = (0, -2, 1)

So, independent of this scoping question, a minimally invasive (very non-optimizing) @code_semilowered that produces valid Julia source code using only let blocks and @goto would be nice for making this visible. It would also teach people about the iterator interface.
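As a rough illustration of what such output might look like, here is a hand-written desugaring (hypothetical; not produced by any existing tool) of for x in itr; s += x; end, using only the iterate protocol from the Interfaces chapter of the Julia manual, a let block for the loop body, and @goto/@label. Note that the termination test sits outside the let, in the enclosing scope, which is the same point as the while example above.

```julia
# Hand-written sketch: sum the elements of `itr` the way a `for` loop would,
# spelling out the iteration protocol with `@goto`/`@label` and `let`.
function sum_desugared(itr)
    s = 0
    next = iterate(itr)                # first call: iterate(itr)
    @label loop_top
    if next === nothing                # condition checked in the enclosing scope
        @goto loop_end
    end
    let x = next[1], state = next[2]   # fresh bindings per iteration (body scope)
        s += x                         # assigns the function-level local `s`
        next = iterate(itr, state)     # later calls: iterate(itr, state)
    end
    @goto loop_top
    @label loop_end
    return s
end

sum_desugared(1:10)
```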

1 Like

This proposed solution bears a striking resemblance to escape analysis, which is a tricky beast but also key to some seriously powerful compiler optimizations. It is also, notoriously, the one optimization that Java still cannot perform (well). I bring this up because I think the fix for this “bug” is worth considering in the larger context of escape analysis.

Java has problems with escape analysis not only because it is a difficult optimization to perform, but also because the language was not designed with it in mind. With Julia, we have the opportunity to evolve the language in a way that would facilitate escape analysis.

I think the crux of the scope “bug” is the desire to create strongly bounded scopes. We want this because it simplifies escape analysis. If we state that any variable created within a for loop, or within a function, falls out of scope at the end of the loop or function body unless it is returned, then we only need to follow the path of explicit returns to perform escape analysis. However, if some value within one of these scopes is assigned to a global variable, then we must consider multiple escape routes. Consider, for example:

b = []
function foo()
  global b
  for i = 1:10  
    push!(b, i)
  end
end
foo()

There are, in this function, 10 values that escape the function scope. Still, because b is explicitly declared global, the analysis is relatively straightforward. The more complicated the rules for determining when a variable might escape a scope become, the more difficult it is to perform escape analysis.

The REPL throws a monkey-wrench into all of this, as it is essentially a never-ending function call. Nothing can escape the REPL, so we would like to relax some of the constraints around escape analysis in the name of “user experience”. The problem, of course, is that the REPL is not a function call.


In short, I am not in favor of this proposed solution because of how it potentially complicates escape analysis. I do think, however, that it highlights one potential path toward a more general solution. What if, instead of tweaking the rules for how variables might, or might not, escape from an inner scope, we allowed for outer scopes to explicitly opt out of variable escaping? In other words, what if you could do the following:

module Foo
  locally_scoped() # => this call alters the scoping rules of the module
  b = []

  function bar()
    for i = 1:10 
      append!(b, i)
    end
  end

  function baz()
    @show b
  end
end

Foo.bar();
Foo.baz() # => 10-element Array{Any,1}: 1, 2, ...

This way, the REPL could evaluate in a module context wherein every variable is considered locally scoped, but we can still preserve the ability to perform escape analysis (in every other module).

3 Likes

Thanks for the reply! :slight_smile:

You are more experienced; could you tell me if there is a way to do conditional compilation similar to C++'s #ifdef?

Could @assert be optimized out in a future version, given such subtle implications for variable scope?

Is it true? I am really confused as well! :stuck_out_tongue:

Maybe there can be a balance. The “if read before write, then the user refers to the global variable” rule is probably safe enough (if you wrote that, maybe while debugging code, you’d currently get an error, so you are really not losing much). In case this could still cause confusion, I imagine there is always the option to allow it but emit a warning (Read before write variable in a scoped block defaults to global: to avoid this warning add the keyword global). The new user can decide to ignore the warning (or learn from it), and the advanced user can still copy-paste the for loop from a function body to the REPL, since in that scenario the warning doesn’t matter much (and add global in production code). The warning also has the advantage that the user will suspect that fancier tricks, like:

myvar = 0
for i = 1:10 
  myvar = i
  i == 5 && break
end

may require the keyword global to work as intended.

OTOH, the “if we write to the variable but never read it, then it is local” rule is IMO a bit extreme, and here I completely agree that it risks getting too confusing (a few @show statements added during debugging could cause things to flip). I’m also afraid that this change is technically breaking. That is to say, if a user wrote:

myvar = 0
for i = 1:10 
    myvar = i
end
@assert myvar == 0

Their code would break. I imagine nobody would write something like this on purpose, but I wonder whether SemVer allows this kind of change in a minor release.

We showed above that the scope of variables is decided at compile time, before the code runs (it could be different in the REPL, though). Doesn’t that apply here?

Yeah, I just sketched this up quickly, and you’re right that it would likely have to be some sort of new keyword or compiler directive. Maybe:

locally_scoped module Foo
# ...
end

But the idea is that, semantically, this would be the same as a magical macro that put global in front of every variable definition.

This is safe in static code. But under this “solution”, adding a simple “read” line to the code would be unsafe.

But I suppose you know:

Agree totally, and IMO this kind of solution makes things worse in pedagogical contexts. I know teaching isn’t the only consideration here, but to the extent that it’s a concern at all, I think this would make my life in the classroom much harder.

9 Likes

Deleting or adding an assignment can always affect the meaning of a variable; that’s not restricted to top-level scopes in any way, and this wouldn’t change that. The fact that code that hasn’t executed has a completely predictable static effect is a good thing, not a bad thing. Deleting or adding accesses to variables having an effect on their meaning is a different story and is indeed worth pause and consideration.

3 Likes

For precisely this reason, I think the proper way to make @assert removable is to wrap the statement in an if true or if false block, depending on whether you want the asserts in. That way, the optimizer will eliminate the dead code after the local/global decisions have been made.
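A minimal sketch of that pattern, assuming a hypothetical module-level constant ASSERTS_ENABLED as the switch (checked_mean is likewise just an illustrative name). Because the if is ordinary code, its contents are still seen when local/global decisions are made; with a const false condition the compiler can typically drop the branch afterwards.

```julia
# Hypothetical compile-time switch; set to `true` to keep the checks.
const ASSERTS_ENABLED = false

function checked_mean(xs)
    if ASSERTS_ENABLED
        # Still parsed and lowered (so it takes part in scope resolution),
        # but with a constant `false` condition this branch is dead code.
        @assert !isempty(xs) "xs must not be empty"
    end
    return sum(xs) / length(xs)
end
```

With the switch off, checked_mean(Float64[]) returns NaN instead of throwing, which shows the assert really is gone at runtime.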

Regarding conditional compilation, I rather like the fact that macros in Julia operate on the AST, and not on a textual representation on an entirely different plane with a different parser (with #ifdef you basically need to learn two languages: C and the preprocessor).
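The closest built-in analogue to #ifdef that I know of is Base's @static, which evaluates its condition at macro-expansion time and removes the untaken branch from the AST before lowering, so a removed assignment cannot influence the local/global decision. A sketch applying it to the VERSION example from earlier in the thread, plus a tiny user-defined macro showing the same AST-level idea (the name @debugonly and the DEBUG constant are hypothetical):

```julia
dead = false
for j in 1:1
    # `@static` evaluates the condition when the macro expands; on Julia >= 1.0
    # the branch (and its assignment) is removed before scope resolution,
    # so `dead` below refers to the global
    @static if VERSION < v"1.0"
        dead = true
    end
    println("is Schroedinger's cat dead? $dead")
end

# Hypothetical switch, read when `@debugonly` is expanded.
const DEBUG = false

# Keep the wrapped expression only when DEBUG is true; otherwise the
# macro expands to `nothing` and the code never reaches the lowerer.
macro debugonly(ex)
    return DEBUG ? esc(ex) : nothing
end

@debugonly println("only printed when DEBUG is true")
```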

Does your proposed algorithm decide whether the variable is global or local based on the entire path through the scope?

What if it was based on the “first hit”?

Go through the scope, and the first time you meet this variable, you decide whether it is global or local based on these rules. Wouldn’t that address @mohamed82008’s example where the @show below the first encounter with foundat changes the local-global decision?

i = 0
foundat = 0
while true
	i += 1
	if rand() < 0.1
		foundat = i
		#@show foundat
		break
	end
end
1 Like

Go through the scope, and the first time you meet this variable, you decide whether it is global or local based on these rules.

What do you mean by “first time”? The code can be an arbitrary mess of @goto. If there exists a “reachable path” with assign before read, then SemVer demands that the variable is local.

Luckily this can be a simple digraph problem (no need to iterate over all paths; linear time in the number of basic blocks). The only somewhat unclear question is what “reachable” should mean here: paths that are reachable at runtime definitely need to count as reachable for this static analysis. But deciding which paths are reachable at runtime is undecidable (this is the halting problem).
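A minimal sketch of that linear-time check, under simplifying assumptions of my own: the scope is given as a control-flow graph of basic blocks (the CFGBlock type is hypothetical), each block is summarized by the first access it makes to the variable (:read, :write or :none), and every edge is treated as reachable (no pruning, not even for a literal if false). A breadth-first search from the entry block that stops at the first access on each path visits every block at most once.

```julia
# Hypothetical summary of one basic block: the first access it makes to the
# variable under analysis (:read, :write or :none) and its successor blocks.
struct CFGBlock
    first_access::Symbol
    succs::Vector{Int}
end

# Return the set of access kinds that occur *first* on some path from
# `entry`. Each block is enqueued at most once: O(blocks + edges).
function first_accesses(blocks::Vector{CFGBlock}, entry::Int = 1)
    found = Set{Symbol}()
    seen = falses(length(blocks))
    queue = [entry]
    seen[entry] = true
    while !isempty(queue)
        b = popfirst!(queue)
        kind = blocks[b].first_access
        if kind !== :none
            push!(found, kind)     # first access on this path; stop here
            continue
        end
        for s in blocks[b].succs   # no pruning: every edge is followed
            if !seen[s]
                seen[s] = true
                push!(queue, s)
            end
        end
    end
    return found
end

# The rule quoted above: local if some path writes before it reads.
decide_scope(blocks) = :write in first_accesses(blocks) ? :local : :global
```

For a diamond where one branch writes and the other falls through to a read, first_accesses returns both :write and :read, so this rule picks :local.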

When faced with impossible problems, one always needs to make a trade-off in whatever partial solution one produces: transparency and speed (treat as many paths as possible as reachable) vs. exactness (make a best effort at pruning impossible paths).

“Best effort” ranges from the obvious (a literal if false is not taken), through elaborate induction proofs by LLVM, up to automated theorem proving. LLVM can already figure out surprising things: consider e.g. f(n)= (s=0; for i=1:n s+=i*(i-1)*(i-2); end; s) and marvel that f(1<<50) does not freeze your computer.
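Presumably f(1<<50) returns quickly because LLVM's loop analysis replaces this loop with a closed-form polynomial. The closed form itself is easy to check: since i(i-1)(i-2) = 6·binomial(i, 3), the hockey-stick identity gives a sum of 6·binomial(n+1, 4) = (n+1)n(n-1)(n-2)/4. A small sketch comparing the two for small n (f_closed is a name introduced here); @code_llvm f(10) from InteractiveUtils shows what the compiler actually emits.

```julia
# The loop from the post, written out over several lines.
function f(n)
    s = 0
    for i = 1:n
        s += i*(i-1)*(i-2)
    end
    return s
end

# Closed form: sum_{i=1}^n i(i-1)(i-2) = 6*binomial(n+1, 4) = (n+1)n(n-1)(n-2)/4.
# A product of four consecutive integers is divisible by 24, so `÷ 4` is exact.
f_closed(n) = ((n + 1) * n * (n - 1) * (n - 2)) ÷ 4

for n in 0:200
    @assert f(n) == f_closed(n)
end
```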

If Stefan’s strike of genius and madness is adopted, then the theorem prover used for this decision will become part of the language specification. So we better keep it simple (I’d propose: no pruning at all; always follow both branches, even for literal if false).

1 Like

What I meant was for all paths, record the first occurrence of the variable (read, write). If all are “write”, it is write-only, and so on for “read-before-write” (if any of the first occurrences was a read), and no-use.

So don’t look at all the occurrences of a variable along all paths, only the first occurrences.
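A minimal sketch of that criterion, assuming (hypothetically) that the paths through the scope have already been enumerated, each as a vector of :read/:write events for the variable. Only the first event of each path is inspected; a path that never touches the variable contributes nothing.

```julia
# Classify a variable from the first access on every path through the scope.
# Each path is a vector of :read / :write events (hypothetical representation).
function classify_first_uses(paths::Vector{Vector{Symbol}})
    firsts = [first(p) for p in paths if !isempty(p)]  # first occurrence per path
    if isempty(firsts)
        return :no_use
    elseif :read in firsts
        return :read_before_write   # some path reads before any write
    else
        return :write_only          # every first occurrence is a write
    end
end
```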

That is a nice simple criterion. Another thought is to have two simple criteria: “clearly local” and “clearly global” and require explicit annotation otherwise.
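One possible reading of that idea, as a sketch: the definitions of “clearly global” (the variable is never written in the scope) and “clearly local” (every path that uses it writes first) are my own assumptions, and anything else is reported as needing an explicit local or global annotation. Paths are again given as hypothetical vectors of :read/:write events.

```julia
# Hypothetical three-way decision: two "clear" cases, otherwise ask the user.
function scope_verdict(paths::Vector{Vector{Symbol}})
    if !any(p -> :write in p, paths)
        return :clearly_global      # never assigned: only reads of the outer variable
    end
    firsts = [first(p) for p in paths if !isempty(p)]
    if all(x -> x === :write, firsts)
        return :clearly_local       # every path that touches it writes first
    end
    return :needs_annotation        # mixed: require an explicit `local` or `global`
end
```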

3 Likes

Even better!

While a local write-only variable is clearly nonsensical (dead code), performing a global write instead would be pretty breaking. And messy scripts with local write-only variables do happen in real life :frowning: