New scope solution

Agree totally, and IMO this kind of solution makes it worse in pedagocical contexts. I know teaching isn’t the only consideration here, bit to the extent that it’s a concern at all, this would make my life in the classroom much harder I think.

9 Likes

Deleting or adding an assignment can always affect the meaning of a variable; that’s not restricted to top-level scopes in any way, and this wouldn’t change that. The fact that code that hasn’t executed has a completely predictable static effect is a good thing, not a bad thing. Deleting or adding accesses to variables having an effect on their meaning is a different story and is indeed worth pause and consideration.

3 Likes

For precisely this reason, I think the proper way of doing @assert in optimized-out fashion is to wrap the statement in an if true or if false block, depending on whether you want the asserts in. That way, the optimizer will eliminate the dead code after local / global decisions have been made.

Regarding conditional compilation, I rather like the fact that macros in julia operate on AST, and not on a textual representation / on an entirely different plane with a different parser (you basically need to learn two languages: C and preprocessor).

Is your proposed algorithm how to decide about the variable being global or local based on the entire path through the scope?

What if it was based on the “first hit”?

Go through the scope, and the first time you meet this variable, you decide whether it is global or local based on these rules. Wouldn’t that address @mohamed82008’s example where the @show below the first encounter with foundat changes the local-global decision?

i = 0
foundat = 0
while true
	i += 1
	if rand() < 0.1
		foundat = i
		#@show foundat
		break
	end
end
1 Like

Go through the scope, and the first time you meet this variable, you decide whether it is global or local based on these rules.

What do you mean by “first time”? The code can be an arbitrary mess of @goto. If there exists a “reachable path” with assign before read, then SemVer demands that the variable is local.

Luckily this can be a simple digraph problem (no need to iterate over all pathes; linear time in the number of basic blocks). The only somewhat unclear question is what “reachable” should mean here: Pathes that are reachable at runtime definitely need to be reachable for this static analysis question. It is undecidable which pathes are reachable at runtime (this is the halting problem).

When faced with impossible problems, one always needs make a trade-off for whatever partial solution one produces: Transparency and speed (consider many pathes as possible) vs exactness: make a best effort at pruning impossible pathes.

“Best effort” ranges from the obvious (literal if false is not taken) over elaborate induction proofs by llvm, up to automated theorem proving. Already llvm can figure out surprising stuff: consider e.g. f(n)= (s=0; for i=1:n s+=i*(i-1)*(i-2); end; s) and marvel that f(1<<50) does not freeze your computer.

If Stefan’s strike of genius and madness is adopted, then the theorem prover used for this decision will become part of the language specification. So we better keep it simple (I’d propose: no pruning at all; always follow both branches, even for literal if false).

1 Like

What I meant was for all paths, record the first occurrence of the variable (read, write). If all are “write”, it is write-only, and so on for “read-before-write” (if any of the first occurrences was a read), and no-use.

So don’t look at all the occurrences of a variable along all paths, only the first occurrences.

That is a nice simple criterion. Another thought is to have two simple criteria: “clearly local” and “clearly global” and require explicit annotation otherwise.

3 Likes

Even better!

While local write-only is clearly nonsensical (dead code), performing a global write would be pretty breaking. And script messes with local write-only vars do happen in real life :frowning:

I’m coming late to the wider discussion around the scope changes, so please forgive me if I haven’t caught-up with all of the various viewpoints. Having tinkered a bit in the v0.3-0.4 era, the scope changes in v1.0 came as a little bit of a surprise.

So far I’ve seen no mention of parallelism in the discussion, and it seems to me that the concept of “local” —if it’s to be truly useful in the context of parallel execution— requires further refinement and qualification, depending on the form of parallelism and the locality required.

Such added complexity might only be required by heavy-duty applications, whereas the v1.0 approach of having a simple local scope as the default might ultimately prove unhelpful in many situations, while adding a syntactic burden for interactive exploration/prototyping that others have described.

Although global scope can carry severe penalties and pitfalls, explicit enforcement of locality (e.g., using directives/annotations, or simply functions) when necessary, and with specific purpose, seems to me to be a better approach.

That said, I think Stefan Karpinski’s solution is ingenious. It’s similar to the “autoscoping” behaviour that Sun (now Oracle) added to its OpenMP compilers, albeit that automatic behaviour has to be enabled explicitly.

The autoscoping rules for variables in an OpenMP parallel region can be found in section 5.3 of the Studio Compiler manual; the rules for scalars are particularly relevant here:

S1: If the use of the variable in the parallel region is free of data race conditions for the threads in the team executing the region, then the variable is scoped SHARED.

S2: If in each thread executing the parallel region, the variable is always written before being read by the same thread, then the variable is scoped PRIVATE. The variable is scoped as LASTPRIVATE if it can be scoped PRIVATE and is read before it is written after the parallel region, and the construct is either a PARALLEL DO or a PARALLEL SECTIONS.

S3: If the variable is used in a reduction operation that can be recognized by the compiler, then the variable is scoped REDUCTION with that particular operation type.

Perhaps some consideration of this and related approaches —keeping parallelism in mind— would help convergence to a solution.

6 Likes

@PetrKryslUCSD, based on your suggestion and the many excellent problematic examples in this thread—thank you, everyone who posted of nasty corner cases—@jeff.bezanson and I are currently leaning towards the following rules:

  • if the first use of x on every path is a read, then x is global
  • if the first use of x on every path is a write and there’s a read somewhere, then x is local
  • otherwise x must have an explicit local or global annotation

This is a bit breaking since some code that previously worked will now error because it needs an explicit local or global annotation, but since we were willing to make a much more breaking change previously, that isn’t the end of the world. Although we would, of course, have to assess the extent of the change.

This rule is both less clever and less brittle than the original idea in the first post of this thread. Most of the brittleness and possibly confusing cases in the thread seems to be much better under these criteria. The common accumulator examples are read-first, so they make the accumulator global by default and therefore work. Cases where someone writes a variable and uses it later in the local scope continue to work as they currently do, defaulting to local, thereby avoiding leaking these variables as globals. Examples where a global is used in a write-only fashion inside of a loop require an explicit global annotation in order not to error, which makes them less brittle in case you insert a debug read. We could make an exception for the write-only case and default to global, but as has been pointed out, that’s quite dangerous since putting an innocuous-seeming debug statement then changes the behavior. It’s also quite easy to delete the last use of a local but forget to remove the initialization, which would, under a “write-only is global” rule make it accidentally global—oops! On the other hand, this proposal would give a clear error message instead, prompting you to explicitly annotate the variable as global or local—or delete the unused assignment.

Further hole-poking and counter-examples are welcomed!

26 Likes

Can you confirm that “every path” will take into account dead pathes? So the following will error ? (rule 3; rule 1 does not apply, because no dead code elimination is performed)

let 
false && x=1
@show x
end

While, on the other hand, the following will be global (rule 1; we don’t resolve branches, but we do follow unconditional jumps):

let 
@goto skip
x=1
@label skip
@show x
x=2
end
1 Like

Yes, it has to be a purely syntactic criterion. If you see a conditional, you consider all branches; if you see a loop you consider it being executed and not executed (for these purposes, a loop is equivalent to an if without an else). If there’s any @goto we should probably require explicit annotation of anything that is assigned.

3 Likes

Forgive me if I’m missing something, but couldn’t we perhaps have an even simpler heuristic: variables in non-function scopes are global when they exist outside of a function scope?

In other words: are there examples of for or let blocks evaluated in a module scope where we definitely would want the variables in those blocks to be local? Conversely, is there a case when they exist within a function body where we would want variables to (automatically) be considered global?

Since top-level REPL evaluation occurs in a module scope, this should (if I’m understanding everything correctly) bring us back to pre-v0.7 behavior for for loops. While this would technically be a breaking change, I suspect it wouldn’t impact very much actual application or package code (again, assuming that for or let blocks are rarely found outside of a function body in packages or applications).

You could also apply this (or @jeff.bezanson’s and @StefanKarpinski’s) heuristic if for is “declared” as global for so we:

  1. don’t need to make breaking change!
  2. will have better possibility to code with “explicit is better than implicit” paradigm with for without global keyword.
  3. have quite simple method (i.e. “add global keyword”) to make teaching easier.

Alternatively we could do syntactic “hack” (*) similar to @goto and @label:

@auto_global for 

this give us possibility to have other heuristics in the future.

In REPL @autoglobal on and @autoglobal off could change heuristics globally (although here we have to consider danger of creating dialects).

(*) - question what language construct @goto and @label is I could probably ask in future in another topic. :wink:

What do you mean by that? Do you mean every variable declared in a global for for is automatically (and implicitly) declared global?

I meant that without global it is stay as it is. With global it apply heuristic. Which one is open question (although I slightly prefer @jballanc’s as simpler).

I’m not a fan of what I just proposed, it was just what I understood. I don’t think global for with implicit globals is a good idea because it will leak variable declarations into the containing scope.

Forgive me if I’m wrong, but I thought for and while are syntactic sugar for let and conditional @goto (plus some iterate calls in the case of for); and if, else, etc are syntactic sugar for conditional @goto.

For that reason, the rule needs to work post-lowering and be clear on this lower level, as well as imply a sensible high-level rule (at the very least, the rule should be invariant under “partial lowering”, i.e. replacing high-level constructs by their low-level targets). Since the rule should be invariant under rearrangement of basic blocks, the second example follows. Or did I get this wrong?

In principle, your rule is on the control flow graph, not AST (that’s why it is more complicated, and also why it should be accompanied with tools to view the CFG; current tools to view the AST are not helpful anymore to debug scoping issues). So it is not syntactic, but close enough not to matter outside of corner cases.

PS.

question what language construct @goto and @label is I could probably ask in future in another topic. :wink:

Goto does what’s written on the label :wink: Sorry for the pun. A label labels a position in your code (so it does nothing). Goto goes (jumps) to a position; you need to label the position in order to tell goto where to jump to. This is super useful for e.g. breaking out of nested loops, and imho far cleaner and more readable than pulling along a bool (typical found) and doing multiple found && break and a final if found ... else end. Also, it is the low-level construct that loops are syntactic sugar for.

Can you please clarify for which constructs and contexts this would apply?

Eg

  1. for, while, try-catch-finally, let, comprehensions, broadcast-fusing,
  2. only for the REPL? or all module/baremodule?