New scope solution

What I meant was for all paths, record the first occurrence of the variable (read, write). If all are “write”, it is write-only, and so on for “read-before-write” (if any of the first occurrences was a read), and no-use.

So don’t look at all the occurrences of a variable along all paths, only the first occurrences.

That is a nice simple criterion. Another thought is to have two simple criteria: “clearly local” and “clearly global” and require explicit annotation otherwise.

3 Likes

Even better!

While local write-only is clearly nonsensical (dead code), performing a global write would be pretty breaking. And script messes with local write-only vars do happen in real life :frowning:

I’m coming late to the wider discussion around the scope changes, so please forgive me if I haven’t caught-up with all of the various viewpoints. Having tinkered a bit in the v0.3-0.4 era, the scope changes in v1.0 came as a little bit of a surprise.

So far I’ve seen no mention of parallelism in the discussion, and it seems to me that the concept of “local” —if it’s to be truly useful in the context of parallel execution— requires further refinement and qualification, depending on the form of parallelism and the locality required.

Such added complexity might only be required by heavy-duty applications, whereas the v1.0 approach of having a simple local scope as the default might ultimately prove unhelpful in many situations, while adding a syntactic burden for interactive exploration/prototyping that others have described.

Although global scope can carry severe penalties and pitfalls, explicit enforcement of locality (e.g., using directives/annotations, or simply functions) when necessary, and with specific purpose, seems to me to be a better approach.

That said, I think Stefan Karpinski’s solution is ingenious. It’s similar to the “autoscoping” behaviour that Sun (now Oracle) added to its OpenMP compilers, albeit that automatic behaviour has to be enabled explicitly.

The autoscoping rules for variables in an OpenMP parallel region can be found in section 5.3 of the Studio Compiler manual; the rules for scalars are particularly relevant here:

S1: If the use of the variable in the parallel region is free of data race conditions for the threads in the team executing the region, then the variable is scoped SHARED.

S2: If in each thread executing the parallel region, the variable is always written before being read by the same thread, then the variable is scoped PRIVATE. The variable is scoped as LASTPRIVATE if it can be scoped PRIVATE and is read before it is written after the parallel region, and the construct is either a PARALLEL DO or a PARALLEL SECTIONS.

S3: If the variable is used in a reduction operation that can be recognized by the compiler, then the variable is scoped REDUCTION with that particular operation type.

Perhaps some consideration of this and related approaches —keeping parallelism in mind— would help convergence to a solution.

6 Likes

@PetrKryslUCSD, based on your suggestion and the many excellent problematic examples in this thread—thank you, everyone who posted of nasty corner cases—@jeff.bezanson and I are currently leaning towards the following rules:

  • if the first use of x on every path is a read, then x is global
  • if the first use of x on every path is a write and there’s a read somewhere, then x is local
  • otherwise x must have an explicit local or global annotation

This is a bit breaking since some code that previously worked will now error because it needs an explicit local or global annotation, but since we were willing to make a much more breaking change previously, that isn’t the end of the world. Although we would, of course, have to assess the extent of the change.

This rule is both less clever and less brittle than the original idea in the first post of this thread. Most of the brittleness and possibly confusing cases in the thread seems to be much better under these criteria. The common accumulator examples are read-first, so they make the accumulator global by default and therefore work. Cases where someone writes a variable and uses it later in the local scope continue to work as they currently do, defaulting to local, thereby avoiding leaking these variables as globals. Examples where a global is used in a write-only fashion inside of a loop require an explicit global annotation in order not to error, which makes them less brittle in case you insert a debug read. We could make an exception for the write-only case and default to global, but as has been pointed out, that’s quite dangerous since putting an innocuous-seeming debug statement then changes the behavior. It’s also quite easy to delete the last use of a local but forget to remove the initialization, which would, under a “write-only is global” rule make it accidentally global—oops! On the other hand, this proposal would give a clear error message instead, prompting you to explicitly annotate the variable as global or local—or delete the unused assignment.

Further hole-poking and counter-examples are welcomed!

26 Likes

Can you confirm that “every path” will take into account dead pathes? So the following will error ? (rule 3; rule 1 does not apply, because no dead code elimination is performed)

let 
false && x=1
@show x
end

While, on the other hand, the following will be global (rule 1; we don’t resolve branches, but we do follow unconditional jumps):

let 
@goto skip
x=1
@label skip
@show x
x=2
end
1 Like

Yes, it has to be a purely syntactic criterion. If you see a conditional, you consider all branches; if you see a loop you consider it being executed and not executed (for these purposes, a loop is equivalent to an if without an else). If there’s any @goto we should probably require explicit annotation of anything that is assigned.

3 Likes

Forgive me if I’m missing something, but couldn’t we perhaps have an even simpler heuristic: variables in non-function scopes are global when they exist outside of a function scope?

In other words: are there examples of for or let blocks evaluated in a module scope where we definitely would want the variables in those blocks to be local? Conversely, is there a case when they exist within a function body where we would want variables to (automatically) be considered global?

Since top-level REPL evaluation occurs in a module scope, this should (if I’m understanding everything correctly) bring us back to pre-v0.7 behavior for for loops. While this would technically be a breaking change, I suspect it wouldn’t impact very much actual application or package code (again, assuming that for or let blocks are rarely found outside of a function body in packages or applications).

You could also apply this (or @jeff.bezanson’s and @StefanKarpinski’s) heuristic if for is “declared” as global for so we:

  1. don’t need to make breaking change!
  2. will have better possibility to code with “explicit is better than implicit” paradigm with for without global keyword.
  3. have quite simple method (i.e. “add global keyword”) to make teaching easier.

Alternatively we could do syntactic “hack” (*) similar to @goto and @label:

@auto_global for 

this give us possibility to have other heuristics in the future.

In REPL @autoglobal on and @autoglobal off could change heuristics globally (although here we have to consider danger of creating dialects).

(*) - question what language construct @goto and @label is I could probably ask in future in another topic. :wink:

What do you mean by that? Do you mean every variable declared in a global for for is automatically (and implicitly) declared global?

I meant that without global it is stay as it is. With global it apply heuristic. Which one is open question (although I slightly prefer @jballanc’s as simpler).

I’m not a fan of what I just proposed, it was just what I understood. I don’t think global for with implicit globals is a good idea because it will leak variable declarations into the containing scope.

Forgive me if I’m wrong, but I thought for and while are syntactic sugar for let and conditional @goto (plus some iterate calls in the case of for); and if, else, etc are syntactic sugar for conditional @goto.

For that reason, the rule needs to work post-lowering and be clear on this lower level, as well as imply a sensible high-level rule (at the very least, the rule should be invariant under “partial lowering”, i.e. replacing high-level constructs by their low-level targets). Since the rule should be invariant under rearrangement of basic blocks, the second example follows. Or did I get this wrong?

In principle, your rule is on the control flow graph, not AST (that’s why it is more complicated, and also why it should be accompanied with tools to view the CFG; current tools to view the AST are not helpful anymore to debug scoping issues). So it is not syntactic, but close enough not to matter outside of corner cases.

PS.

question what language construct @goto and @label is I could probably ask in future in another topic. :wink:

Goto does what’s written on the label :wink: Sorry for the pun. A label labels a position in your code (so it does nothing). Goto goes (jumps) to a position; you need to label the position in order to tell goto where to jump to. This is super useful for e.g. breaking out of nested loops, and imho far cleaner and more readable than pulling along a bool (typical found) and doing multiple found && break and a final if found ... else end. Also, it is the low-level construct that loops are syntactic sugar for.

Can you please clarify for which constructs and contexts this would apply?

Eg

  1. for, while, try-catch-finally, let, comprehensions, broadcast-fusing,
  2. only for the REPL? or all module/baremodule?

are there examples of for or let blocks evaluated in a module scope where we definitely would want the variables in those blocks to be local?

Yes, this happens frequently when you copy/paste method bodies to the REPL and wrap them in a let. For example, in a function I can use first as a variable without conflicting with the function named first; if first is local to the let, then there is also no conflict and my copy/paste debugging works. If in contrast first is global, I have to change the name of that variable before I can debug in this fashion.

(@StefanKarpinski, I’ve not yet had time to really sit down and think about this carefully; I will try to do so soon, though I suspect you’ll get all the feedback you need from others. No matter what, kudos for coming up with devious solutions!)

2 Likes

2 posts were split to a new topic: Side discussion on gotos

A post was merged into an existing topic: Side discussion on gotos

This was precisely the solution previously proposed here:

All scope-introducing constructs that occur outside of a function body.

  • only for the REPL? or all module / baremodule ?

Both. One of the goals is not to have different behaviors in different places. Since that inevitably leads to a series of questions about “why does this work in the REPL but not in my script/module.” In particular, since “scripts” (aka programs) operate in Main, which set of rules should apply to them? Are they interactive or non-interactive?

7 Likes