I’m coming late to the wider discussion around the scope changes, so please forgive me if I haven’t caught up with all of the various viewpoints. Having tinkered a bit in the v0.3-0.4 era, I found the scope changes in v1.0 a bit of a surprise.
So far I’ve seen no mention of parallelism in the discussion. It seems to me that the concept of “local”, if it is to be truly useful in the context of parallel execution, requires further refinement and qualification, depending on the form of parallelism and the kind of locality required.
Such added complexity might only be needed by heavy-duty applications. Meanwhile, the v1.0 approach of making a simple local scope the default may ultimately prove unhelpful in many situations, while adding the syntactic burden for interactive exploration and prototyping that others have described.
Although global scope can carry severe penalties and pitfalls, enforcing locality explicitly (e.g., with directives/annotations, or simply by using functions) where it is necessary, and for a specific purpose, seems to me a better approach.
That said, I think Stefan Karpinski’s solution is ingenious. It’s similar to the “autoscoping” behaviour that Sun (now Oracle) added to its OpenMP compilers, although there the automatic behaviour has to be enabled explicitly.
The autoscoping rules for variables in an OpenMP parallel region can be found in section 5.3 of the Studio Compiler manual; the rules for scalars are particularly relevant here:
S1: If the use of the variable in the parallel region is free of data race conditions for the threads in the team executing the region, then the variable is scoped SHARED.
S2: If in each thread executing the parallel region, the variable is always written before being read by the same thread, then the variable is scoped PRIVATE. The variable is scoped as LASTPRIVATE if it can be scoped PRIVATE and is read before it is written after the parallel region, and the construct is either a PARALLEL DO or a PARALLEL SECTIONS.
S3: If the variable is used in a reduction operation that can be recognized by the compiler, then the variable is scoped REDUCTION with that particular operation type.
Perhaps considering this and related approaches, with parallelism in mind, would help the discussion converge on a solution.