`outer` keyword to automatically `local` a variable?

When using the outer keyword, as described in Scope of Variables · The Julia Language:

julia> function f()
           i = 0
           for outer i = 1:3
               # empty
           end
           return i
       end;

julia> f()
3

Do we really need the i = 0?
If we automatically added an invisible

local i

whenever we encounter outer i, would that be a bad idea?

I’ve read there is some negative sentiment toward the whole outer keyword, as hinted in: "outer" keyword, Julia for-loop variable scope - Stack Overflow

Note that if the “outer” i is not directly in the enclosing scope of for, adding an invisible local i will have different semantics, as it would create a new binding, e.g.:

julia> let i=0
           let
               for outer i=1:3 end
           end
           i
       end
3

julia> let i=0
           let
               local i
               for outer i=1:3 end
           end
           i
       end
0

So the hypothetical logic for this feature request would have to be a bit more subtle: if the variable doesn’t already exist (as a local, or do we also allow this to work with globals?) then you’d want to implicitly declare the variable in the immediately enclosing scope. Also, what about the case where the loop doesn’t execute? Then the existence of the untaken loop causes an undefined outer variable to be implicitly created. It seems clearer to require the user to explicitly declare the variable; then there can’t be any confusion about what scope the variable should appear in or what value it should have if the loop isn’t taken.


That’s right, variables must be defined unequivocally at compile time.
However, I think having outer implicitly define the local sounds good. Although there is some confusion when outer refers to a global, I think it is better to always define an implicit local and to require an explicit global to access the global.
Even the keyword name outer seems to suggest the enclosing scope, and the current default requires more effort in more cases (it isn’t the right “branch prediction”).
Additionally, we should be wary of a simple do-nothing bias: this is a rare example of a seldom-used construct that could still be improved at little cost (or maybe I’m mistaken about this; answering that question would take a source-wide grep).

rfourquet already demonstrated that an implicit local statement can shadow an already existing outer local variable, so the best case is to add it only when no such variable exists in the outer local scopes.

When there are no outer locals to shadow, putting one right outside the for-loop makes sense. However, as StefanKarpinski said, the variable isn’t guaranteed to be initialized because the for-loop isn’t guaranteed to run, e.g. for outer i in iter where iter is empty. There’s no reasonable implicit default value to fix that. Relying on an iterable being non-empty to initialize a variable is very likely a bug, and I prefer it to be caught by a syntax error at definition rather than by an undefined-variable error at some of the calls:

julia> function f(n)
         local i # implicit proposal
         for outer i = 1:n end
         return i
       end;

julia> f(1)
1

julia> f(-1)
ERROR: UndefVarError: i not defined
Stacktrace:
 [1] f(n::Int64)
   @ Main ./REPL[4]:4
 [2] top-level scope
   @ REPL[5]:1

julia> function f(n)
         for outer i = 1:n end
         return i
       end;
ERROR: syntax: no outer local variable declaration exists for "for outer"
Stacktrace:
 [1] top-level scope
   @ REPL[6]:1

So, the consequences of the implicit local i proposal are actually substantial.


Okay… I relent… the status quo isn’t so bad.
But still, that extra definition is quite annoying.

But there is something awkward in the way for loops communicate with the enclosing scope: not knowing whether the loop exited through break or through the iterator ending, and not knowing the final value. It feels like these values are available to the code and to the programmer’s mental model, yet not easily accessible in the source code.

I invite anyone who shares this feeling to suggest an improved syntax (in this thread or somewhere else).

You’re not alone in this opinion. The structured program theorem that moved programming languages away from GOTOs and toward block patterns does not need multiple exit points like break and early return, and some stricter interpretations enforce single exit points, e.g. Pascal. However, this requires some extraneous variables and code duplication in many programs, and Shapiro found that students tend to run into bugs in Pascal that don’t happen at all when they are allowed to return early. Kosaraju ended up proving that you can only avoid extraneous variables entirely by allowing multi-level breaks from a loop.

I mean, it’s impossible to know that from the source code in general because it’d depend on runtime values and calculations that we probably can’t do on our own.


Of course, I mean knowing the final value at loop exit at run time. But currently it isn’t very accessible unless you shadow it with a local or use the outer construct. That is a tad too much work for something the programmer is quite sure the machine has access to (and similarly the point of loop exit, return or iterator end, which could somehow be “labeled”). This pattern appears in so much algorithm pseudocode that it is a shame it isn’t syntactically simpler.

I’m sorry, I don’t understand what you mean by a value not being accessible and needing to shadow with a local. Do you have a MWE?

using Primes  # isprime comes from the Primes.jl package

lastval = 0
for i in 20:30
   lastval = i   # shadowing
   if isprime(i)
      break
   end
end

# here `lastval` is 0 if no iteration
# or last iteration before break
# (*) still can't figure if iterator-ended or broke in last iteration.

This kind of decision depending on how the loop ended is very common in algorithms. outer makes this a little simpler, but point (*) remains.

I’m not suggesting any better syntax yet, but I feel there should be one. And we should be courageous enough not to immediately dismiss improving this.

So you mean that the value of i in for i in isn’t accessible from outside the for-loop unless you assign it to another local variable declared outside? That’s normal for a local variable once its scope ends; in this case i is local to each iteration, so lastval = i really is necessary, with lastval being an outer variable that persists across iterations. A manual lastval = 0 is necessary to properly initialize it in case the for-loop runs 0 iterations. No change to for outer i in can possibly get around that.
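To make the zero-iteration case concrete, here is a minimal sketch using the existing for outer syntax. The function name last_index is made up for illustration; the point is only that the explicit i = 0 is what the caller sees when the range is empty.

```julia
# Hypothetical helper: returns the last index visited, or 0 if the range is empty.
function last_index(n)
    i = 0                 # explicit initial value, used when the loop never runs
    for outer i = 1:n
        # empty
    end
    return i
end

last_index(3)   # 3
last_index(0)   # 0, because 1:0 is empty and i keeps its initial value
```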

You can if you store some information in extraneous outer local variables; break just doesn’t do it for you because it’s often unnecessary work.
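As a rough sketch of that approach, the helper below (scan_until and the flag name broke are invented for this example) records both the last value seen and whether the loop left through break. iseven stands in for isprime so that no package is needed.

```julia
# Hypothetical helper: scan `itr` until `pred` holds, remembering how the loop ended.
function scan_until(pred, itr)
    lastval = nothing     # stays `nothing` if `itr` is empty
    broke = false         # set to true only when the loop exits through `break`
    for x in itr
        lastval = x
        if pred(x)
            broke = true
            break
        end
    end
    return lastval, broke
end

scan_until(iseven, 21:2:29)  # (29, false): iterator ended without a break
scan_until(iseven, 21:30)    # (22, true): broke on the first even value
scan_until(iseven, 1:0)      # (nothing, false): no iterations at all
```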

In Python this is done with for-else, where else means “finished without a break”:

for x in [1,2,3]:
  if p(x):
    break
else:
  print("finished normally")

Worth mentioning that this is extremely rarely used, and when someone misquoted Guido van Rossum as wanting to rename the else to a more intuitive name like nobreak, he clarified that he wouldn’t include this feature at all if he could go back in time. The flexibility and clarity of per-break flags and if-statements after the loop simply outweigh it.

How about:

rc = for i in 1:3
    if p(i)
        break :here
    elseif q(i)
        break :there
    end
end
rc == :noiter && println("empty iterator")
rc == :here && println("broke p(x)")
rc == :iterend && println("nonempty iterator ended")

The rc mechanism would get optimized away if unused. Currently for has no return value to stay backward compatible with.

Changing for-loops and breaks to return a value is unnecessary to accomplish that. Syntax shouldn’t be drastically modified in minor revisions even when it would be convenient.
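For comparison, a sketch of how the same outcomes could be tracked today with an ordinary local variable. The function classify_exit is hypothetical; the symbols are borrowed from the proposal above, and p and q are passed in as arguments.

```julia
# Hypothetical helper: reports how the loop ended, using the symbols from the proposal.
function classify_exit(p, q, itr)
    rc = :noiter              # default when the iterator is empty
    for i in itr
        rc = :iterend         # at least one iteration ran; overwritten on break
        if p(i)
            rc = :here
            break
        elseif q(i)
            rc = :there
            break
        end
    end
    return rc
end

classify_exit(iseven, isodd, 1:0)            # :noiter
classify_exit(x -> x == 2, x -> false, 1:3)  # :here
classify_exit(x -> false, x -> x == 3, 1:3)  # :there
classify_exit(x -> false, x -> false, 1:3)   # :iterend
```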

It’s not possible to elide at parse time or compile time, because the values of p(i) and q(i) across all iterations, and thus whether the breaks occur, are only known at runtime.

If rc is not used (i.e. no rc = before the for), it is elidable.
This is not drastic, as this syntax is never used (for returns nothing).

The key to change in a language is 99.9% backward compatibility,
not complete freeze.
– famous old proverb

Then that wouldn’t need to be elided or optimized away, it just wouldn’t be there in the first place.

It’ll change the return values of functions whose last expression is a for-loop, so it’s not actually backwards compatible. Adding syntax isn’t always backwards compatible; it depends on the implementation.
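A small illustration of the behavior that would change, with a made-up function name: today a function whose last expression is a for-loop returns nothing, and callers can observe that.

```julia
# Hypothetical function whose last expression is a for-loop.
function ends_with_loop(n)
    for i in 1:n
        # empty
    end
end

ends_with_loop(3) === nothing   # true: the for-loop evaluates to `nothing`
ends_with_loop(0) === nothing   # true: same for zero iterations
```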

I would say this is around 99.9% backward compatible. But any upside should be measured also.
A sample of for loops across the code-base and a little evaluation might help.

But I know there is no appetite for changes…

ASIDE: Functions ending with a for loop can also replace it with foreach, which returns nothing and doesn’t need this ‘mechanism’. Of course, this is not backward compatibility per se.

This is infeasible to prove. Sampling for-loops in source code isn’t enough because the source code does not say how often each for-loop is used, and that’ll change with the package ecosystem. A useless package can generate enough methods ending in for-loops to dwarf every other for-loop in existence, so the sample would be meaningless.

Hard disagree, Julia is changing fairly quickly. The key difference is those features are highly demanded and developed collaboratively, not done on unvetted whims.

Nope, even if implementing the same algorithm, foreach-do involves a closure and would introduce type-inference issues for reassigned captured variables, which take the place of the outer local variables that a for-loop accesses directly.
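A minimal sketch of the two forms side by side (both function names are invented). Both compute the same result; the difference is that total in the foreach version is captured and reassigned by a closure, which in current Julia typically means the compiler can no longer infer its type concretely. Running @code_warntype on each is one way to check this on your own version.

```julia
# Hypothetical example: the do-block is a closure that reassigns the captured `total`.
function sum_foreach(xs)
    total = 0
    foreach(xs) do x
        total += x        # reassignment of a captured variable
    end
    return total
end

# The same algorithm with a plain for-loop; `total` is an ordinary local.
function sum_for(xs)
    total = 0
    for x in xs
        total += x
    end
    return total
end

sum_foreach([1, 2, 3])   # 6
sum_for([1, 2, 3])       # 6
```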


I didn’t mean uniform sampling. I meant sampling from packages that are actually used.

For sure.

Maybe people will like this feature. Julia is Turing-complete without this feature, I agree.